
Policy Extraction

Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers

Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state of the art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, making these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution on real robotic systems. Our method builds on standard approaches, such as guidance cost and dataset aggregation, and introduces a novel adaptive factor that prevents the optimizer from collapsing to the learner's behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door, without reward shaping.
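
To make the interplay of these ingredients concrete, below is a minimal, self-contained Python/NumPy sketch of the scheme: a CEM planner whose cost carries a guidance term pulling samples toward the learner's plan, a policy distilled from aggregated planner data (dataset aggregation), and a scalar factor alpha that starts at zero and grows with the learner's competence, so the optimizer is not drawn toward a still-random policy early on. The toy dynamics (model_step), the linear learner (LinearPolicy), and the particular alpha schedule (planner-to-policy cost ratio, clipped to [0, 1]) are illustrative assumptions, not the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem sizes; stand-ins for a real robotic task.
STATE_DIM, ACT_DIM, HORIZON = 4, 2, 15

def model_step(s, a):
    # Hypothetical known dynamics with a quadratic cost; the paper plans
    # through a model of the environment instead.
    s_next = 0.95 * s + 0.1 * np.tanh(a).repeat(STATE_DIM // ACT_DIM)
    return s_next, float(s_next @ s_next + 0.01 * a @ a)

def rollout_cost(s, plan):
    # Total cost of executing an open-loop action sequence from state s.
    total = 0.0
    for a in plan:
        s, c = model_step(s, a)
        total += c
    return total

class LinearPolicy:
    # Minimal learner distilled from planner data via least squares;
    # the paper uses a neural-network policy.
    def __init__(self):
        self.W = np.zeros((ACT_DIM, STATE_DIM))
    def __call__(self, s):
        return self.W @ s
    def fit(self, states, actions):
        S, A = np.asarray(states), np.asarray(actions)
        self.W = np.linalg.lstsq(S, A, rcond=None)[0].T

def policy_plan(policy, s):
    # Reference plan: roll the learner through the model for HORIZON steps.
    plan = np.zeros((HORIZON, ACT_DIM))
    for t in range(HORIZON):
        plan[t] = policy(s)
        s, _ = model_step(s, plan[t])
    return plan

def cem_plan(state, policy, alpha, pop=64, n_elite=8, iters=4):
    # CEM whose cost includes a guidance term penalizing deviation from the
    # learner's plan, weighted by the adaptive factor alpha.
    ref = policy_plan(policy, state)
    mean, std = alpha * ref, np.ones((HORIZON, ACT_DIM))
    for _ in range(iters):
        samples = mean + std * rng.standard_normal((pop, HORIZON, ACT_DIM))
        costs = np.array([rollout_cost(state, p)
                          + alpha * np.sum((p - ref) ** 2) for p in samples])
        elite = samples[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Joint loop: plan, aggregate data, distill, adapt the guidance factor.
policy, states, actions, alpha = LinearPolicy(), [], [], 0.0
for episode in range(10):
    s = rng.standard_normal(STATE_DIM)
    for _ in range(20):
        plan = cem_plan(s, policy, alpha)
        states.append(s)
        actions.append(plan[0])          # dataset aggregation (DAgger-style)
        s, _ = model_step(s, plan[0])
    policy.fit(states, actions)          # behavior cloning on all data so far
    # Adaptive factor: trust the learner in proportion to how close its
    # rollout cost is to the planner's. This ratio schedule is assumed here
    # for illustration; alpha = 0 at the start keeps the optimizer from
    # collapsing onto the untrained policy.
    s0 = rng.standard_normal(STATE_DIM)
    ratio = rollout_cost(s0, cem_plan(s0, policy, alpha)) / (
        rollout_cost(s0, policy_plan(policy, s0)) + 1e-8)
    alpha = float(np.clip(ratio, 0.0, 1.0))
    print(f"episode {episode}: alpha = {alpha:.2f}")

Note the design point the sketch illustrates: because alpha scales both the guidance cost and the initialization of the CEM mean, tying it to the learner's relative performance makes the guidance self-regulating, whereas a fixed constant would either drag the planner toward a poor policy early or underuse a good one late.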

Publication:
Pinneri*, C., Sawant*, S., Blaes, S., Martius, G. Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers. In: 9th International Conference on Learning Representations (ICLR 2021), May 2021.
