True autonomy in the real world is not bound to a single task under fixed environmental dynamics. In fact, tasks are hardly ever clearly specified, e.g. through well-shaped rewards, and it is left to the agent to determine which tasks to pursue in order to prepare for unknown future challenges.
We equip a Reinforcement Learning (RL) agent with different abilities that support this self-organized learning process and make it efficient. The goal is to have an agent that explores its environment and thereby figures out how to solve a number of tasks that require it to manipulate different parts of the state space. Once the agent has learned all tasks sufficiently well, we can ask it to solve a certain task by manipulating the corresponding part of the state space until a goal state is reached. In the end, the agent should be capable of controlling all controllable parts of its state.
The main abilities we equip the agent with are depicted in the figure and are described as follows: A task selector (part [2] in the figure) that allows the agent to distribute its available resource budget among all possible tasks it could learn (part [1]) such that, at any given time, most of the resource budget is spent on the tasks in which the agent can make the most progress. A task planner (part [3]) that learns potentially existing inter-dependencies between tasks, i.e. whether one task can be solved faster once another task has been solved, or is only enabled by it. A dependency graph (part [4]) that the agent uses to plan subtask sequences that allow it to solve a final desired task. A subgoal generator (part [5]) that generates, for each subtask in the plan, a goal state that, if reached by the agent, makes it easier to solve the next subtask. All components are learned concurrently from an intrinsic motivation signal (part [8]) that is computed from the experience the agent collects while autonomously interacting with the environment (parts [6, 7] in the figure).
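The roles of the task selector and the dependency graph can be illustrated with a minimal sketch. It is not the implementation used in this work: the task names, the `floor` parameter, the proportional allocation rule, and the restriction to a single prerequisite per task are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def selection_probabilities(progress: Dict[str, float], floor: float = 0.1) -> Dict[str, float]:
    """Mix a uniform distribution with one proportional to |learning progress|.

    The uniform share (`floor`, a hypothetical parameter) keeps every task
    selectable, so progress on currently stagnant tasks can still be detected.
    """
    tasks = list(progress)
    uniform = 1.0 / len(tasks)
    total = sum(abs(p) for p in progress.values())
    probs = {}
    for task in tasks:
        # With no measurable progress anywhere, fall back to uniform sampling.
        share = abs(progress[task]) / total if total > 0 else uniform
        probs[task] = floor * uniform + (1.0 - floor) * share
    return probs


def plan_subtasks(depends_on: Dict[str, Optional[str]], final_task: str) -> List[str]:
    """Follow 'depends on' links back from the final task, then reverse the chain."""
    chain: List[str] = []
    task: Optional[str] = final_task
    while task is not None and task not in chain:  # the second check guards against cycles
        chain.append(task)
        task = depends_on.get(task)
    return chain[::-1]


# Hypothetical tasks: moving an object requires first reaching a tool.
progress = {"move": 0.0, "reach_tool": 0.3, "move_object": 0.1}
probs = selection_probabilities(progress)

depends_on = {"move": None, "reach_tool": "move", "move_object": "reach_tool"}
plan = plan_subtasks(depends_on, "move_object")
```

In this toy setting most of the budget goes to `reach_tool`, the task with the largest progress, while the plan for `move_object` runs through its prerequisites in order. In the work described here the dependencies are learned from data rather than written down by hand.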
In this work, we pair an explicit (or fixed) but general planning structure with data-driven learning to solve challenging control tasks. By extracting information about the interrelationship between tasks from the data, the agent's planner can specialize to the specific set of problems it faces in the current environment. One next step in this line of research is to make the planning structure itself more flexible so that it can be fully learned or dynamically adapted to better fit the specific needs of the problems at hand.
Confronted with the open-ended learning setting, we select learning progress as the main driving force of exploration and as an additional training signal for differentiating between task-relevant and task-irrelevant information in the training data.
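The text does not state how learning progress is measured. One common choice, shown here purely as an assumption, is the change in success rate between an older and a more recent window of attempts; the class name and the `window` parameter are hypothetical.

```python
from collections import deque
from typing import Deque


class LearningProgress:
    """Tracks per-task progress as the difference between two success-rate windows."""

    def __init__(self, window: int = 10) -> None:
        self.window = window
        # Only the most recent 2 * window outcomes are kept.
        self.history: Deque[float] = deque(maxlen=2 * window)

    def update(self, success: float) -> None:
        """Record the outcome of one attempt (1.0 for success, 0.0 for failure)."""
        self.history.append(success)

    def value(self) -> float:
        """Recent success rate minus older success rate; 0.0 until both windows are full."""
        if len(self.history) < 2 * self.window:
            return 0.0
        values = list(self.history)
        older = sum(values[: self.window]) / self.window
        recent = sum(values[self.window:]) / self.window
        return recent - older


lp = LearningProgress(window=5)
for outcome in [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]:
    lp.update(outcome)
```

A task that is already mastered and a task that cannot be solved at all both yield a value near zero under this measure, so neither attracts much of the budget, which matches the intent of spending resources where progress is possible.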
Another future direction of research is to find other types of intrinsic motivation that can drive the exploration behavior of the agent.
The code, the poster presented at NeurIPS 2019 and a 3 min summary video can be found here.