Current robotic systems are limited by a lack of robust visual processing in the presence of camera movement and independent motion in the scene. Yet these are precisely the conditions under which a robot must operate. The goal of this project is to investigate whether a robot can supervise itself to learn representations of visual data that allow it to robustly interpret arbitrary dynamic scenes. Furthermore, we will investigate whether the robot's prior knowledge of its own actions can make visual inference more robust and efficient under ego-motion and independent motion in the scene. In this way, the project also explores the potential computational benefits of jittering image acquisition that have been reported for humans.