A fundamental property of our visual world is its compositional nature: it combines distinct object entities characterised by their shape as well as by physical properties such as colour, reflectance, and sound. For an agent acting in this environment, it is therefore crucial to develop a representation of objects that supports visual performance robust to changes in viewing angle, partial occlusion, lighting conditions, and other factors. Understanding the objectness and compositional nature of our world is key to generalising efficiently and robustly to novel scenarios. Furthermore, understanding and predicting object properties from their materials and object-part relations is a critical element of tasks requiring dexterous manipulation.
While classical theories of vision emphasised the importance of object-centric cue integration, most existing training and evaluation tasks in modern machine vision are oblivious to the objectives that shaped human vision. These tasks often focus on pixel-level information, leading machine vision systems to deploy unstructured “feature vectors” and behavioural strategies that likely deviate fundamentally from human ones. While such representations may serve as effective shortcuts for specific downstream tasks such as object classification, they are less likely to yield an accurate understanding of the visual world around us, or to facilitate generalisation to new tasks. Although large-scale foundation models have enabled significant progress recently, substantial inconsistencies remain between human and machine behaviour on vision-centric tasks.
Theme A investigates artificial vision systems to understand which inductive biases facilitate robust vision in open-world conditions, similar to human visual processing. The first project, A1, focuses on improving 3D object inference from video data in open-world settings, building on the hypothesis that object-centric modelling of the world is a key inductive bias for achieving human-level robustness. The second project, A2, investigates how such object-centric inference algorithms, combined with other inductive biases such as object-part relationships, can facilitate a general 3D world understanding for autonomous behaviour, focusing on dexterous manipulation tasks in complex 3D environments.
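To make the notion of an object-centric inductive bias concrete, the following is a minimal numpy sketch of one family of such mechanisms: slot-style competitive grouping, in which a small set of "slot" vectors compete to explain per-pixel (or per-patch) features via attention normalised over slots. This is an illustrative toy, not the method proposed in A1 or A2; the function names, dimensions, and data are all hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_step(inputs, slots):
    """One iteration of slot-style competitive grouping (illustrative).

    inputs: (n, d) per-pixel/patch features; slots: (k, d) object slots.
    The softmax is taken over the slot axis, so slots compete to explain
    each input; each slot is then updated as the attention-weighted mean
    of the inputs it claims.
    """
    n, d = inputs.shape
    logits = inputs @ slots.T / np.sqrt(d)      # (n, k) dot-product similarity
    attn = softmax(logits, axis=1)              # normalise over slots -> competition
    # Normalise per slot so the update is a convex combination of inputs.
    weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
    return weights.T @ inputs                   # (k, d) updated slots

# Toy scene: features from two well-separated "objects"; two slots group them.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(-2.0, 0.1, (50, 4)),
                        rng.normal(2.0, 0.1, (50, 4))])
slots = rng.normal(0.0, 1.0, (2, 4))
for _ in range(10):
    slots = slot_attention_step(feats, slots)
```

Because each updated slot is a convex combination of input features, the slots stay within the feature cloud and, under this kind of iteration, tend to settle on coherent groups; learned versions add trainable projections and a recurrent update on top of this skeleton.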