Robust Vision Projects

In the third funding period, the CRC 1233 "Robust Vision" is organised into ten research projects and one infrastructure project, all supported by the central coordination project.

Research Theme A: Object-centric vision

A fundamental property of our visual world is its compositional nature: it combines distinct object entities that are characterised by their shape as well as by physical properties such as colour, reflectance, and sound. For an agent acting in this environment, it is therefore crucial to develop a representation of objects that supports visual performance robust to changes in viewing angle, partial occlusion, lighting conditions, and other factors. Understanding the objectness and compositional nature of our world is key to generalising efficiently and robustly to novel scenarios. Furthermore, understanding and predicting object properties from their materials and object-part relations is a critical element of tasks requiring dexterous manipulation.
While classical theories of vision emphasised the importance of object-centric cue integration, most existing training and evaluation tasks in modern machine vision are oblivious to the objectives that shaped human vision. These tasks often focus on pixel-level information, leading machine vision systems to deploy unstructured "feature vectors" and behavioural strategies that likely deviate fundamentally from human strategies. While such representations may serve as effective shortcuts for specific downstream tasks like object classification, it is less likely that they will yield an accurate understanding of the visual world around us, or facilitate generalisation to new tasks. And while large-scale foundation models have enabled significant progress recently, there remain inconsistencies between human and machine behaviour in vision-centric tasks.
Theme A investigates artificial vision systems to understand which inductive biases facilitate robust vision in open-world conditions, similar to human visual processing. The first project, A1, focuses on improving 3D object inference from video data in open-world settings, building on the hypothesis that object-centric modelling of the world is a key inductive bias for achieving human-level robustness. The second project, A2, investigates how such object-centric inference algorithms, combined with other inductive biases such as object-part relationships, can facilitate a general 3D world understanding for autonomous behaviour, focusing on dexterous manipulation tasks in complex 3D environments.

Research Theme B: Robust high-level vision in the human brain

Complementary to Theme A, which builds machine vision systems to test how fundamental inductive biases such as objectness can provide the necessary level of efficiency for autonomous agent-centric vision, Theme B aims to unravel how high-level representations in the brain are shaped by such inductive biases. Thus, we ask how the human brain extracts and represents high-level information from the rich, dynamic visual inputs that one encounters during natural vision. Natural stimuli contain compositional spatiotemporal regularities that are shaped by the physical laws of our world as well as by the regularities of our environment and behaviour, and these compositional regularities can be exploited by circuits organised in a modular fashion.
In addition, visual information can be integrated with prior knowledge, memories, and other sensory inputs (e.g., audition) in order to build robust representations. The three projects in this research theme will closely collaborate to investigate the neural mechanisms underlying the extraction and representation of high-level stimulus information during robust vision in the human brain. All of our projects share an interdisciplinary approach that integrates neurophysiological studies of the human brain at different spatio-temporal scales with computational modelling using artificial neural networks. We will use identical sensory stimuli (a multisensory feature movie from B1, specifically manipulated natural movies from B2, and ground-truth motion stimuli from B3) and will collaborate on measurements, artificial neural network (ANN) models, and data-analysis approaches for comparing ANNs with human brain activity. Neural and behavioural data from all projects in Theme B will be shared in an open database.
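One standard family of data-analysis approaches for comparing ANN models with measured brain activity is representational similarity analysis (RSA): each system is summarised by a stimulus-by-stimulus dissimilarity matrix, and the two matrices are then compared rank-wise. The sketch below is illustrative only, using synthetic data and a simple correlation-distance RDM; it is not the projects' actual analysis pipeline.

```python
import numpy as np

def rdm(responses):
    # responses: (n_stimuli, n_units). Representational dissimilarity
    # matrix: 1 - Pearson correlation between the response patterns of
    # every pair of stimuli.
    return 1.0 - np.corrcoef(responses)

def spearman(a, b):
    # Spearman rank correlation via Pearson correlation of ranks
    # (no tie handling; adequate for continuous-valued dissimilarities).
    ra = a.argsort().argsort().astype(float)
    rb = b.argsort().argsort().astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def rsa_score(brain, model):
    # Compare only the upper triangles: the RDM is symmetric with a
    # zero diagonal.
    iu = np.triu_indices(brain.shape[0], k=1)
    return spearman(rdm(brain)[iu], rdm(model)[iu])

# Synthetic example: 20 stimuli driving 100 "voxels" and 256 model
# units through a shared 10-dimensional latent structure, so the two
# representational geometries should agree.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 10))
brain = latent @ rng.normal(size=(10, 100)) + 0.1 * rng.normal(size=(20, 100))
model = latent @ rng.normal(size=(10, 256)) + 0.1 * rng.normal(size=(20, 256))
print(round(rsa_score(brain, model), 2))
```

Because RSA abstracts away from the individual measurement channels, it allows fMRI voxels, MEG sensors, and ANN units to be compared in a common space.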

Research Theme C: Active visual inference

One key property of agent-based vision is that agents actively sample the visual input. Projects in Theme C will investigate how active sampling strategies can help achieve robust visual perception. We will concentrate on uncertain environments, in which active sampling is particularly powerful for building internal models and efficiently acquiring visual information. We will investigate how probabilistic temporal regularities can be exploited to build internal models and how those models can be used for orienting and active sampling, using both measurements of human behaviour and artificial agents. In addition, we will explore eye movements in primates and humans as active visual inference in foveated systems, using dynamic video stimuli to integrate eye movements, neural responses in macaques and humans, and visual discrimination into a holistic model of gaze control. The two projects C1 and C2 are interconnected components of the broader research goal of understanding and modelling the complex interactions between perception, cognition, and action in uncertain environments, focusing on adaptive internal modelling and on the active selection of fixation locations and gaze control, respectively.

Research Theme D: Early information selection

Research Theme D is dedicated to modelling pre-cortical image transformations, particularly the question of how such transformations are optimised to dynamically select information from naturalistic visual stimuli. A key and shared focus of Theme D projects involves measuring and modelling neural responses to naturalistic movies along the two major visual pathways in mice, the retino-collicular and the geniculo-cortical pathways, and modelling these data with "digital-twin" models. The consolidation of the former Projects TP10, TP12, and TP13, along with the inclusion of PIs Franke, Macke, and Sinz, into a single research theme will expand the highly successful data and model sharing practised between former Projects TP10 and TP12 in the previous funding periods. This strategy will be fuelled by the commitment of all projects of Theme D to use the advanced naturalistic stimuli generated by project D1. Additionally, the projects will employ various common modelling techniques, including the analysis of optimised images derived from digital-twin models, a methodology pioneered by PI Sinz and collaboratively used by PIs Franke and Euler. The projects are also linked by the overarching theme of investigating inductive biases, which Theme D will address by considering how environmental and behavioural statistics are related to neural representations and, ultimately, perceptual performance. Together, the shared stimuli and the common computational and conceptual framework, centred on neural adaptations to natural environments and behaviours, are anticipated to foster extensive synergies within Theme D.
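The optimised-image methodology can be illustrated with a deliberately tiny stand-in for a digital twin. The receptive field `w`, the linear-nonlinear (LN) response, and all parameters below are illustrative assumptions, not the actual fitted models, which are deep networks trained on recorded responses; the optimisation principle, however, is the same: gradient ascent on the twin's predicted response under a norm constraint on the stimulus.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy "digital twin": an LN model neuron with a hypothetical 16x16
# receptive field w and a ReLU output nonlinearity.
w = rng.normal(size=(16, 16))

def predicted_response(x):
    return max(0.0, float(np.sum(w * x)))  # ReLU output of the twin

def most_exciting_input(steps=200, lr=0.1):
    """Gradient ascent on the predicted response, re-projecting the
    stimulus onto the unit-norm sphere after every step."""
    x = rng.normal(size=w.shape)
    if np.sum(w * x) < 0:
        x = -x                  # start on the responsive side of the ReLU
    x /= np.linalg.norm(x)
    for _ in range(steps):
        # d response / d x is w wherever the ReLU is active, else 0.
        grad = w if np.sum(w * x) > 0 else np.zeros_like(w)
        x = x + lr * grad
        x /= np.linalg.norm(x)  # norm constraint keeps "contrast" fixed
    return x

mei = most_exciting_input()
# For an LN neuron the optimum is the normalised receptive field
# itself, so the recovered stimulus aligns almost perfectly with w.
print(round(float(np.sum(mei * w)) / float(np.linalg.norm(w)), 3))
```

With a deep-network twin the gradient comes from automatic differentiation instead of the closed form above, but the loop, the norm constraint, and the interpretation of the result as a "most exciting input" carry over unchanged.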

Infrastructure project

A central goal of the CRC is to build and provide open model, data, and evaluation platforms that enable the construction of "digital twin" models at different levels of abstraction, allow quantitative comparisons between models and data, and support collaborative use by the entire research community. In this cross-sectional project we will therefore, in collaboration with all PIs, contribute conceptual approaches and computational tools for evaluating models of vision at different scales and levels of detail, and support the release of open-source datasets, models, and tools.