Computer Graphics

SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention

Simon Doll (1,3), Richard Schulz (1), Lukas Schneider (1), Viviane Benzin (1),
Markus Enzweiler (2), and Hendrik P.A. Lensch (3)
(1) Mercedes-Benz, simon.doll@mercedes-benz.com
(2) Esslingen University of Applied Sciences
(3) University of Tübingen

Abstract

Based on the key idea of DETR, this paper introduces an
object-centric 3D object detection framework that operates on a limited
number of 3D object queries instead of dense bounding box proposals
followed by non-maximum suppression. After image feature extraction, a
decoder-only transformer architecture is trained on a set-based loss. SpatialDETR
infers the classification and bounding box estimates based on
attention, both spatially within each image and across the different views.
To fuse the multi-view information in the attention block, we introduce a
novel geometric positional encoding that incorporates the view ray geometry
to explicitly account for the extrinsic and intrinsic camera setup. This
way, the spatially aware cross-view attention exploits arbitrary receptive
fields to integrate cross-sensor data and thereby global context. Extensive
experiments on the nuScenes benchmark demonstrate the potential
of global attention and result in state-of-the-art performance. Code is available
at https://github.com/cgtuebingen/SpatialDETR.
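The set-based loss mentioned in the abstract follows the DETR recipe: each ground-truth object is assigned to exactly one query by a minimum-cost bipartite matching, and queries left unmatched are trained to predict an extra "no object" class, which is why no non-maximum suppression is needed at inference. The sketch below illustrates that idea under simplifying assumptions: the matching cost (class probability plus L1 distance between box centres), the dictionary layout of predictions, and all function names are hypothetical, and a brute-force search stands in for the Hungarian algorithm that DETR uses, so it is only suitable for toy inputs. It is not the SpatialDETR implementation.

```python
import math
from itertools import permutations

def l1_distance(a, b):
    """L1 distance between two box centres (stand-in for a full 3D box regression term)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_cost(pred, gt):
    """Hypothetical DETR-style matching cost: low when the query is confident in the
    ground-truth class and its box centre is close to the ground-truth centre."""
    return -pred["probs"][gt["label"]] + l1_distance(pred["center"], gt["center"])

def match_queries(preds, gts):
    """One-to-one assignment of ground-truth objects to queries minimising the total cost.
    Brute force over permutations (toy sizes only); DETR uses the Hungarian algorithm."""
    assert len(preds) >= len(gts), "need at least as many queries as objects"
    best_perm, best_cost = (), math.inf
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[q], gts[g]) for g, q in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return [(q, g) for g, q in enumerate(best_perm)]

def set_loss(preds, gts, no_object):
    """Matched queries are supervised with their object's class and box; every unmatched
    query is pushed towards the "no object" class, so duplicate detections are penalised
    during training instead of being removed by NMS afterwards."""
    matches = match_queries(preds, gts)
    matched = {q for q, _ in matches}
    loss = 0.0
    for q, g in matches:
        loss += -math.log(preds[q]["probs"][gts[g]["label"]])
        loss += l1_distance(preds[q]["center"], gts[g]["center"])
    for q, pred in enumerate(preds):
        if q not in matched:
            loss += -math.log(pred["probs"][no_object])
    return loss

# Toy example. Classes: 0 = car, 1 = pedestrian, 2 = "no object".
preds = [
    {"probs": [0.1, 0.8, 0.1], "center": (10.0, 0.0, 0.0)},
    {"probs": [0.9, 0.05, 0.05], "center": (0.5, 0.0, 0.0)},
    {"probs": [0.3, 0.3, 0.4], "center": (50.0, 50.0, 0.0)},
]
gts = [
    {"label": 0, "center": (0.0, 0.0, 0.0)},
    {"label": 1, "center": (10.0, 1.0, 0.0)},
]
print(match_queries(preds, gts))  # [(1, 0), (0, 1)]
print(set_loss(preds, gts, no_object=2))
```

Here query 1 is matched to the car, query 0 to the pedestrian, and query 2 is left unmatched and supervised towards "no object". With the fixed number of queries used in practice, a polynomial-time assignment solver replaces the permutation search.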
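The geometric ingredient behind a view-ray positional encoding can be illustrated with a pinhole camera model: each pixel is back-projected with the inverse intrinsics and rotated by the extrinsics into a shared vehicle (ego) frame, so that rays from all cameras, and the directions from each camera towards a 3D query, become directly comparable. The following is a minimal sketch under stated assumptions, not the encoding used in SpatialDETR: the (fx, fy, cx, cy) intrinsics tuple, the camera-to-ego extrinsics convention, the ego axes (x forward, y left, z up), the example calibration values, and the cosine comparison are all chosen for illustration, and the function names are hypothetical.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def pixel_ray(u, v, intrinsics, rotation, translation):
    """View ray through pixel (u, v), expressed in the shared ego frame.

    intrinsics: (fx, fy, cx, cy) of a pinhole camera (assumed layout).
    rotation, translation: camera-to-ego extrinsics, so the camera centre is `translation`.
    Returns (origin, unit_direction)."""
    fx, fy, cx, cy = intrinsics
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]  # back-projection, i.e. K^-1 [u, v, 1]^T
    d_ego = normalize(mat_vec(rotation, d_cam))  # rotate the direction into the ego frame
    return list(translation), d_ego

def query_direction(point, translation):
    """Unit direction from a camera centre towards a 3D query reference point (ego frame)."""
    return normalize([p - t for p, t in zip(point, translation)])

def cosine(a, b):
    """Cosine similarity of two unit vectors: 1.0 when a pixel ray points straight at the query."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical front camera looking along ego +x:
# camera x (right) = ego -y, camera y (down) = ego -z, camera z (optical axis) = ego +x.
ROT_FRONT = [[0, 0, 1], [-1, 0, 0], [0, -1, 0]]
T_FRONT = (1.5, 0.0, 1.6)
K_FRONT = (1000.0, 1000.0, 800.0, 450.0)

_, center_ray = pixel_ray(800, 450, K_FRONT, ROT_FRONT, T_FRONT)  # principal point
_, left_ray = pixel_ray(0, 450, K_FRONT, ROT_FRONT, T_FRONT)      # left image border
ahead = query_direction((21.5, 0.0, 1.6), T_FRONT)                # query 20 m ahead
print(center_ray, left_ray, cosine(center_ray, ahead))
```

Because every pixel of every camera is described in the same frame, a query can relate itself to features from any view, which is what allows cross-view attention with arbitrary receptive fields. How SpatialDETR turns this ray geometry into its actual positional encoding is defined in the paper and repository and is not reproduced here.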

BibTeX

@inproceedings{Doll2022ECCV,
 author = {Doll, Simon and Schulz, Richard and Schneider, Lukas and Benzin, Viviane and Enzweiler, Markus and Lensch, Hendrik P.A.},
 title = {SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention},
 booktitle = {European Conference on Computer Vision (ECCV)},
 year = {2022}
}
