Finding 3D pose with a monocular camera


As winner of the EMVA Young Professional Award 2014, presented at the recent EMVA business conference, Jakob Engel, a PhD student in the Computer Vision Group at the Technical University of Munich, Germany, gives an update on the potential 3D imaging applications of his novel approach to real-time visual odometry using a monocular camera

The Computer Vision Group at the Technical University of Munich has recently developed a novel, direct method to reconstruct the 3D environment from the video of a commodity hand-held camera, while at the same time tracking the camera's exact position in real time. Commonly referred to as monocular SLAM (Simultaneous Localisation and Mapping), such methods are widely used in robotics and autonomous driving, and as the basis for virtual and augmented reality applications.

While multi-camera setups or active sensors such as structured light or time-of-flight cameras simplify the problem, compared with ordinary monocular cameras they are larger, more expensive and require more power, all of which are important criteria for commercialisation. In addition, stereo setups and active sensors have a very limited range over which they can provide reliable information – determined by the baseline of the sensor, for example – or do not work in direct sunlight. Monocular cameras, on the other hand, are scale-independent and fully passive, which allows them to operate in environments at very different scales.

While most existing monocular SLAM algorithms are based on keypoints, the proposed method is a direct approach: instead of abstracting images to keypoint observations, it maps and tracks directly on image intensities (see Figure 2). This has the fundamental advantage that all information in the images can be used – edges, for example – instead of relying only on image corners (keypoints). Especially in man-made environments, where there is often very little texture, this leads to denser and more detailed 3D reconstructions, as well as more accurate and robust camera tracking.

Figure 2: Keypoint-based methods abstract images to feature observations, and discard all other information. In contrast, the semi-dense direct approach maps and tracks directly on image intensities: this means that, firstly, all information, including edges, is used and, secondly, rich, semi-dense information about the geometry of the scene is obtained directly.
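
To make the idea of a 'semi-dense' pixel set concrete, the minimal sketch below selects every pixel whose intensity gradient exceeds a threshold, so edges contribute as well as corners. This is an illustration only, assuming a simple gradient-magnitude test; the function name and threshold value are hypothetical and not taken from the published method.

```python
import numpy as np

def select_semi_dense_pixels(image, grad_threshold=12.0):
    """Select the semi-dense pixel set: pixels whose intensity gradient
    is large enough for depth to be estimated (edges as well as corners).

    image: 2D float array of grey values; grad_threshold is illustrative.
    Returns a boolean mask over the image.
    """
    # Central-difference image gradients in x and y.
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    gx[:, 1:-1] = 0.5 * (image[:, 2:] - image[:, :-2])
    gy[1:-1, :] = 0.5 * (image[2:, :] - image[:-2, :])

    # Depth (and hence geometry) is only well constrained where the image
    # actually varies, so keep only pixels with a sufficiently large gradient.
    grad_mag = np.sqrt(gx**2 + gy**2)
    return grad_mag > grad_threshold
```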

Take the autonomous navigation of robots or cars in unknown terrain: a fundamental requirement is knowledge of the vehicle's current position and the position of potential obstacles. Like humans and most animals, robots can use vision as a primary sensor to acquire this information. This SLAM technique has potential uses in the navigation of unmanned micro-aerial vehicles, where the size and power consumption of the sensor are subject to severe limitations. Deployed as a swarm, nano-quadrotors, which fit in the palm of a hand and weigh less than 25 grams, can be equipped with a nano-camera to navigate autonomously using this technique.

Another example of where this technique could be used is virtual or augmented reality on smartphones. With cameras present in every modern smartphone, tablet and many wearable devices, virtual and augmented reality applications are becoming more and more prominent. A fundamental requirement for many such applications is exact real-time estimation of the pose of the device, as well as of the 3D structure of the environment, which is what this technique is particularly good at.

The method is based on estimating and maintaining a semi-dense depth map (containing per-pixel depth) by continuous propagation and probabilistic fusion of pixel-wise stereo comparisons to previous frames. New frames are then tracked using direct image alignment: using the estimated semi-dense depth map, the camera pose is estimated by direct minimisation of the photometric error (intensity differences) between the two frames. To maintain real-time performance, even on a smartphone, only image pixels with a sufficiently large gradient are used, as the depth can only be estimated for these pixels.
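
As a rough illustration of the tracking step, the following sketch evaluates the photometric error of a candidate camera pose: semi-dense reference pixels are back-projected using their estimated depth, warped into the new frame through a pinhole model with intrinsics K, and compared by intensity. All names are illustrative, the lookup is nearest-neighbour rather than sub-pixel, and the optimisation itself (e.g. Gauss-Newton over the pose, with robust weighting and a coarse-to-fine pyramid) is omitted, so this is a sketch of the cost function only, not the published implementation.

```python
import numpy as np

def photometric_error(ref_img, new_img, depth, mask, K, R, t):
    """Mean squared intensity difference for a candidate pose (R, t).

    ref_img, new_img: grey-value images; depth: per-pixel depth of the
    reference frame (valid where mask is True); K: 3x3 pinhole intrinsics;
    R: 3x3 rotation and t: length-3 translation taking reference-frame
    points into the new frame. A direct tracker would minimise this value
    over (R, t).
    """
    h, w = ref_img.shape
    v, u = np.nonzero(mask)                       # semi-dense pixel set
    z = depth[v, u]

    # Back-project the selected reference pixels to 3D using their depth.
    pix_h = np.stack([u, v, np.ones_like(u)]).astype(float)
    pts = (np.linalg.inv(K) @ pix_h) * z          # 3 x N points, reference frame

    # Transform into the new frame and project with the pinhole model.
    pts_new = R @ pts + t[:, None]
    proj = K @ pts_new
    u2, v2 = proj[0] / proj[2], proj[1] / proj[2]

    # Keep only points in front of the camera that land inside the new image.
    ok = (proj[2] > 0) & (u2 >= 0) & (u2 < w - 1) & (v2 >= 0) & (v2 < h - 1)

    # Nearest-neighbour intensity lookup (sub-pixel interpolation in practice).
    i_ref = ref_img[v[ok], u[ok]]
    i_new = new_img[np.round(v2[ok]).astype(int), np.round(u2[ok]).astype(int)]
    return np.mean((i_ref - i_new) ** 2)
```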

Combined with a scale-drift-aware pose-graph framework, the method can reconstruct large 3D scenes accurately with only an ordinary hand-held monocular camera or a mobile phone camera.
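
Here, 'scale-drift aware' refers to representing keyframe poses as similarity transforms – rotation, translation and scale – rather than rigid-body poses, so that loop closures can also correct accumulated scale error. The sketch below, with illustrative function names and a deliberately simplified error measure, shows how such Sim(3) elements compose and how the error of one pose-graph edge could be evaluated; a real optimiser would work with a proper Lie-algebra residual.

```python
import numpy as np

def sim3_compose(a, b):
    """Compose two similarity transforms (apply b, then a), each given as (R, t, s)."""
    Ra, ta, sa = a
    Rb, tb, sb = b
    return (Ra @ Rb, sa * (Ra @ tb) + ta, sa * sb)

def sim3_inverse(a):
    """Invert a similarity transform (R, t, s) mapping x to s * R @ x + t."""
    R, t, s = a
    return (R.T, -(R.T @ t) / s, 1.0 / s)

def pose_graph_edge_error(pose_i, pose_j, measured_ij):
    """Error of one pose-graph edge between keyframes i and j.

    Poses and the measured relative transform are Sim(3) elements (R, t, s).
    Because the scale s is part of the state, optimising the graph can
    correct scale drift as well as rotational and translational drift.
    """
    rel = sim3_compose(sim3_inverse(pose_i), pose_j)    # estimated relative pose
    err = sim3_compose(sim3_inverse(measured_ij), rel)  # ideally the identity
    R, t, s = err
    return np.linalg.norm(R - np.eye(3)), np.linalg.norm(t), abs(np.log(s))
```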

More information as well as videos can be found at: http://vision.in.tum.de/research/semidense.
