Rethinking 3D vision

Tolga Birdal at the Technical University of Munich is co-organising a 3D vision workshop at the ICCV computer vision conference in October. Here, he argues that 3D vision's strength - geometry - is also what's holding it back

Depth image, pose estimation, cameras, geometry, point cloud, reconstruction and mesh. These are some of the keywords that computer vision experts come up with in response to the term 3D vision. Yet, to many others, these concepts appear as little more than a word cloud, and carry a certain mystery for non-professionals. Despite this lack of common knowledge about the intricacies of 3D imaging, the impact of 3D vision has reached a remarkable level.

3D technology in factory automation is expected to be valued at $2.13 billion by 2022, according to market research firm MarketsandMarkets. Automotive, pharmacy, food and beverage, and many other sectors expect increased use of 3D components and software. So what is the hype about, which applications benefit from 3D vision, and why is 3D so crucial for healthier progress towards Industry 4.0?

First things first: the information richness of 3D is much greater than that of 2D. 3D can deliver accurate coordinate, distance or radius measurements, and can output 3D object or camera poses that describe precise orientations in the real world. And because 3D coordinates are simply positions, they need not carry intensity information, which means further processing does not depend on illumination in the way 2D image understanding does.
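
To make this concrete, here is a minimal sketch in Python with numpy - my own illustration, not tied to any particular sensor or library. A rigid 6DoF pose maps object coordinates into the world frame, and a metric distance falls straight out of the coordinates, with no dependence on illumination:

```python
import numpy as np

# A rigid 6DoF pose: rotation R (3x3) plus translation t (3,),
# packed into a 4x4 homogeneous transform.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.10, 0.00, 0.25])          # metres
T = np.eye(4)
T[:3, :3], T[:3, 3] = R, t

# Points measured in the object frame (e.g. two drill holes on a part).
p_obj = np.array([[0.00, 0.00, 0.00],
                  [0.05, 0.02, 0.00]])

# Express them in the camera/world frame via the pose.
p_world = (T[:3, :3] @ p_obj.T).T + T[:3, 3]

# A metric distance comes straight from the coordinates; a rigid
# transform preserves it exactly (~0.0539 m here).
print(np.linalg.norm(p_world[1] - p_world[0]))
```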

From an application point of view, 3D perception has always been very beneficial in robotics-related fields like quality inspection, pose estimation, navigation, mapping and grasping. But 3D algorithms enable much more. This is visible in the AR/VR world, where companies such as Microsoft (HoloLens), Magic Leap, Apple, Google (Daydream) and Meta are developing remarkably competent solutions that could completely shift our view of technology. The fact that systems can obtain very accurate 3D reconstructions is also having a tremendous impact on 3D printing - in healthcare, for making prosthetics, for example. Autonomous driving is about to enter our lives, and it too is powered by 3D processing. The renewed focus of NASA and Elon Musk on space exploration is expected to involve 3D vision algorithms in one way or another. All these applications signal significant progress in 3D vision technology.

So, first, let’s take a look at what holds 3D vision back at the moment; I argue that it’s geometry. 3D data comes with geometric properties, whereas in the good old structured 2D domain geometry has been easy to ignore. In 2D images, algebraic approximations have proven very robust against non-ideal geometric conditions, especially when the task itself, such as recognition or identification, carries no geometric requirements. 3D data, by contrast, comes with attributes such as axes of symmetry, rotations and natural sparsity, and these do not allow geometry to be neglected. This is good news for our academic colleagues, as it creates room for research.
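
A toy numpy example of why, again purely illustrative: raw 3D coordinates change completely under an arbitrary rotation, while a purely geometric quantity such as the set of pairwise distances does not. A 3D pipeline therefore has to handle geometry explicitly rather than ignore it:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((100, 3))        # a toy point cloud

# An arbitrary rotation about the z-axis.
a = np.deg2rad(40.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
rotated = points @ R.T

# Raw coordinates change completely under the rotation...
print(np.allclose(points, rotated))                      # False

# ...but pairwise distances, a purely geometric quantity, do not.
d  = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
dr = np.linalg.norm(rotated[:, None] - rotated[None, :], axis=-1)
print(np.allclose(d, dr))                                # True
```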

One aspect 3D has not yet fully utilised is the power of machine learning. However, there is now a promising subfield called geometric deep learning, which aims to unite the best of both worlds - geometric properties and machine learning - so as to maximise the strength of 3D vision. This, in my view, is the next leap forward, and one the industry should be ready for.
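
As a rough sketch of the structural idea behind one well-known family of such methods - PointNet-style networks that apply a shared per-point MLP followed by a symmetric pooling - here is a numpy toy with random, untrained weights. The point is only that the resulting feature respects the unordered nature of a point cloud:

```python
import numpy as np

def point_feature(points, W1, W2):
    """A PointNet-style global feature: a shared per-point MLP
    followed by a symmetric (max) pooling, so the result is
    invariant to the ordering of the input points."""
    h = np.maximum(points @ W1, 0.0)   # shared MLP layer 1 (ReLU)
    h = np.maximum(h @ W2, 0.0)        # shared MLP layer 2 (ReLU)
    return h.max(axis=0)               # order-invariant pooling

rng = np.random.default_rng(1)
W1 = rng.standard_normal((3, 64)) * 0.1    # untrained toy weights
W2 = rng.standard_normal((64, 128)) * 0.1
cloud = rng.standard_normal((256, 3))

f1 = point_feature(cloud, W1, W2)
f2 = point_feature(cloud[rng.permutation(256)], W1, W2)
print(np.allclose(f1, f2))   # True: same feature for any point order
```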

But how do engineers get the most out of 3D data? I would start with the right analysis of the problem and its requirements, which, of course, is made possible by asking the right questions. Is a prior CAD model available? Are the objects symmetric, and could the system benefit from partial symmetry? Can the object be approximated by geometric primitives, or is it completely freeform? Does the system need coordinate measurements or distances? Is dense reconstruction required, or will sparse points suffice? Will the system operate outdoors or indoors? Is the object of interest metallic, shiny or black? Can we trade a cruder but faster method for an accurate but slower one? Such questions vary with the application, but good questions give rise to useful constraints, which make engineering the solution less tedious.
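
One hypothetical way of making those answers actionable is to record them as an explicit specification that the rest of the engineering can be constrained against. The structure and field names below are purely illustrative, not an established API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProblemSpec:
    """Hypothetical requirements checklist for a 3D vision task."""
    cad_model_available: bool
    symmetric: bool                       # full or partial object symmetry?
    primitive_approximable: bool          # planes/cylinders/spheres vs. freeform
    needs_dense_reconstruction: bool      # or are sparse points enough?
    outdoor: bool
    difficult_material: bool              # metallic, shiny or black surfaces
    accuracy_mm: Optional[float] = None   # required metric accuracy, if any
    cycle_time_s: Optional[float] = None  # speed vs. accuracy trade-off

spec = ProblemSpec(
    cad_model_available=True, symmetric=True, primitive_approximable=False,
    needs_dense_reconstruction=False, outdoor=False, difficult_material=True,
    accuracy_mm=0.5, cycle_time_s=1.0)
```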

Contrary to common belief, not every aspect of 3D is more challenging than 2D. For instance, 3D alleviates the pesky process of light selection and illumination design. It removes the need to triangulate from multiple views and gives a natural interface to distance measurements. Most 3D applications can also benefit from online calibration, made possible by 3D registration. Thus, depending on the problem, a 3D solution can complement or fully replace a 2D one, offering a more cost-efficient and robust system.
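
As a sketch of the registration step that makes such online calibration possible, here is a minimal least-squares rigid alignment in numpy - the classical Kabsch/Umeyama solution, without scale. It assumes point correspondences are already known; a real system would obtain them via feature matching or an ICP loop:

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid alignment (Kabsch, no scale):
    returns R, t such that R @ src[i] + t best matches dst[i]."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # closest proper rotation
    t = mu_d - R @ mu_s
    return R, t

# Toy check: recover a known pose from corresponding points.
rng = np.random.default_rng(2)
src = rng.standard_normal((50, 3))
a = np.deg2rad(25.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
dst = src @ R_true.T + np.array([0.2, -0.1, 0.3])
R, t = rigid_register(src, dst)
print(np.allclose(R, R_true))   # True
```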

I am co-organising a workshop on multiple view relationships in 3D data, to be held on 29 October in conjunction with ICCV in Venice, one of the best computer vision conferences. The goal is to foster discussion and boost knowledge dissemination in 3D vision with multiple cameras. To this end, we have brought together a great line-up of speakers and are looking for a high-quality set of submissions - authors of all accepted papers receive exciting prizes! For further information, I highly encourage you to visit: https://mvr3d.github.io.

--

Tolga Birdal is a PhD candidate in the Computer Vision Group at the Chair for Computer Aided Medical Procedures, TUM, and a Doktorand at Siemens. His research and development focuses on large object detection, pose estimation and reconstruction. He was recently awarded the Ernst von Siemens Scholarship and the EMVA Young Professional Award 2016 for part of his PhD work.
