Rethinking 3D vision
Tolga Birdal at the Technical University of Munich is co-organising a 3D vision workshop at the ICCV computer vision conference in October. Here, he argues that 3D vision's strength - geometry - is also what's holding it back
Depth image, pose estimation, cameras, geometry, point cloud, reconstruction and mesh. These are some of the keywords that computer vision experts come up with in response to the term 3D vision. Yet, for many others, these concepts do not appear to be more than a word cloud, creating a certain mystery among the non-professionals. Despite such a lack of common knowledge about the intricacies of 3D imaging, the impact of 3D vision has reached a remarkable level.
3D technology in factory automation is expected to be valued at $2.13 billion by 2022, according to market research firm MarketsandMarkets. Automotive, pharmacy, food and beverage, and many other sectors expect increased use of 3D components and software. So what is the hype about, which applications benefit from 3D vision, and why is 3D so crucial for healthier progress towards Industry 4.0?
First things first: The information richness in 3D is much greater than in 2D. 3D can generate accurate coordinate, distance or radius measurements, and can output 3D object or camera poses explaining their precise orientation in the real world. As 3D coordinates are simple positions, they do not necessarily contain intensity information meaning that any further processing is not reliant on illumination, as is the case for 2D image understanding.
From an application point of view, 3D perception has always been very beneficial in robotics-related fields like quality inspection, pose estimation, navigation, mapping and grasping. However, 3D algorithms mean much more can be done. It is visible in the AR/VR world, as companies like Microsoft for its Hololens, Magic Leap, Apple, Google (Daydream) or Meta develop remarkably competent solutions to completely shift our view of technology. The fact that systems are able to obtain very accurate 3D reconstruction is also having tremendous impact in 3D printing, for making prosthetics, for example, in healthcare. Autonomous driving is also about to enter our lives and this too is powered by 3D processing. The new focus of NASA and Elon Musk on space exploration is expected to involve 3D vision algorithms in one way or another. All these applications signal significant progress in 3D vision technology.
So, first, let’s take a look at what holds back 3D vision at the moment; I argue that it’s geometry. 3D data comes with geometric properties, whereas within the good old structured 2D domain geometry has been easy to ignore. In 2D images, algebraic approximations have been shown to be very robust against non-ideal geometric conditions, especially if the task itself is free of geometric requirements, such as recognition or identification. Yet, 3D data comes with attributes such as axis of symmetry or rotation, or natural sparsity, and this doesn't immediately allow geometry to be neglected. This is good news for our academic colleagues, as it creates room for research.
One aspect that 3D has not yet fully utilised is the power of machine learning. However, there is now a promising subfield called geometric deep learning, which aims to unite the bests of both worlds – geometric properties and machine learning – so as to maximise the strength of 3D vision. This is, in my perspective, the next leap forward for which the industry should be ready.
But how do engineers get the most out of 3D data? I would start with the right analysis of the problem and requirements, which, of course, is made possible by asking the right questions. Is a prior CAD model available? Are the objects symmetric; could the system benefit from partial symmetry? Can an approximate of the object be made by geometric primitives or is the object completely freeform? Does the system need coordinate measurements or distances; dense reconstruction or will sparse points also suffice? Will the system operate outdoors or indoors? Is the object of interest metallic, shiny or black? Can we trade off a cruder, but faster method over an accurate but slow one? Depending on the application such questions vary. Good questions give rise to useful constraints, which makes the solution engineering less tedious.
Contrary to common belief, not every aspect of 3D is more challenging than dealing with 2D. For instance, 3D alleviates the pesky process of light selection and illumination design. It eliminates the necessity to perform triangulation from multiple views and gives a natural interface to distance measurements. Most applications of 3D can benefit from online calibration, made possible by 3D registration. Thus, depending on the problem, a 3D solution can complement or fully replace a 2D solution, offering a more cost-efficient and robust system.
I am co-organising a workshop on multiple view relationships in 3D data, which will be held on 29 October in conjunction with ICCV in Venice, one of the best computer vision conferences. The goal is to foster discussions and boost knowledge dissemination in 3D vision of multiple cameras. To this end, we have managed to bring in a lot of great speakers and are looking for a high quality set of submissions – authors of all accepted papers receive exciting prizes! For further information, I highly encourage you to visit: https://mvr3d.github.io.
Tolga Birdal is a PhD candidate at the Computer Vision Group at the Chair for Computer Aided Medical Procedures, TUM and a Doktorand at Siemens. His research and development is focused on large object detection, pose estimation and reconstruction. Recently, he was awarded Ernst von Siemens Scholarship and the EMVA Young Professional Award 2016 for part of his PhD work.