More varied tasks required to truly test computer vision


Researchers have shown that current tests of computer vision do not truly reflect the difficulty of recognising objects in a natural, varying environment. The research suggests more demanding tasks are required if we are to measure progress in machine vision technology.

Conventional tests require vision systems to recognise objects contained in photographic image sets. State-of-the-art machine vision systems correctly recognise objects about 60 per cent of the time. However, James DiCarlo and his team from MIT, USA, believed these tests are too easy to give a real estimate of how such systems would fare in the real world, because the photographs frequently show the same views and contexts, with centred, ‘obvious’ objects.

‘We suspected that the supposedly natural images in current computer vision tests do not really engage the central problem of variability, and that our intuitions about what makes objects hard or easy to recognise are incorrect,’ Nicolas Pinto, one of the researchers, explained.

To test their theory, the team created a very simple ‘toy’ vision system, which was then set to compete with more advanced systems in these tests. The toy was designed to capture low-level information about the position and orientation of line boundaries, while lacking the more sophisticated analysis that happens in later stages of visual processing to extract information about higher-level features of the visual scene such as shapes, surfaces or spaces between objects.
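To give a feel for what ‘low-level information about the position and orientation of line boundaries’ can mean in practice, here is a minimal sketch in Python. It is not the MIT team's system: the function name `oriented_edge_features`, the grid of cells and the number of orientation bins are all assumptions chosen for illustration. The sketch measures brightness gradients with simple finite differences and records, for each region of the image, how much edge energy falls into each orientation bin.

```python
import math


def oriented_edge_features(image, cells=2, n_orientations=4):
    """Return a cells x cells grid of orientation histograms.

    `image` is a list of rows of brightness values. All parameter
    names and defaults are illustrative assumptions, not taken from
    the research described in the article.
    """
    height, width = len(image), len(image[0])
    features = [[[0.0] * n_orientations for _ in range(cells)]
                for _ in range(cells)]
    # Skip the border so that both neighbours of every pixel exist.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal change
            gy = image[y + 1][x] - image[y - 1][x]  # vertical change
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region, no boundary here
            # Fold the gradient direction into [0, pi): a boundary has
            # an orientation but no preferred sign.
            angle = math.atan2(gy, gx) % math.pi
            bin_index = int(angle / math.pi * n_orientations) % n_orientations
            # Coarse position: which cell of the grid the pixel is in.
            cell_y = y * cells // height
            cell_x = x * cells // width
            features[cell_y][cell_x][bin_index] += magnitude
    return features


# An 8 x 8 image that is dark on the left and bright on the right,
# giving a single vertical boundary down the middle.
vertical_edge = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
print(oriented_edge_features(vertical_edge)[0][0])  # prints [3.0, 0.0, 0.0, 0.0]
```

Nothing in this sketch knows about shapes, surfaces or the spaces between objects. It only records where boundaries are and which way they run, which is the kind of early-stage information the article attributes to the toy system.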

When tested on conventional images, the simple ‘toy’ vision system performed just as well as the more advanced systems, suggesting that only the most basic layers of visual processing are necessary to perform the task.

The team then devised a more carefully controlled task with just two categories, planes and cars, and introduced variations in position, size and orientation that better reflect the range of variation in the real world. Because it contained just two types of object, conventional thinking would suggest that this test would be very easy for the simple system. However, the variability meant that the simple system actually performed very poorly, suggesting that the new test is a better measure of visual processing capability.
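A toy illustration, in Python, of why variation in position alone can defeat a simple comparison. This is not the team's test: the rectangles, the helper names `make_image` and `pixel_distance`, and the pixel-by-pixel comparison are all assumptions made for the example. Two different shapes drawn in the same place can look more alike, pixel for pixel, than one shape drawn in two different places.

```python
def make_image(size, top, left, height, width):
    """Return a size x size binary image containing one filled rectangle."""
    return [[1 if top <= y < top + height and left <= x < left + width else 0
             for x in range(size)]
            for y in range(size)]


def pixel_distance(a, b):
    """Count the pixels at which two equally sized images differ."""
    return sum(abs(pa - pb)
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


wide = make_image(8, top=3, left=1, height=2, width=4)          # wide shape
wide_shifted = make_image(8, top=3, left=4, height=2, width=4)  # same shape, moved
tall = make_image(8, top=1, left=1, height=4, width=2)          # different shape

same_shape_moved = pixel_distance(wide, wide_shifted)
different_shape = pixel_distance(wide, tall)
print(same_shape_moved, different_shape)  # prints 12 8
```

Here the moved copy of the wide shape differs from the original at 12 pixels, while the tall shape differs at only 8, so a comparison tied to exact position would judge the wrong pair to be the better match. Variation in size and orientation can be expected to cause similar trouble.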