Machine vision tool can describe images

Researchers from Stanford University have developed a machine vision system capable of describing the details in photos and images. In the near future, the system could allow people to search for specific photos or videos on search engines, and, further down the line, could even lead to robotic systems able to navigate in unknown situations.

At the heart of the Stanford system are algorithms that enable the system to improve its accuracy by scanning scene after scene, looking for patterns, and then using the accumulation of previously described scenes to conclude what is being depicted in the next unknown image.

‘The system can analyse an unknown image and explain it in words and phrases that make sense,’ said Fei-Fei Li, a professor of computer science and director of the Stanford Artificial Intelligence Lab.

Li and her colleagues trained the system on a visual dictionary, using a database of more than 14 million images. Unlike similar machine vision devices able to recognise objects, the new system was trained using a dictionary of scenes, a more complicated task than looking at just objects.

Each scene is described in two ways: in mathematical terms that the machine can use to recognise similar scenes, and in phrases that humans would understand. For instance, one image might be labelled ‘cat sits on keyboard’ while another could be ‘girl rides on horse in field’.

Li's machine-learning algorithm analyses the patterns in these predefined pictures, then applies what it has learned to unknown images to identify individual objects and provide some rudimentary context. In other words, it can tell a simple story about the image.
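
As a rough illustration only, the idea of pairing each scene's mathematical description with a human-readable phrase can be sketched as a nearest-match lookup. This is not the Stanford system, whose internals are not described here. The three-number feature vectors, the SCENES list and the describe function are all hypothetical, and a real system would learn its features from millions of images rather than use hand-written values.

```python
import math

# Hypothetical toy "dictionary of scenes": each entry pairs a made-up
# feature vector (the mathematical description) with a phrase a human
# would understand. The numbers are invented for illustration.
SCENES = [
    ([0.9, 0.1, 0.0], "cat sits on keyboard"),
    ([0.1, 0.8, 0.3], "girl rides on horse in field"),
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def describe(features, scenes=SCENES):
    """Return the phrase of the stored scene most similar to `features`."""
    _, best_phrase = max(
        scenes, key=lambda scene: cosine_similarity(features, scene[0])
    )
    return best_phrase


# An "unknown image" whose (invented) features resemble the first scene.
print(describe([0.8, 0.2, 0.1]))
```

A lookup like this can only repeat phrases it has already stored. The system described in the article goes further, identifying individual objects and placing them in context.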

‘Telling a story about a picture turns out to be a core element of human visual intelligence but so far it has proven very difficult to do this with computer algorithms,’ Li pointed out. ‘This is an important milestone. It's the first time we've had a computer vision system that could tell a basic story about an unknown image by identifying discrete objects and also putting them into some context.’
