Machine vision tool can describe images

Researchers from Stanford University have developed a machine vision system capable of describing the contents of photos and other images. In the near future, the system could let people find specific photos or videos through search engines and, further down the line, could even lead to robotic systems able to navigate unfamiliar situations.

At the heart of the Stanford system are algorithms that improve its accuracy by scanning scene after scene, looking for patterns, and then using the accumulated descriptions of previous scenes to infer what is depicted in the next unknown image.

‘The system can analyse an unknown image and explain it in words and phrases that make sense,’ said Fei-Fei Li, a professor of computer science and director of the Stanford Artificial Intelligence Lab.

Li and her colleagues trained the system on a visual dictionary, using a database of more than 14 million images. Unlike similar machine vision devices able to recognise objects, the new system was trained using a dictionary of scenes, a more complicated task than looking at just objects.

Each scene is described in two ways: in mathematical terms that the machine can use to recognise similar scenes, and in phrases that humans can understand. For instance, one image might be described as ‘cat sits on keyboard’ while another could be ‘girl rides on horse in field.’
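The pairing of a numeric description with a human-readable phrase can be illustrated with a toy example. The sketch below is not the Stanford method, whose details the article does not give; it assumes a hypothetical two-entry dictionary with made-up three-number feature vectors, and simply returns the phrase of the stored scene most similar to a new one. The names `SCENE_DICTIONARY`, `cosine_similarity` and `describe` are invented for illustration.

```python
import math

# Hypothetical toy "scene dictionary": each entry pairs a made-up feature
# vector (the mathematical description) with a phrase a person would read.
# Real systems use far larger vectors learned from millions of images.
SCENE_DICTIONARY = [
    ([0.9, 0.1, 0.0], "cat sits on keyboard"),
    ([0.1, 0.8, 0.3], "girl rides on horse in field"),
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def describe(features, dictionary=SCENE_DICTIONARY):
    """Return the phrase of the stored scene most similar to `features`."""
    _, best_phrase = max(
        dictionary, key=lambda entry: cosine_similarity(features, entry[0])
    )
    return best_phrase


if __name__ == "__main__":
    # A new, unseen vector that lies close to the first stored scene.
    print(describe([0.8, 0.2, 0.1]))  # cat sits on keyboard
```

A lookup like this can only repeat phrases it has already stored, whereas the system in the article is reported to identify individual objects and place them in context for images it has not seen before.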

Li's machine-learning algorithm analyses the patterns in these predefined pictures, then applies what it has learned to unknown images to identify individual objects and provide some rudimentary context. In other words, it can tell a simple story about the image.

‘Telling a story about a picture turns out to be a core element of human visual intelligence but so far it has proven very difficult to do this with computer algorithms,’ Li pointed out. ‘This is an important milestone. It's the first time we've had a computer vision system that could tell a basic story about an unknown image by identifying discrete objects and also putting them into some context.’
