Machine vision tool can describe images


Researchers from Stanford University have developed a machine vision system capable of describing the contents of photos and other images. In the near future, the system could let people use search engines to find specific photos or videos and, further down the line, could even lead to robotic systems able to navigate unfamiliar situations.

At the heart of the Stanford system are algorithms that let it improve its accuracy by scanning scene after scene, looking for patterns, and then drawing on the accumulated set of previously described scenes to infer what is depicted in the next, unknown image.

‘The system can analyse an unknown image and explain it in words and phrases that make sense,’ said Fei-Fei Li, a professor of computer science and director of the Stanford Artificial Intelligence Lab.

Li and her colleagues trained the system on a visual dictionary built from a database of more than 14 million images. Whereas similar machine vision systems are trained only to recognise objects, the new system was trained on a dictionary of scenes, a more complicated task than identifying objects alone.

Each scene is described in two ways: in mathematical terms that the machine can use to recognise similar scenes, and in phrases that humans can understand. For instance, one image might be described as ‘cat sits on keyboard’, while another could be ‘girl rides on horse in field.’

Li's machine-learning algorithm analyses the patterns in these predefined pictures, then applies what it has learned to unknown images, identifying individual objects and providing some rudimentary context. In other words, it can tell a simple story about the image.
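To make the idea of a dual-description ‘visual dictionary’ concrete, here is a deliberately simplified sketch. It is not the Stanford algorithm, which the article does not describe in detail; it is a plain nearest-neighbour lookup, assuming each scene has already been reduced to a numeric feature vector (the ‘mathematical terms’) paired with a caption (the human-readable phrase). The `Scene` class, the three-number feature vectors and the `describe` function are all hypothetical, chosen purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Scene:
    # Hypothetical 'mathematical' description of the scene (a feature vector).
    features: list[float]
    # Human-readable phrase paired with those features.
    caption: str


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two feature vectors, from -1 to 1 (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def describe(unknown: list[float], dictionary: list[Scene]) -> str:
    """Caption an unknown image with the phrase of its most similar known scene."""
    best = max(dictionary, key=lambda scene: cosine_similarity(unknown, scene.features))
    return best.caption


# A toy two-entry dictionary; the feature values are invented for illustration.
visual_dictionary = [
    Scene([0.9, 0.1, 0.0], "cat sits on keyboard"),
    Scene([0.0, 0.2, 0.95], "girl rides on horse in field"),
]

print(describe([0.8, 0.2, 0.1], visual_dictionary))
```

In this toy version, ‘accumulating previously described scenes’ simply means appending more `Scene` entries to the dictionary, which gives unknown images more candidates to match against. A real captioning system would generate new sentences rather than copy a stored one.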

‘Telling a story about a picture turns out to be a core element of human visual intelligence but so far it has proven very difficult to do this with computer algorithms,’ Li pointed out. ‘This is an important milestone. It's the first time we've had a computer vision system that could tell a basic story about an unknown image by identifying discrete objects and also putting them into some context.’
