Machine vision tool can describe images
Researchers from Stanford University have developed a machine vision system capable of describing the details in photos and images. In the near future, the system could allow people to find specific photos or videos through search engines and, further down the line, could even lead to robotic systems able to navigate in unknown situations.
At the heart of the Stanford system are algorithms that enable the system to improve its accuracy by scanning scene after scene, looking for patterns, and then using the accumulation of previously described scenes to conclude what is being depicted in the next unknown image.
‘The system can analyse an unknown image and explain it in words and phrases that make sense,’ said Fei-Fei Li, a professor of computer science and director of the Stanford Artificial Intelligence Lab.
Li and her colleagues trained the system on a visual dictionary, using a database of more than 14 million images. Unlike similar machine vision systems that recognise only objects, the new system was trained using a dictionary of scenes, and interpreting a scene is a more complicated task than looking at just objects.
Each scene is described in two ways: in mathematical terms that the machine can use to recognise similar scenes, and in phrases that humans would understand. For instance, one image might be described as ‘cat sits on keyboard’ while another could be ‘girl rides on horse in field.’
Li's machine-learning algorithm analyses the patterns in these predefined pictures and then applies what it has learned to unknown images, identifying individual objects and providing some rudimentary context. In other words, it can tell a simple story about the image.
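The idea of pairing a mathematical description with a human-readable phrase can be illustrated with a toy sketch. This is not the Stanford system: the article does not specify how the algorithm works, so the example below substitutes a simple nearest-neighbour lookup. The feature vectors, the `SCENE_DICTIONARY` list and the `describe` function are all hypothetical, invented purely for illustration.

```python
import math

# Hypothetical toy "visual dictionary". Each entry pairs a made-up feature
# vector (the mathematical description) with a phrase a human would understand.
# Real systems would learn such vectors from millions of images.
SCENE_DICTIONARY = [
    ([0.9, 0.1, 0.0], "cat sits on keyboard"),
    ([0.1, 0.8, 0.3], "girl rides on horse in field"),
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def describe(features, dictionary=SCENE_DICTIONARY):
    """Return the phrase of the known scene most similar to `features`.

    Nearest-neighbour lookup is an assumption made for this sketch,
    not the method used by the researchers.
    """
    _, best_phrase = max(
        dictionary,
        key=lambda entry: cosine_similarity(features, entry[0]),
    )
    return best_phrase


# An "unknown image" whose (made-up) features resemble the first scene.
print(describe([0.8, 0.2, 0.1]))
```

A lookup like this can only repeat phrases it has already seen. The system described in the article goes further, identifying discrete objects in an unknown image and placing them in context.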
‘Telling a story about a picture turns out to be a core element of human visual intelligence but so far it has proven very difficult to do this with computer algorithms,’ Li pointed out. ‘This is an important milestone. It's the first time we've had a computer vision system that could tell a basic story about an unknown image by identifying discrete objects and also putting them into some context.’