Do you speak neural network?

Neural networks will be the common language of the future for computer vision, according to Professor Jitendra Malik from UC Berkeley. Greg Blackman listened to his keynote at the Embedded Vision Summit in Santa Clara

Neural networks will be the primary language of computer vision in the future, much as English is the common language of the scientific community. At least, that is the hope of Professor Jitendra Malik of the University of California, Berkeley, who was speaking at the Embedded Vision Summit, a computer vision conference organised by the Embedded Vision Alliance and held in Santa Clara, California from 1 to 3 May.

Half of the technical insight presentations at the conference focused on deep learning and neural networks, a branch of artificial intelligence where the algorithms are trained to recognise objects in a scene using large datasets, as opposed to the traditional method of writing an algorithm for a specific task.
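
To make that contrast concrete, the sketch below shows the training-based approach: a small convolutional network learns to classify images from labelled examples, instead of a programmer writing rules for each object. This is a hypothetical toy example in PyTorch, not code from any system mentioned here; the network sizes and the random stand-in data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A tiny image classifier: its behaviour comes from training data,
# not from hand-written rules for each object class.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),  # 10 object classes (assumed)
)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a large labelled dataset: 64 random 32x32 RGB images.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

for _ in range(10):  # a few gradient steps; real training runs many epochs
    opt.zero_grad()
    loss = loss_fn(net(images), labels)
    loss.backward()
    opt.step()
```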

Introducing Malik's keynote address, Jeff Bier, founder of the Embedded Vision Alliance, said that 70 per cent of the vision developers surveyed by the Alliance were using neural networks – a huge shift from the 2014 summit, only three years earlier, when hardly anyone was using them.

Deep learning has also reached the industrial machine vision world to some extent, with the latest version of MVTec’s Halcon software running an OCR tool based on deep learning, and Vidi Systems, now owned by Cognex, offering a deep learning software suite for machine vision.

Malik went further than saying neural networks merely have their place in computer vision, suggesting that deep learning could be used to unite the field's different strands. He gave the example of 3D vision, where algorithms such as simultaneous localisation and mapping (SLAM) have traditionally been used to model the world in 3D, and where machine learning hasn't been thought suitable. He said that the world of geometry – into which techniques like SLAM fall – and machine learning need to be brought together.

A human will view a chair, for instance, in 3D, while also being informed by past experience of other chairs he or she has seen. Geometry and machine learning are two very different languages in computer vision terms and, Malik argued, just as scientists communicate in English, so computer vision should settle on a common language. ‘In my opinion, it is easier to make everybody learn English, which in this case is neural networks,’ he said.

There are neural networks that start to combine the two worlds of thinking, but Malik noted that putting geometrical data in the language of neural networks requires a fundamental breakthrough. He added that, over the next couple of years, he believes this marriage of geometrical thinking and machine learning-based methods will be achieved.

Malik noted that another exciting area of computer vision research is training machines to make predictions, namely predictions about people and social behaviour. This involves teaching machines to recognise actions and to make sense of people’s behaviour in light of their possible objectives.

He also suggested that computer vision scientists should take note of research in neuroscience, since deep neural networks were originally inspired by findings in that field. ‘Neuroscientists found phenomena in the brain which led us down this path,’ he said, adding that researchers should keep looking to the neuroscience literature for findings that could be exploited.

One other problem in computer vision that Malik felt needed addressing is that of limited data. Neural networks learn about the world using masses of data, but there will always be instances where there isn’t enough information. He gave the example of work at the Berkeley Artificial Intelligence Research Laboratory in which a robot taught itself to manipulate objects by poking them repeatedly. The robot is not trained explicitly; it teaches itself. The work uses two different models – a forward one and an inverse one – and the interplay between them gives the robot an accurate means of decision making.
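
As a rough illustration of that forward/inverse pairing – a minimal sketch under assumed state and action sizes, not the Berkeley group's actual code – the two models below are trained jointly from the robot's own (state, action, next state) experience, with no human labels required:

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 32, 4  # hypothetical feature and action sizes

# Forward model: given the current state and an action, predict the next state.
forward_model = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, STATE_DIM),
)

# Inverse model: given current and next states, predict the action taken.
inverse_model = nn.Sequential(
    nn.Linear(STATE_DIM * 2, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM),
)

opt = torch.optim.Adam(
    list(forward_model.parameters()) + list(inverse_model.parameters())
)

def training_step(state, action, next_state):
    """One self-supervised update from a (state, action, next_state)
    triple gathered by the robot's own poking."""
    pred_next = forward_model(torch.cat([state, action], dim=-1))
    pred_action = inverse_model(torch.cat([state, next_state], dim=-1))
    loss = (nn.functional.mse_loss(pred_next, next_state)
            + nn.functional.mse_loss(pred_action, action))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example update on a batch of random stand-in experience.
s = torch.randn(8, STATE_DIM)
a = torch.randn(8, ACTION_DIM)
s2 = torch.randn(8, STATE_DIM)
print(training_step(s, a, s2))
```

Training the two models against each other in this way means each poke the robot makes generates its own supervision signal, which is what makes the approach attractive when labelled data is scarce.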
