Do you speak neural network?


Neural networks will be the common language of the future for computer vision, according to Professor Jitendra Malik from UC Berkeley. Greg Blackman listened to his keynote at the Embedded Vision Summit in Santa Clara


Neural networks will be the primary language of computer vision in the future, rather as English is the common language of the scientific community. At least, that is the hope of Professor Jitendra Malik of the University of California, Berkeley, who was speaking at the Embedded Vision Summit, a computer vision conference organised by the Embedded Vision Alliance and held in Santa Clara, California, from 1 to 3 May.

Half of the technical insight presentations at the conference focused on deep learning and neural networks, a branch of artificial intelligence where the algorithms are trained to recognise objects in a scene using large datasets, as opposed to the traditional method of writing an algorithm for a specific task.
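As a loose illustration of this contrast, and not code from any of the talks, the sketch below trains a minimal one-layer classifier on toy 2D data instead of hand-coding a decision rule. All data and parameters here are hypothetical, chosen only to show the learn-from-examples pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two toy "object" classes: points clustered around (0, 0) and (3, 3)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# One-layer logistic model trained by gradient descent on the labelled examples
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # predicted class probabilities
    grad_w = X.T @ (p - y) / len(y)      # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# The decision boundary was never written by hand; it was learned from data
accuracy = np.mean(((1 / (1 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(accuracy)
```

The same loop, scaled up to many layers and millions of labelled images, is essentially what distinguishes the deep learning approach from task-specific algorithm design.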

Introducing Malik's keynote address, Jeff Bier, founder of the Embedded Vision Alliance, said that 70 per cent of vision developers surveyed by the Alliance were using neural networks. This is a huge shift from the 2014 summit, only three years earlier, when hardly anyone was using them.

Deep learning has also reached the industrial machine vision world to some extent, with the latest version of MVTec’s Halcon software running an OCR tool based on deep learning, and Vidi Systems, now owned by Cognex, offering a deep learning software suite for machine vision.

Malik went further than saying neural networks merely have a place in computer vision; he suggested that deep learning could unite its different strands. He gave the example of 3D vision, where algorithms such as simultaneous localisation and mapping (SLAM) have traditionally been used to model the world in 3D, and where machine learning hasn't been thought suitable. The world of geometry, into which techniques like SLAM fall, and the world of machine learning need to be brought together, he said.

A human views a chair in 3D, for instance, but is also informed by past experience of other chairs he or she has seen. Geometry and machine learning are two very different languages in computer vision terms, and Malik argued that, just as scientists communicate in English, computer vision should settle on a common language. 'In my opinion, it is easier to make everybody learn English, which in this case is neural networks,' he said.

There are neural networks that start to combine the two worlds of thinking, but Malik noted that putting geometrical data in the language of neural networks requires a fundamental breakthrough. He added that, over the next couple of years, he believes this marriage of geometrical thinking and machine learning-based methods will be achieved.

Malik noted that another exciting area of computer vision research is training machines to make predictions, namely predictions about people and social behaviour. This involves teaching machines to recognise actions and to make sense of people's behaviour in light of their possible objectives.

He also suggested that computer vision scientists should take note of the research carried out in neuroscience, since deep neural networks are originally based on findings in neuroscience. ‘Neuroscientists found phenomena in the brain which led us down this path,’ he said, adding that researchers should keep looking in the neuroscience literature to see if there are things that should be exploited.

One other problem in computer vision that Malik felt needed addressing was that of limited data. Neural networks learn about the world from masses of data, but there will always be instances where there isn't enough information. He gave the example of work at the Berkeley Artificial Intelligence Research Laboratory in which a robot taught itself to manipulate objects by poking them repeatedly. The robot is not trained explicitly; it teaches itself. The work uses two different models, a forward one and an inverse one, and the interplay between them gives the robot an accurate means of decision making.
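The forward/inverse pairing can be illustrated loosely. The toy sketch below is a 1D stand-in, not the Berkeley robot system: a forward model predicts the next state from a state and an action, an inverse model proposes the action linking two states, and the two are played off against each other. The dynamics, data and variable names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hidden toy dynamics (unknown to the models): a poke of force a moves
# the object from state s to s + 2*a
states = rng.uniform(-1, 1, 1000)
actions = rng.uniform(-1, 1, 1000)
next_states = states + 2.0 * actions

# Fit each model by least squares on the experience the "robot" collected
# forward model: next_state ~ f(state, action)
F, *_ = np.linalg.lstsq(np.column_stack([states, actions]), next_states,
                        rcond=None)
# inverse model: action ~ g(state, next_state)
G, *_ = np.linalg.lstsq(np.column_stack([states, next_states]), actions,
                        rcond=None)

# Interplay: the inverse model proposes an action for a desired state,
# and the forward model checks where that action would actually lead
s, s_goal = 0.2, 0.8
a = np.array([s, s_goal]) @ G        # proposed poke
s_pred = np.array([s, a]) @ F        # forward model's predicted outcome
print(float(a), float(s_pred))
```

In this noiseless toy case the forward model's prediction lands on the goal state, which is the kind of agreement between the two models that makes the scheme useful for decision making.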

