Dr Andrew Schofield, who leads the Visual Image Interpretation in Humans and Machines network in the UK, asks what computer vision can learn from biological vision, and how the two disciplines can collaborate better
What can drones learn from drones?
Almost every month the national news feeds carry a story about the latest development in drone aircraft, self-driving cars, or intelligent robot co-workers. If such systems are to achieve mass usage in a mixed environment with human users, they will need advanced vision and artificial reasoning capabilities, and they will need to both behave, and fail, in ways that are acceptable to humans.
Setting aside recent high-profile crashes involving self-driving cars, the complexity and unreliability of our road systems mean that driverless cars will need to act very much like human drivers. A car that refuses to edge out into heavy traffic will cause gridlock. Likewise, a drone should not fail to deliver its package because the front door at the target house has been painted since the last Google Street View update.
So what can drones learn from drones – or, to be more precise, worker bees? In surveillance they may have similar tasks: explore the environment looking for particular targets while avoiding obstacles, and eventually return home. They also have similar payload and power constraints: neither can afford a heavy, power-hungry brain.
The bee achieves its seek-and-locate task with very little neural hardware and a near-zero energy budget. To do so it uses relatively simple navigation, avoidance and detection strategies that produce apparently intelligent behaviour. Much of the technology for this kind of task is already available in the form of optic flow sensors and simple pattern recognisers such as the ubiquitous face locators on camera phones. Even the vastly more complex human brain has within it separate modules, or brain regions, specialised for short-range subconscious navigation via optic flow and for rapid face detection. However, the human brain is much more adaptable and reliable than even the best computer vision systems.
The Visual Image Interpretation in Humans and Machines (ViiHM) network, funded by the Engineering and Physical Sciences Research Council, brings together around 250 researchers to foster the translation of discoveries from biological- to machine-vision systems. This aim is not new. In the early days of machine vision there was a natural crossover between these two fields. The Canny edge detector, for example, computes edges as luminance gradients in a blurred (de-noised) image and then links weaker edge elements to stronger ones. This method has its roots in Marr and Hildreth’s model of retinal processing plus the contour integration mechanisms found in visual cortex.
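The blur, gradient and linking steps just described can be sketched in a few lines of plain Python. This is a simplified illustration rather than Canny's full method: it omits his non-maximum suppression step, uses a small 3x3 binomial blur in place of a tuned Gaussian, and the function names and thresholds are chosen here purely for illustration.

```python
from collections import deque


def blur(img):
    """3x3 binomial (Gaussian-like) blur; borders replicate the nearest pixel."""
    h, w = len(img), len(img[0])
    k = [1, 2, 1]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1] * k[dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def gradient_magnitude(img):
    """Luminance gradient magnitude from simple pixel differences."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def hysteresis(mag, low, high):
    """Keep strong edges (>= high) and any weak edges (>= low) linked to them."""
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if mag[y][x] >= high)
    for y, x in queue:
        edges[y][x] = True
    while queue:  # grow outwards from strong edges through 8-connected weak ones
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = y + dy, x + dx
                if (0 <= yy < h and 0 <= xx < w
                        and not edges[yy][xx] and mag[yy][xx] >= low):
                    edges[yy][xx] = True
                    queue.append((yy, xx))
    return edges


def simple_edges(img, low, high):
    """Blur, take gradients, then link weak edge elements to strong ones."""
    return hysteresis(gradient_magnitude(blur(img)), low, high)
```

Calling `simple_edges(img, 0.2, 0.5)` on a small image given as a list of rows of luminance values returns a grid of booleans marking edge pixels. A weak response that touches no strong edge is discarded, which is the linking behaviour the paragraph above describes.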
More recent examples of biology-inspired processing include Deep Convolutional Neural Networks (DNNs), which have multiple convolution-based filtering layers separated by non-linear operators and downsampling, producing increasingly large-scale and complex filters until finally classifications can be made. This structure is loosely modelled on the multiple feature detection layers and receptive field properties of biological vision systems. Alternatively, the SpikeNet recognition system has a similar convolutional structure but more directly models the production of neuron action potentials. The relationship between machine and biological vision is symbiotic: convolution filters developed for machine vision are used to model biological processing, and DNNs have been applied to human behavioural data to characterise the visual system.
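The layered structure described above (filter, non-linearity, downsample, repeat) can be illustrated with a toy sketch in plain Python. It is not a trained network: the kernel is set by hand, there is a single channel and no classifier, and the function names are chosen here for illustration only.

```python
def conv2d(img, kernel):
    """'Valid' 2D filtering (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[i][j] * img[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(w)]
            for y in range(h)]


def relu(fmap):
    """Non-linear operator: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Downsample by keeping the largest value in each size x size block."""
    h, w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[y * size + i][x * size + j]
                 for i in range(size) for j in range(size))
             for x in range(w)]
            for y in range(h)]


def layer(fmap, kernel):
    """One filtering stage: convolve, rectify, then downsample."""
    return max_pool(relu(conv2d(fmap, kernel)))


# A 10x10 image with a dark left half and a bright right half.
image = [[0] * 5 + [1] * 5 for _ in range(10)]
# Hand-set kernel that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 0, 1]] * 3

stage1 = layer(image, edge_kernel)   # 4x4 feature map
stage2 = layer(stage1, edge_kernel)  # 1x1 feature map
```

Although each kernel covers only 3x3 values, the single value in `stage2` depends on the whole 10x10 image: stacking filtering and downsampling stages is what lets later layers act as larger-scale filters.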
However, in recent decades the biological and machine vision communities have diverged. Driven by different success criteria – a desire to understand specific visual systems on the one hand and to rapidly build working engineering solutions on the other – the two disciplines have developed different priorities and ways of working. The ideal development cycle, where observed phenomena are explored in biology, results modelled computationally, and those models turned into useful applications, can be protracted and requires multiple skill sets. The chain is often broken as academics on the biological vision side rush to publish their findings and get on with the next experiment, while those working in industrial vision rightly employ any and every tool in the quest for better performance. Progress is hindered by language and understanding barriers, with different terminology used even for the most basic concepts.
To counter this separation, ViiHM has developed a triad of Grand Challenges for intelligent vision where we think success can best be achieved by working together. The overall aim is to produce a general-purpose, embodied, integrated, and adaptive visual system for intelligent robots, mobile and wearable technologies. Within this scope the Application Challenge is to augment and enhance vision in both the visually impaired and normally sighted, and to develop cognitive and personal assistants that can help those with low vision, the elderly, or simply the busy executive to deal with everyday tasks. Such aids might range from wearable technologies that discreetly prompt their user, to fully autonomous robots acting as caregivers and personal assistants. Here it is important that robots think and act like humans while avoiding the ‘uncanny valley’ effect – where people are repulsed by robots appearing almost, but not exactly, like real humans.
These applications will be underpinned by the Technical Challenge of making low-power, small-footprint vision systems. To be acceptable, intelligent visual systems need to run all day on a single charge and be realised in discreet wearable devices. Such power and space savings can be achieved by learning how biological systems are implemented at the physical as well as the algorithmic layer. Finally, the Theoretical Challenge of general-purpose, integrated and adaptive vision will see visual systems that can operate ‘out of the box’ and in the wild, but continuously adapt to and learn from their environment.
Learning the behaviours of their users and co-workers, such systems will be robust and flexible. They will fail gracefully and in ways that are acceptable to the humans they co-operate with. They will, for example, be able to identify people and places despite quite gross changes, to safely navigate new and altered environments, and to learn from experience over very long periods of time with fixed and limited memory capacities. These are tough challenges, but biology has shown them to be solvable.
Andrew Schofield has a BEng in Electronics Engineering, a post-graduate diploma in Psychology and a PhD in Neuroscience. He is currently a senior lecturer in psychology at the University of Birmingham and a member of the Centre for Computational Neuroscience and Cognitive Robotics. He also leads the ViiHM Network. ViiHM is open to new members and is currently seeking industry partners to take part in a grant writing event in July 2017.
1. http://www.viihm.org.uk (accessed 28/3/2017)
2. Canny, J. (1986) A Computational Approach To Edge Detection, IEEE Trans. Pattern Analysis and Machine Intelligence, 8(6):679–698.
3. Marr, D., Hildreth, E. (1980) Theory of Edge Detection, Proceedings of the Royal Society of London. Series B, Biological Sciences, 207: 187–217, doi:10.1098/rspb.1980.0020
4. Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) ImageNet Classification with Deep Convolutional Neural Networks, in Advances in Neural Information Processing Systems 25, MIT Press, Cambridge, MA
5. Masquelier, T., Thorpe, S.J. (2007) Unsupervised learning of visual features through spike timing dependent plasticity, PLoS Comput Biol 3(2): e31, doi:10.1371/journal.pcbi.0030031
6. http://www.viihm.org.uk/grand-challenges/ (accessed 28/3/2017)
7. Mori, M. (2012) The uncanny valley, translated by MacDorman, K.F. and Kageki, N., IEEE Robotics & Automation Magazine, 19(2): 98–100, doi:10.1109/MRA.2012.2192811