Neuromorphic pioneer recognised at Vision Stuttgart
Prophesee has won the Vision Award at Vision Stuttgart for its neuromorphic approach to imaging. Greg Blackman speaks to Luca Verre, the firm’s CEO, about prospects for the technology.
Vision Stuttgart has returned and with it the Vision Award, which, earlier today, went to Prophesee, recognising the potential its neuromorphic – or event-based – approach to imaging has for the machine vision sector.
Prophesee’s first Vision show was in 2016, fresh off the back of winning best start-up at the investment conference Inpho Venture Summit the month before.
Since then a lot has changed: a few weeks ago Sony announced neuromorphic sensors based on the firm’s technology, and earlier in the year Prophesee won investment from the Chinese AI venture capital firm, Sinovation – the first European company to do so – along with Xiaomi as a corporate investor, showing the technology has scope for use in mobile devices.
Prophesee’s CEO, Luca Verre, explained that machine vision remains an important market segment for the company. Prophesee targeted machine vision with the first three iterations of its sensor, in part because the form factor of those sensors was too large for consumer devices but a fit for machine vision. Now, its fourth-generation sensor, which has just been released with Sony, has an optical form factor that can fit easily into consumer devices, even mobile, Verre said.
Nevertheless, machine vision is still a key market, as the high-speed, real-time performance of neuromorphic sensing lends itself to machine vision tasks. Unlike conventional imaging, where all the information in the scene is captured for each frame, event-based imaging records changes – or events – in the scene, similar to how the human eye records and interprets visual input. This gives specific advantages: the sensor can run at microsecond time resolution, equivalent to more than 10,000 images per second. It is therefore ideal for applications like high-speed counting – counting and measuring the size of particles or objects moving at up to 500,000 pixels per second – or monitoring vibrations in manufacturing equipment at 10kHz for predictive maintenance.
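The principle of recording only changes can be illustrated with a toy model. The sketch below is not Prophesee's implementation or SDK: the function name `generate_events`, the `(t, x, y, polarity)` tuple layout and the contrast threshold of 0.2 are all assumptions chosen for illustration. It emits an event whenever a pixel's log intensity moves by more than the threshold from its last reference level, so a static scene produces no data at all.

```python
import math


def generate_events(frames, timestamps_us, threshold=0.2):
    """Toy event generator (hypothetical, for illustration only).

    frames: list of 2D lists of non-negative intensities.
    timestamps_us: one timestamp in microseconds per frame.
    Returns a list of (t, x, y, polarity) tuples, polarity being +1 or -1.
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Each pixel remembers the log intensity at which it last fired.
    ref = [[math.log(frames[0][y][x] + 1.0) for x in range(width)]
           for y in range(height)]
    events = []
    for frame, t in zip(frames[1:], timestamps_us[1:]):
        for y in range(height):
            for x in range(width):
                level = math.log(frame[y][x] + 1.0)
                diff = level - ref[y][x]
                # Emit one event per threshold step the pixel has crossed.
                while abs(diff) >= threshold:
                    polarity = 1 if diff > 0 else -1
                    events.append((t, x, y, polarity))
                    ref[y][x] += polarity * threshold
                    diff = level - ref[y][x]
    return events


# A 4 x 4 scene in which a single pixel brightens between two samples.
frames = [[[10] * 4 for _ in range(4)] for _ in range(2)]
frames[1][1][2] = 100
events = generate_events(frames, [0, 100])
print(len(events), "events, all from pixel (x=2, y=1);",
      "a frame-based readout would send all", 4 * 4, "pixels")
```

A real sensor does this asynchronously in each pixel's analogue circuitry rather than by comparing sampled frames; the loop here only mimics the output format.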
Because the sensor is only capturing changes in the scene, it generates 10 times to 1,000 times less data than frame-based approaches; it offers 120dB dynamic range, imaging in light levels down to 0.08 lx, and power efficiency of 26mW at the sensor level.
The sensor developed with Sony uses Sony’s 3D stacking technology and Cu-Cu interconnects to shrink the pixel to 4.86µm with 80 per cent fill factor; the previous generation of the sensor, Gen 3, based on a 180nm CIS process, has 15µm pixels with 25 per cent fill factor.
The two Sony sensors have a dynamic range of 86dB, and resolutions of 1,280 x 720 pixels (IMX636) and 640 x 512 pixels (IMX637).
As in all imaging, resolution helps with accuracy (for detecting small vibrations in a narrow field of view, for instance), although Verre said the objective is not to reach a multi-megapixel image. However, the bio-inspired nature of the technology has led to some clever tricks to generate high-resolution images.
The human eye is constantly moving, making lots of micro-movements called saccades. The eye does not have a huge physical resolution because the number of receptors is limited, but the brain reconstructs a high-resolution image using these saccades.
In the same way, it’s been shown that by putting Prophesee’s VGA sensor on a piezoelectric stage and then shaking it, a multi-megapixel image can be reconstructed from the stream of events produced by the movement.
Verre said that, although it is not a commercial solution, a metrology system maker in Japan has been experimenting with the sensor for surface inspection. The VGA sensor wasn’t able to identify scratches on the surface initially, but when some small vibrations were introduced, the sensor accumulated enough information over time to identify these defects.
‘This is an interesting approach,’ Verre said, ‘that showed that our technology can use the super high time precision that we have to generate spatial resolution – you can trade off time resolution to generate spatial resolution.’
The new Sony sensor opens up consumer imaging markets for the technology, and the investment from Sinovation Ventures and Xiaomi, along with Inno-Chip, reinforces this. Speaking about the Sinovation investment, Verre commented: ‘They clearly see that our approach to AI is very original, potentially a technology platform that can serve applications from machine vision to IoT, mobile, automotive, drone, robots, etc, which are all important segments worldwide.’
Sinovation was founded by Dr Kai-Fu Lee, a pioneer in AI, and has more than $2.5 billion assets under management.
The involvement of Xiaomi, Verre said, is an investment from a strategic angle. Neuromorphic imaging has the potential to address some of the pain points found in mobile phone cameras, namely motion blur and slow motion video.
There have been various research papers, Verre said, on combining data from an event-based sensor with that from a frame-based sensor. Event data is not constrained by a frame rate or clock, and therefore it can give a better understanding of motion in the scene, and potentially correct for motion blur.
Also, a frame-based sensor running at 10fps could be augmented with event-based data to generate slow motion video without huge amounts of data. The event stream could be used to reconstruct a sequence of images in between each frame of a traditional rolling shutter sensor.
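One simple way to picture this is to treat each event as a fixed step in log intensity and add the steps to the most recent frame. The sketch below is a deliberately naive illustration rather than any published method or Prophesee API: the function `reconstruct_frame`, the `(t, x, y, polarity)` event layout and the contrast threshold of 0.2 are assumptions, and it ignores noise and drift, which practical approaches have to handle.

```python
import math


def reconstruct_frame(keyframe, events, t_target_us, threshold=0.2):
    """Estimate the image at t_target_us from a keyframe taken at t = 0.

    keyframe: 2D list of non-negative intensities.
    events: iterable of (t, x, y, polarity) with t in microseconds.
    Hypothetical helper, for illustration only.
    """
    # Work in log intensity, where each event is a fixed-size step.
    log_img = [[math.log(v + 1.0) for v in row] for row in keyframe]
    for t, x, y, polarity in events:
        if t <= t_target_us:
            log_img[y][x] += polarity * threshold
    # Convert back to linear intensity.
    return [[math.exp(v) - 1.0 for v in row] for row in log_img]


# At 10fps, consecutive frames are 100,000 microseconds apart.
keyframe = [[10, 10], [10, 10]]
events = [(25_000, 0, 0, 1), (50_000, 0, 0, 1), (75_000, 0, 0, 1)]

# Three intermediate frames between two 10fps frames, i.e. 40fps playback.
in_between = [reconstruct_frame(keyframe, events, t)
              for t in (25_000, 50_000, 75_000)]
for t, img in zip((25_000, 50_000, 75_000), in_between):
    print(t, "us: pixel (0, 0) =", round(img[0][0], 2))
```

Only the pixel that produced events changes from one reconstructed frame to the next, which is why the extra frames cost little additional data.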
Connecting the dots
‘One of the key challenges for us since the beginning has been to connect dots in the ecosystem,’ Verre said. ‘We started with this sensor technology with fundamental benefits. Very quickly we realised that to make sure you convey these benefits to the end-user you need to work with camera makers and system makers, SoC vendors, software partners, and system integrators. We invested a lot of time and resources to bring all these partners together. We are glad to work with companies like Imago in Germany and Century Arks in Japan, as well as Lucid Vision Labs, Framos, and Macnica ATD Europe. It’s important that they help us to deliver a full solution to the market.’
Prophesee has released an evaluation kit for the Sony sensor; a software development kit with 95 algorithms, 67 code samples, and 11 ready-to-use applications; and a set of open source software modules to optimise machine learning training and inference for event-based applications, including optical flow and object detection. The open source tools have so far registered more than 500 unique users, Verre said, ‘which implies that more engineers and inventors are taking on our technology to start evaluating it, experimenting with it, and creating solutions.’
Cambridge Consultants developed an automated system to look for contamination in cell samples using the firm’s evaluation kit, while Xperi built a driver monitoring solution using Prophesee’s evaluation kit and SDK.
‘Our effort will be to keep providing evaluation kits, development kits, camera reference designs to camera makers to facilitate the integration work so more cameras will use an event-based sensor,’ Verre said. ‘We will keep enriching our SDK with more fundamental algorithms but also application examples, with models that are both commercial – part of the software is only accessible with a licence – but a lot is now available for free because we also want to have a wider community of users.’
Prophesee is also offering training to system integrators and customers. It has put in place a field application team with almost 20 field application engineers worldwide, as well as a network of eight distributors and system integrators for industrial imaging, companies like Framos and Macnica ATD Europe.
It has also reached an agreement with SynSense to develop low power solutions for event-based vision on edge computing. SynSense, founded in 2017, provides neuromorphic computing with a line of asynchronous, event-based vision processors that have low power consumption and low latency. The partnership will combine, in a single chip, SynSense’s vision SNN processor, Dynap-CNN, with Prophesee’s event-based Metavision sensors. The aim is to develop a line of modules that can be manufactured at high volume.
Verre said that more functionality will be integrated into the sensor, ‘working with more partners like Sony and other foundry and image sensor companies to make sure we can open up this market opportunity of event-based technology.’
Prophesee’s technology lends itself to on-sensor processing much more so than a frame-based approach.
Verre added: ‘There will not only be Sony; other companies like Samsung and CIS companies will enter the space, which will open up event-based sensing to more markets.
‘Our role as a pioneer is to keep innovating, stay ahead, make sure we connect dots at the software and system level and work with as many partners as possible, companies like NXP and Qualcomm, so everyone in the ecosystem is able to build a full application.’