How to detect drowsiness with a neural net?


Greg Blackman reports on the complexities of training AllGo Systems' driver monitoring neural networks, which the firm's VP of engineering, Nirmal Kumar Sancheti, spoke about at the Embedded World trade fair

How can a machine be taught to spot if a driver is distracted or drowsy? A number of car manufacturers have already installed driver monitoring systems in their vehicles – Toyota being one of the first, in 2006 – but doing this reliably involves a lot of inherent complexity.

Nirmal Kumar Sancheti, vice president of engineering for AllGo Systems, presented the firm’s driver monitoring system, called See ‘n Sense, at the Embedded World trade fair in Nuremberg at the end of February.

See ‘n Sense was first demonstrated at CES 2018, and makes use of trained neural networks to identify behaviour that suggests the driver is distracted or sleepy. It was first shown running on a GPU, but has since been ported to, and optimised for, an Arm platform in order to keep costs down, Sancheti said at Embedded World.

The system considers a number of parameters – head pose estimation, gaze detection, and eye state analysis (blink rate and blink duration) – to reach a conclusion about how attentive, or otherwise, the driver is. It has to do this in varying light conditions, and recognise features across different ages, genders, ethnicities and expressions. Making a common framework for all these situations is ‘really difficult’, Sancheti said. The system also has to deal with occlusions, such as caps or scarves covering faces.

AllGo Systems uses neural networks for its classification building blocks, as deep learning can ‘generalise way better’ than classical feature extraction, Sancheti said. Conventional approaches to image processing would struggle with fine tuning parameters and generalising – to work in all light conditions for instance. ‘There are a lot of issues with conventional approaches that we try to solve using deep learning,’ Sancheti added.

While deep learning is preferable to using conventional methods for this type of image processing problem, it still has its drawbacks, namely the huge amount of data required for accurate results, and the computational power needed. ‘No one is ready to put a GPU in a car for a driver monitoring system,’ Sancheti remarked.

See ‘n Sense captures images with an infrared camera and infrared illuminator to avoid interference from visible light. It starts by detecting the person’s face. It then estimates head pose – the way the head is oriented in three angles: yaw (how much the head is turned from side to side), pitch and roll. Another network is run on a cropped image of the eyes to pinpoint where the person is looking, giving pitch and yaw angles for gaze. Head position alone is not enough to say whether the person is distracted, which is why it is combined with gaze direction.
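Combining the two sets of angles can be sketched very simply: under a small-angle approximation, the eye-in-head gaze angles add to the head-pose angles to give an overall line of sight, which can then be tested against a forward-looking cone. The function names and thresholds below are illustrative assumptions, not AllGo’s actual logic.

```python
def combined_gaze(head_yaw, head_pitch, gaze_yaw, gaze_pitch):
    """Approximate the driver's line of sight (degrees) by summing head
    and eye-in-head angles - a small-angle simplification."""
    return head_yaw + gaze_yaw, head_pitch + gaze_pitch

def is_distracted(head_yaw, head_pitch, gaze_yaw, gaze_pitch,
                  yaw_limit=30.0, pitch_limit=20.0):
    """Flag the driver as distracted when the combined line of sight
    leaves a forward-looking cone (thresholds here are illustrative)."""
    yaw, pitch = combined_gaze(head_yaw, head_pitch, gaze_yaw, gaze_pitch)
    return abs(yaw) > yaw_limit or abs(pitch) > pitch_limit
```

This captures why head pose alone is insufficient: a driver can face forward while their eyes are turned well off the road, and only the summed angles reveal it.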

The eyes are then analysed to classify whether the eye is open or closed, as well as how long it has been closed or open. Blink rate and duration are important parameters for detecting drowsiness. The system uses a neural network to make the classification, but because it’s only concentrating on the eye region, which is small, AllGo Systems found that a small network was sufficient. Also, because the system is cropping out the eyes for gaze estimation and eye state analysis, it can deal with a lot of occlusions.
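As a rough illustration of how blink parameters might be derived from per-frame eye-state classifications, the sketch below counts open-to-closed transitions as blinks and measures the fraction of time the eyes are closed. This is a generic approach, not AllGo’s implementation.

```python
def blink_stats(eye_closed, fps):
    """Derive blink rate (blinks/minute) and the fraction of time the
    eyes are closed from a per-frame open/closed sequence.

    eye_closed: list of booleans, one per frame (True = eye closed).
    fps: camera frame rate in frames per second.
    """
    blinks = 0
    prev = False
    for closed in eye_closed:
        if closed and not prev:  # rising edge = start of a blink
            blinks += 1
        prev = closed
    duration_min = len(eye_closed) / fps / 60.0
    blink_rate = blinks / duration_min if duration_min else 0.0
    closed_fraction = sum(eye_closed) / len(eye_closed) if eye_closed else 0.0
    return blink_rate, closed_fraction
```

The closed-eye fraction over a sliding window is essentially the PERCLOS measure widely used in drowsiness research; a real system would combine it with blink duration and other cues rather than rely on any single statistic.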

Depth of data

Data is key in deep learning. AllGo Systems gathered data for its deep learning algorithms under different lighting conditions, and using different subjects with different expressions – imaged while yawning, smiling, laughing, etc. The system was also trained to detect the face and eyes when the subject was in different poses – if, for instance, only a portion of the face is visible to the camera.

‘One problem is how do you ground truth the data?’ Sancheti said. ‘If I’ve collected data of a face, how do I tell where you are looking? I don’t know what the angle of your head is from that picture. Manual intervention becomes extremely hard.’

When collecting data for face detection, AllGo Systems made the test subject look in all directions, and asked them to put on different expressions. The person was asked to do a fast blink and a slow blink for the purpose of eye state detection. ‘The hardest problem is that if you want to collect drowsiness data, you cannot ask the person to act sleepy,’ Sancheti explained. ‘We are trying to detect drowsiness based on blink rate and other parameters that use pupil dilation – but no single measure in there is proof of drowsiness.’

For head pose estimation, AllGo Systems used an auxiliary camera trained on markers on the back of the head. So, as the main camera captures an image of the person’s face, the auxiliary camera tracks the markers to show how the head turns.
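One generic way to turn tracked marker positions into head-pose ground truth is the Kabsch algorithm, which finds the rotation that best aligns a reference marker configuration with the current one. The sketch below assumes Nx3 marker coordinates and a particular axis convention for yaw; it is not necessarily how AllGo processed its data.

```python
import numpy as np

def head_rotation(ref_markers, cur_markers):
    """Recover the rotation matrix mapping reference marker positions to
    their current positions (Kabsch algorithm). Inputs are Nx3 arrays of
    markers fixed to the back of the head."""
    P = ref_markers - ref_markers.mean(axis=0)
    Q = cur_markers - cur_markers.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def yaw_from_rotation(R):
    """Extract yaw (rotation about the vertical y-axis), in degrees,
    assuming an x-right, y-up, z-forward convention."""
    return np.degrees(np.arctan2(R[0, 2], R[2, 2]))
```

Pitch and roll can be extracted from the same matrix with the corresponding arctangents, giving a full ground-truth label for each captured frame.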

Collecting data for gaze estimation involved asking people to focus on a dot that was moved around a screen, while the gaze direction was recorded irrespective of head pose.
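Labelling gaze data this way reduces to simple geometry: given the dot’s position on the screen and the eye’s position and distance relative to it, the ground-truth gaze angles follow from arctangents. All names, units and the coordinate convention below are assumptions for illustration.

```python
import math

def gaze_label(dot_x_mm, dot_y_mm, eye_x_mm, eye_y_mm, eye_to_screen_mm):
    """Compute ground-truth gaze yaw and pitch (degrees) for a
    calibration dot, given the eye position in the screen's coordinate
    frame and the eye-to-screen distance."""
    yaw = math.degrees(math.atan2(dot_x_mm - eye_x_mm, eye_to_screen_mm))
    pitch = math.degrees(math.atan2(dot_y_mm - eye_y_mm, eye_to_screen_mm))
    return yaw, pitch
```

Sweeping the dot across the screen then yields a densely labelled set of gaze angles for each subject, independent of where their head happens to be pointing.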

Once the data was collected and the ground truth established, training was relatively straightforward, according to Sancheti. The company used GPUs and high-end CPUs to train the system, which took anywhere from a couple of hours to days, depending on the data size and the complexity of the problem. The company then optimised See ‘n Sense for an embedded Arm platform.

In terms of choosing a neural network, Sancheti advised against picking an overly complex model. The developer then has to optimise the model by quantising the weights – not every weight needs to be floating point, Sancheti said. He added that AllGo Systems would run the training cycle multiple times using fixed-point weights to find the most efficient process, and that it is worth considering different weight precisions at training time, before truncating the network. The system was then optimised for an embedded target.
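Quantising weights can be sketched as mapping each float tensor to signed integers plus a per-tensor scale factor. The symmetric 8-bit scheme below is a common minimal approach to post-training quantisation, not necessarily the scheme AllGo used.

```python
import numpy as np

def quantise_weights(weights, n_bits=8):
    """Symmetric fixed-point quantisation of a float weight tensor:
    map values to signed n_bits integers plus a scale factor.
    A minimal sketch of the idea, assuming n_bits <= 8."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantise(q, scale):
    """Recover approximate float weights, e.g. to check accuracy loss."""
    return q.astype(np.float32) * scale
```

Storing int8 weights instead of float32 cuts the model size by roughly four times and lets the Arm target use faster integer arithmetic, at the cost of a small, bounded rounding error per weight.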

