How to detect drowsiness with a neural net?

Greg Blackman reports on the complexities of training AllGo Systems' driver monitoring neural networks, which the firm's VP of engineering, Nirmal Kumar Sancheti, spoke about at the Embedded World trade fair

How can a machine be taught to spot if a driver is distracted or drowsy? A number of car manufacturers have already installed driver monitoring systems in their vehicles – Toyota being one of the first, in 2006 – but the technology to do this reliably has a lot of inherent complexity.

Nirmal Kumar Sancheti, vice president of engineering for AllGo Systems, presented the firm’s driver monitoring system, called See ‘n Sense, at the Embedded World trade fair in Nuremberg at the end of February.

See ‘n Sense was first demonstrated at CES 2018, and makes use of trained neural networks to identify behaviour that suggests the driver is distracted or sleepy. It was first shown on a GPU, but is now ported and optimised for an Arm platform in order to keep cost down, Sancheti said at Embedded World.

The system considers a number of parameters, including head pose estimation, gaze detection, and eye state analysis – blink rate and blink duration – to reach a conclusion about how attentive the driver is. It has to do this in varying light conditions, and recognise features across different ages, genders, ethnicities, and expressions. Making a common framework for all these situations is ‘really difficult’, Sancheti said. The system also has to deal with occlusions, such as caps or scarves covering faces.

AllGo Systems uses neural networks for its classification building blocks, as deep learning can ‘generalise way better’ than classical feature extraction, Sancheti said. Conventional approaches to image processing would struggle with fine tuning parameters and generalising – to work in all light conditions for instance. ‘There are a lot of issues with conventional approaches that we try to solve using deep learning,’ Sancheti added.

While deep learning is preferable to using conventional methods for this type of image processing problem, it still has its drawbacks, namely the huge amount of data required for accurate results, and the computational power needed. ‘No one is ready to put a GPU in a car for a driver monitoring system,’ Sancheti remarked.

See ‘n Sense captures images with an infrared camera and infrared illuminator to avoid interference from visible light. It starts by detecting the person’s face. It then looks at head pose, estimating how the head is oriented as three angles: yaw (how much the head is turned), pitch, and roll. Another network is run on a cropped image of the eyes to pinpoint where the person is looking, giving pitch and yaw angles for gaze. Head position alone is not enough to say whether the person is distracted, which is why it is combined with gaze direction.
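To illustrate why head pose and gaze are combined, the sketch below shows one hypothetical way the two estimates could be fused into a distraction flag. The function name, the 30-degree threshold, and the simple additive combination of angles are all assumptions for illustration – the article does not describe AllGo’s actual fusion logic.

```python
import math

def is_looking_away(head_yaw, head_pitch, gaze_yaw, gaze_pitch,
                    threshold_deg=30.0):
    """Flag distraction when the combined head + eye direction leaves a
    forward-facing cone. Head pose alone is not enough: a driver can turn
    the head while keeping the eyes on the road, and vice versa.
    All angles are in degrees; threshold is an illustrative assumption."""
    total_yaw = head_yaw + gaze_yaw        # eye rotation adds to head rotation
    total_pitch = head_pitch + gaze_pitch
    angle_off_road = math.hypot(total_yaw, total_pitch)
    return angle_off_road > threshold_deg

# Head turned 25 degrees right but eyes compensating 20 degrees left:
# combined direction is near the road, so not flagged.
attentive_case = is_looking_away(25.0, 0.0, -20.0, 0.0)   # False
# Head straight but gaze 40 degrees down (e.g. at a phone): flagged.
distracted_case = is_looking_away(0.0, 0.0, 0.0, -40.0)   # True
```

The point of the example is the failure mode it avoids: thresholding head yaw alone would wrongly flag the first case and miss the second.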

The eyes are then analysed to classify whether the eye is open or closed, as well as how long it has been closed or open. Blink rate and duration are important parameters for detecting drowsiness. The system uses a neural network to make the classification, but because it’s only concentrating on the eye region, which is small, AllGo Systems found that a small network was sufficient. Also, because the system is cropping out the eyes for gaze estimation and eye state analysis, it can deal with a lot of occlusions.
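Blink rate and blink duration can be derived from the per-frame output of the eye-state classifier. The sketch below is a minimal illustration, assuming the classifier emits 1 for open and 0 for closed at a known frame rate; it is not AllGo’s implementation.

```python
def blink_stats(eye_states, fps=30.0):
    """Derive blink rate (blinks per minute) and mean blink duration
    (seconds) from a per-frame sequence of eye-state labels,
    1 = open, 0 = closed. The label format and fps are assumptions."""
    durations = []
    closed_run = 0
    for state in eye_states:
        if state == 0:
            closed_run += 1
        elif closed_run:
            durations.append(closed_run / fps)   # a blink just ended
            closed_run = 0
    if closed_run:                               # blink still in progress
        durations.append(closed_run / fps)
    minutes = len(eye_states) / fps / 60.0
    rate = len(durations) / minutes if minutes else 0.0
    mean_duration = sum(durations) / len(durations) if durations else 0.0
    return rate, mean_duration

# Two seconds of video with two short blinks (3 frames closed each, 30 fps):
states = [1]*20 + [0]*3 + [1]*15 + [0]*3 + [1]*19
rate, dur = blink_stats(states)   # 60 blinks/min, 0.1 s mean duration
```

Long closed-eye runs would show up here as blinks of unusually long duration, which is one of the cues a drowsiness detector can threshold on.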

Depth of data

Data is key in deep learning. AllGo Systems gathered data for its deep learning algorithms under different lighting conditions, and using different subjects with different expressions – imaged while yawning, smiling, laughing, etc. The system was also trained to detect the face and eyes when the subject was in different poses – if, for instance, only a portion of the face is visible to the camera.

‘One problem is how do you ground truth the data?’ Sancheti said. ‘If I’ve collected data of a face, how do I tell where you are looking? I don’t know what the angle of your head is from that picture. Manual intervention becomes extremely hard.’

When collecting data for face detection, AllGo Systems made the test subject look in all directions, and asked them to put on different expressions. The person was asked to do a fast blink and a slow blink for the purpose of eye state detection. ‘The hardest problem is that if you want to collect drowsiness data, you cannot ask the person to act sleepy,’ Sancheti explained. ‘We are trying to detect drowsiness based on blink rate and other parameters that use pupil dilation – but no single measure in there is proof of drowsiness.’

For head pose estimation, AllGo Systems used an auxiliary camera trained on markers on the back of the head. So, as the main camera captures an image of the person’s face, the auxiliary camera tracks the markers to show how the head turns.
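Once the auxiliary camera has recovered the head’s rotation from the tracked markers, that rotation can be expressed as the yaw, pitch, and roll angles used as ground truth. A minimal sketch of the final conversion step, assuming the rotation arrives as a 3×3 matrix and a Z-Y-X (yaw-pitch-roll) Euler convention – the marker-tracking step itself is not shown:

```python
import math

def rotation_to_ypr(R):
    """Convert a 3x3 rotation matrix (e.g. estimated from tracked head
    markers) into (yaw, pitch, roll) in degrees, Z-Y-X convention.
    The convention and matrix layout are illustrative assumptions."""
    pitch = math.asin(-R[2][0])
    yaw = math.atan2(R[1][0], R[0][0])
    roll = math.atan2(R[2][1], R[2][2])
    return tuple(math.degrees(a) for a in (yaw, pitch, roll))

# Identity rotation: head facing straight ahead -> all angles zero.
straight = rotation_to_ypr([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
# Head turned 90 degrees (rotation about the vertical axis):
turned = rotation_to_ypr([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
```

In practice the markers give the rotation in the auxiliary camera’s frame, so a fixed calibration between the two cameras is also needed before these angles match what the main camera sees.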

Collecting data for gaze estimation involved asking people to focus on a screen with a dot that was moved around, with the gaze direction recorded irrespective of head pose.
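The dot’s known position on the screen gives ground-truth gaze angles directly from geometry. A minimal sketch, assuming the dot position is measured in centimetres from the point on the screen directly in front of the eye, at a known viewing distance – the coordinate convention and units are assumptions:

```python
import math

def gaze_angles_from_dot(dot_x_cm, dot_y_cm, eye_to_screen_cm):
    """Convert the on-screen position of a calibration dot (relative to
    the point directly in front of the eye) into ground-truth gaze yaw
    and pitch in degrees. Geometry and sign conventions are illustrative."""
    yaw = math.degrees(math.atan2(dot_x_cm, eye_to_screen_cm))
    pitch = math.degrees(math.atan2(dot_y_cm, eye_to_screen_cm))
    return yaw, pitch

# Dot 30 cm to the right at a 60 cm viewing distance: about 26.6 deg yaw.
yaw, pitch = gaze_angles_from_dot(30.0, 0.0, 60.0)
```

This is why the gaze label can be recorded irrespective of head pose: the angle to the dot depends only on eye and dot positions, not on where the head is pointing.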

Once the data was collected and the ground truth established, training was relatively straightforward, according to Sancheti. The company used GPUs and high-end CPUs to train the system, which took from a couple of hours to several days depending on the data size and the complexity of the problem. The company then optimised See ‘n Sense for an embedded Arm platform.

In terms of choosing a neural network, Sancheti advised against picking an overly complex model. The developer then has to optimise the model by quantising the weights – not every weight needs to be floating point, Sancheti said. He added that AllGo Systems would run the training cycle multiple times with fixed-point weights to find the most efficient configuration, and that it is worth considering different weight precisions at training time, before truncating the network. The system was then optimised for the embedded target.
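A simple form of the weight quantisation Sancheti describes is post-training quantisation with a single per-tensor scale. The sketch below is illustrative only, assuming signed 8-bit integers and symmetric scaling; it is not AllGo’s actual scheme.

```python
def quantise_weights(weights, bits=8):
    """Map float weights to signed fixed-point integers with one
    per-tensor scale. A simplified illustration of weight quantisation,
    assuming symmetric signed integers (e.g. int8 range -127..127)."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    quantised = [round(w / scale) for w in weights]
    return quantised, scale

def dequantise(quantised, scale):
    """Recover approximate float weights for inspection of the error."""
    return [q * scale for q in quantised]

w = [0.50, -1.27, 0.031, 0.0]
q, s = quantise_weights(w)       # q = [50, -127, 3, 0], scale = 0.01
```

The small weight 0.031 becomes 3 × 0.01 = 0.03 after dequantisation – the rounding error this introduces is why, as Sancheti notes, it pays to account for fixed-point weights during training rather than only after it.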

