Researchers teach computer to recognise body language

Researchers have used a facility featuring hundreds of video cameras to teach a computer to recognise and understand the body language of multiple people in real time, including the movements and position of their individual fingers.

The work could open up new ways for humans and machines to interact with each other, such as being able to direct computers by simply pointing. The corresponding computer code has been released by the researchers to encourage further research and applications.

The researchers have trained a laptop to recognise body language down to the movement of individual fingers using a single camera. (Credit: Carnegie Mellon University’s Robotics Institute)

The researchers from Carnegie Mellon University’s Robotics Institute in Pennsylvania, USA, used a two-storey dome embedded with 500 video cameras to teach a laptop computer to detect the exact poses of a group of people in 2D using a single camera.

Yaser Sheikh, associate professor of robotics at the university, explained that the work could enable new, more natural ways for machines to interact with humans, as understanding the nonverbal communication between individuals would allow robots to serve in social spaces, perceiving what people around them are doing, what moods they are in and whether they can be interrupted.

‘We communicate almost as much with the movement of our bodies as we do with our voice, but computers are more or less blind to it,’ he said.

The work could also have applications in sports analytics, where real-time pose detection would make it possible for computers to know exactly what players are doing with their arms, legs and heads at each point in time. Autonomous vehicles could also benefit from the technology, monitoring the body language of pedestrians to determine whether they are about to step into the street.

Further potential applications have also been identified in teaching machines to recognise conditions such as autism, dyslexia and depression, which could enable new approaches to behavioural diagnoses and rehabilitation.

Tracking multiple interacting people in real time has presented a number of challenges in the past, so Sheikh and his colleagues took a bottom-up approach that first localises the types of body parts in a scene, and then associates those parts with particular individuals.

The researchers took a bottom-up approach that first localises the types of body parts in a scene, and then associates those parts with particular individuals. (Credit: Carnegie Mellon University’s Robotics Institute)
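The two steps of the bottom-up approach can be illustrated with a minimal Python sketch. It is not the researchers' released code: the skeleton in `LIMBS`, the function name `associate`, the `max_dist` threshold and the use of plain pixel distance to link parts are all assumptions made for illustration. The sketch starts from part detections that have already been localised and only shows the association step.

```python
from math import hypot

# Hypothetical skeleton: (parent, child) part types joined by a limb.
LIMBS = [("neck", "shoulder"), ("shoulder", "elbow"), ("elbow", "wrist")]


def associate(candidates, max_dist=80.0):
    """Group already-localised part detections into individual people.

    candidates: dict mapping part type -> list of (x, y) detections.
    Returns one dict per person, mapping part type -> (x, y).
    Assumption: plain pixel distance stands in for a learned association score.
    """
    # Every detected neck seeds one person.
    people = [{"neck": p} for p in candidates.get("neck", [])]

    for parent, child in LIMBS:
        # Score every (person, child detection) pair by distance to the parent part.
        pairs = sorted(
            (hypot(person[parent][0] - c[0], person[parent][1] - c[1]), i, j)
            for i, person in enumerate(people) if parent in person
            for j, c in enumerate(candidates.get(child, []))
        )
        used_people, used_parts = set(), set()
        # Greedily accept the closest pairs; each person and each part is used once.
        for dist, i, j in pairs:
            if dist > max_dist or i in used_people or j in used_parts:
                continue
            people[i][child] = candidates[child][j]
            used_people.add(i)
            used_parts.add(j)
    return people


if __name__ == "__main__":
    detections = {
        "neck": [(100, 100), (300, 100)],
        "shoulder": [(310, 110), (110, 110)],
        "elbow": [(120, 160), (320, 160)],
        "wrist": [(330, 200)],
    }
    for person in associate(detections):
        print(person)
```

Because parts are detected once for the whole scene and only then grouped, the detection cost does not have to be repeated for every person in the frame, which is one reason a bottom-up design suits real-time use.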

The challenges for hand and finger detection were even greater, however, as a camera is unlikely to see all the parts of the hand at the same time when objects are being held or gestures are made. The researchers therefore used the university’s Panoptic Studio containing 500 video cameras to obtain multiple images of different hand positions, enabling them to train a computer to predict the position of hidden appendages in an image taken by a single frontal camera.
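The article does not spell out the geometry, but one standard way that many calibrated cameras can supply a label for a hidden fingertip is to triangulate it from two views that do see it and then project the resulting 3D point into the view where it is occluded. The Python sketch below illustrates that idea only. The function names, the rotation-free pinhole camera and the focal length are hypothetical simplifications, not details of the Panoptic Studio.

```python
def closest_point_between_rays(o1, d1, o2, d2):
    """Midpoint of the shortest segment joining two 3D rays o + t * d."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # zero only when the two rays are parallel
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = [o + t1 * k for o, k in zip(o1, d1)]
    p2 = [o + t2 * k for o, k in zip(o2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]


def project(point, cam_centre, focal=1000.0):
    """Pinhole projection for an unrotated camera at cam_centre looking along +z.

    Assumption: no lens distortion and an illustrative focal length in pixels.
    """
    x, y, z = (p - c for p, c in zip(point, cam_centre))
    return (focal * x / z, focal * y / z)


if __name__ == "__main__":
    # Two side cameras see the fingertip; each contributes a viewing ray.
    fingertip = closest_point_between_rays(
        [-1.0, 0.0, 0.0], [1.0, 0.0, 2.0],
        [1.0, 0.0, 0.0], [-1.0, 0.0, 2.0],
    )
    # The frontal camera cannot see the fingertip, but the 3D point can be
    # projected into its image to give a training label for that view.
    print(project(fingertip, [0.0, 0.5, 0.0]))
```

With hundreds of cameras there are usually several unobstructed views of any given finger, so labels of this kind could be generated automatically rather than annotated by hand.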

‘The Panoptic Studio supercharges our research,’ Sheikh said. ‘It now is being used to improve body, face and hand detectors by jointly training them. Also, as work progresses to move from the 2D models of humans to 3D models, the facility’s ability to automatically generate annotated images will be crucial.’

To encourage further research and application, the researchers have released their computer code for both multi-person and hand pose estimation. It is already being widely used by research groups, and more than 20 commercial groups, including automotive companies, have expressed interest in licensing the technology.

Sheikh and his colleagues will present reports on their multi-person and hand pose detection methods at the Computer Vision and Pattern Recognition Conference at the end of July in Honolulu.
