Researchers have used a facility featuring hundreds of video cameras to teach a computer to recognise and understand the body language of multiple people in real time, including the movements and position of their individual fingers.
The work could open up new ways for humans and machines to interact with each other, such as being able to direct computers by simply pointing. The corresponding computer code has been released by the researchers to encourage further research and applications.
The researchers have trained a laptop to recognise body language down to the movement of individual fingers using a single camera. (Credit: Carnegie Mellon University’s Robotics Institute)
The researchers from Carnegie Mellon University’s Robotics Institute in Pennsylvania, USA, used a two-story dome embedded with 500 video cameras to teach a laptop computer to detect the exact poses of a group of people in 2D using a single camera.
Yaser Sheikh, associate professor of robotics at the university, explained that the work could enable new, more natural ways for machines to interact with humans, as understanding the nonverbal communication between individuals would allow robots to serve in social spaces, perceiving what people around them are doing, what moods they are in and whether they can be interrupted.
‘We communicate almost as much with the movement of our bodies as we do with our voice, but computers are more or less blind to it,’ he said.
The work could also have applications in sports analytics, where real-time pose detection would make it possible for computers to know exactly what players are doing with their arms, legs and heads at each point in time. Autonomous vehicles could also benefit from the technology, monitoring the body language of pedestrians to determine whether they are about to step into the street.
Further potential applications have also been identified in teaching machines to recognise conditions such as autism, dyslexia and depression, which could enable new approaches to behavioural diagnoses and rehabilitation.
Tracking multiple interacting people in real time has presented a number of challenges in the past, so Sheikh and his colleagues took a bottom-up approach that first localises the types of body parts in a scene, and then associates those parts with particular individuals.
The researchers took a bottom-up approach that first localises the types of body parts in a scene, and then associates those parts with particular individuals. (Credit: Carnegie Mellon University’s Robotics Institute)
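The two steps of the bottom-up approach can be illustrated with a minimal Python sketch. It is not the researchers' released code: the skeleton in `LIMBS`, the function `associate_parts` and the sample coordinates are all hypothetical, and plain distance between parts stands in for whatever learned score the real system uses to decide which parts belong together. Step one (localising parts) is assumed to have already produced a list of positions for each part type; the sketch covers only step two, assigning those parts to individuals.

```python
from itertools import product
from math import dist

# Hypothetical skeleton: pairs of connected part types, listed from the root outwards.
LIMBS = [("neck", "shoulder"), ("shoulder", "elbow"), ("elbow", "wrist")]


def associate_parts(candidates, limbs=LIMBS):
    """Group part detections into people (step two of a bottom-up approach).

    candidates: dict mapping part type -> list of (x, y) detections,
                i.e. the output of the part-localisation step.
    Returns a list of dicts, one per person, mapping part type -> (x, y).
    """
    # Start one person for every detection of the root part type.
    root = limbs[0][0]
    people = [{root: point} for point in candidates.get(root, [])]

    for parent, child in limbs:
        children = candidates.get(child, [])
        # Only people who already have the parent part can receive the child part.
        owners = [i for i, person in enumerate(people) if parent in person]
        # Rank every (person, child detection) pairing; a shorter limb is preferred.
        # Assumption: distance is a stand-in for a learned association score.
        pairs = sorted(
            product(owners, range(len(children))),
            key=lambda pc: dist(people[pc[0]][parent], children[pc[1]]),
        )
        used_people, used_children = set(), set()
        for person_idx, child_idx in pairs:
            # Greedy matching: each person and each detection is used at most once.
            if person_idx in used_people or child_idx in used_children:
                continue
            people[person_idx][child] = children[child_idx]
            used_people.add(person_idx)
            used_children.add(child_idx)
    return people


# Made-up detections for two people standing side by side, listed in mixed order.
detections = {
    "neck": [(10, 10), (100, 12)],
    "shoulder": [(102, 20), (12, 20)],
    "elbow": [(14, 40), (104, 40)],
    "wrist": [(106, 60), (16, 60)],
}
people = associate_parts(detections)
```

In this toy scene the parts are unambiguous because the two people are far apart. The difficulty described above arises when people overlap or touch, which is where a simple distance measure like this one would fail and a stronger association score is needed.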
The challenges for hand and finger detection were even greater, however, as a camera is unlikely to see all the parts of the hand at the same time when objects are being held or gestures are made. The researchers therefore used the university’s Panoptic Studio containing 500 video cameras to obtain multiple images of different hand positions, enabling them to train a computer to predict the position of hidden appendages in an image taken by a single frontal camera.
‘The Panoptic Studio supercharges our research,’ Sheikh said. ‘It now is being used to improve body, face and hand detectors by jointly training them. Also, as work progresses to move from the 2D models of humans to 3D models, the facility’s ability to automatically generate annotated images will be crucial.’
To encourage further research and application, the researchers have released their computer code for both multi-person and hand pose estimation. It is already being widely used by research groups, and more than 20 commercial groups, including automotive companies, have expressed interest in licensing the technology.
Sheikh and his colleagues will present reports on their multi-person and hand pose detection methods at the Computer Vision and Pattern Recognition Conference at the end of July in Honolulu.