New activity-recognition algorithm to be unveiled

Hamed Pirsiavash, a postdoc at MIT, and his former thesis advisor, Deva Ramanan of the University of California at Irvine, will present a new activity-recognition algorithm at IEEE's Conference on Computer Vision and Pattern Recognition, 24-27 June.

One advantage of the algorithm is that its execution time scales linearly with the size of the video file it’s searching. It can also make predictions partway through an action, which allows it to handle streamed video: given a video that is still incomplete, it issues a probability that the action is of the type it’s looking for. And the amount of memory it requires is fixed, regardless of how many frames of video it has already reviewed. That means that, unlike many of its predecessors, it can handle video streams of any length.

The algorithm borrows an approach used in natural language processing. ‘One of the challenging problems they try to solve is, if you have a sentence, you want to basically parse [scan and analyse] the sentence, saying what is the subject, what is the verb, what is the adverb,’ said Pirsiavash. ‘We see an analogy here, which is, if you have a complex action, like making tea or making coffee, that has some subactions, we can basically stitch together these subactions and look at each one as something like verb, adjective, and adverb.’

The subactions follow patterns similar to grammar: some can occur in any order, while others must occur in a fixed sequence, much as different word types are organised in a sentence. To learn these rules, Pirsiavash and Ramanan use machine learning. They feed their algorithm training examples of videos depicting a particular action, and specify the number of subactions that the algorithm should look for. But they don’t give it any information about what those subactions are, or what the transitions between them look like.

The rules relating subactions are the key to the algorithm’s efficiency. As a video plays, the algorithm constructs a set of hypotheses about which subactions are being depicted where, and it ranks them according to probability. It can’t limit itself to a single hypothesis, as each new frame could require it to revise its probabilities. But it can eliminate hypotheses that don’t conform to its grammatical rules, which dramatically limits the number of possibilities it has to canvass.
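
To make the idea concrete, here is a minimal sketch in Python of how a streaming parser might keep ranked hypotheses while discarding those that break the grammar. It is not the researchers' implementation: it assumes a simple Viterbi-style update over a hand-written ordering of subactions, whereas the researchers learn their rules from training videos. The grammar `ALLOWED`, the subaction names and the class `StreamingParser` are all hypothetical, and the per-frame log-likelihoods are assumed to come from some separate classifier.

```python
import math

# Hypothetical toy grammar: which subaction may follow which.
# Here the order is fixed (reach, then pour, then stir), and a
# subaction may persist across consecutive frames.
ALLOWED = {
    "reach": ["reach", "pour"],
    "pour": ["pour", "stir"],
    "stir": ["stir"],
}


class StreamingParser:
    """Keeps one best log-score per subaction, so memory stays fixed."""

    def __init__(self, allowed, start_states):
        self.allowed = allowed
        self.states = list(allowed)
        self.start_states = set(start_states)
        self.scores = None  # dict: subaction -> best log-score so far

    def update(self, frame_log_probs):
        """Consume one frame and return the hypotheses ranked by score."""
        if self.scores is None:
            # First frame: only subactions that may open the action are viable.
            new = {
                s: frame_log_probs[s] if s in self.start_states else -math.inf
                for s in self.states
            }
        else:
            new = {s: -math.inf for s in self.states}
            for prev, score in self.scores.items():
                if score == -math.inf:
                    continue  # this hypothesis was already ruled out
                # Only transitions the grammar permits are ever scored.
                for nxt in self.allowed[prev]:
                    candidate = score + frame_log_probs[nxt]
                    if candidate > new[nxt]:
                        new[nxt] = candidate
        self.scores = new
        return self.ranked()

    def ranked(self):
        """Current hypotheses, most probable first."""
        return sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
```

Each call to `update` does a bounded amount of work that depends only on the size of the grammar, so the total running time grows linearly with the number of frames, and the parser stores only one score per subaction however long the stream runs. A frame that looks like stirring cannot be labelled as such until a pouring hypothesis is viable, which is the kind of pruning the grammar provides in this simplified setting.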

The researchers tested their algorithm on eight different types of athletic activity with training videos taken from the internet. They found that, according to metrics standard in the field of computer vision, their algorithm identified new instances of the same activities more accurately than its predecessors.

Pirsiavash is particularly interested in possible medical applications of action detection. The proper execution of physical-therapy exercises, for instance, could have a grammar that’s distinct from improper execution; similarly, the return of motor function in patients with neurological damage could be identified by its unique grammar. Action-detection algorithms could also help determine whether, for instance, elderly patients remembered to take their medication — and issue alerts if they didn’t.
