New activity-recognition algorithm to be unveiled

Hamed Pirsiavash, a postdoc at MIT, and his former thesis advisor, Deva Ramanan of the University of California at Irvine, will present a new activity-recognition algorithm at IEEE's Conference on Computer Vision and Pattern Recognition, 24-27 June.

One advantage of the algorithm is that its execution time scales linearly with the length of the video file it is searching. It can also make predictions part of the way through an action, which allows it to handle streamed video and report, at any point, the probability that the activity it is looking for is taking place. The memory the algorithm requires is fixed, regardless of how many frames of video it has already reviewed, which means that, unlike many of its predecessors, it can handle video streams of any length.
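This combination of properties (a single pass over the stream, constant memory, and a probability reported at every frame) can be pictured with a small sketch. The class, its state layout and the score_frame function below are illustrative assumptions, not the authors' implementation.

```python
import math

def score_frame(frame_features, state):
    """Hypothetical per-state frame score; a real system would use a
    learned appearance model rather than this toy weighting."""
    return sum(f * (state + 1) for f in frame_features) * 0.01

class StreamingDetector:
    """Keeps a fixed-size state no matter how many frames have been seen."""

    def __init__(self, num_states):
        self.scores = [0.0] * num_states  # one running score per subaction state

    def update(self, frame_features):
        """Consume one frame; return the current probability that the stream
        has reached the final subaction, i.e. the action is being completed."""
        self.scores = [s + score_frame(frame_features, i)
                       for i, s in enumerate(self.scores)]
        m = max(self.scores)
        total = sum(math.exp(s - m) for s in self.scores)
        return math.exp(self.scores[-1] - m) / total

# Usage: feed frames one at a time from a (possibly endless) stream.
detector = StreamingDetector(num_states=3)
for frame in ([0.1, 0.2], [0.3, 0.1], [0.5, 0.4]):  # stand-in frame features
    print(detector.update(frame))
```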

The algorithm borrows from techniques used in natural language processing. ‘One of the challenging problems they try to solve is, if you have a sentence, you want to basically parse [scan and analyse] the sentence, saying what is the subject, what is the verb, what is the adverb,’ said Pirsiavash. ‘We see an analogy here, which is, if you have a complex action — like making tea or making coffee — that has some subactions, we can basically stitch together these subactions and look at each one as something like verb, adjective, and adverb.’

These subactions follow grammar-like patterns: some can occur in interchangeable order, while others must appear in a fixed sequence, much as different word types are organised within a sentence. To have the algorithm learn these rules, Pirsiavash and Ramanan feed it training examples of videos depicting a particular action and specify the number of subactions it should look for, but they give it no information about what those subactions are, or what the transitions between them look like.
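A minimal sketch of that training setup, assuming a simple left-to-right grammar over subactions, might look like the following. The function name, the equal-chunk initialisation and the clip format are hypothetical, chosen only to show the kind of input the learner receives: clips of the action plus a subaction count, with no subaction labels.

```python
def train_action_grammar(training_clips, num_subactions):
    """Return a toy 'grammar': a left-to-right ordering over K unnamed
    subactions, each with a mean feature vector estimated from the clips."""
    models = []
    for k in range(num_subactions):
        # Naive initialisation: split every clip into K equal temporal chunks
        # and average the features in chunk k. A real learner would refine
        # these assignments iteratively rather than fixing them up front.
        chunk_feats = []
        for clip in training_clips:                  # clip: list of frame features
            n = len(clip)
            lo, hi = k * n // num_subactions, (k + 1) * n // num_subactions
            chunk_feats.extend(clip[lo:hi])
        dim = len(chunk_feats[0])
        models.append([sum(f[d] for f in chunk_feats) / len(chunk_feats)
                       for d in range(dim)])
    return models  # ordered list of subaction models = the learned 'grammar'

# Usage: two tiny synthetic clips, asking for two subactions.
clips = [[[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]],
         [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]]
print(train_action_grammar(clips, num_subactions=2))
```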

The rules relating subactions are the key to the algorithm’s efficiency. As a video plays, the algorithm constructs a set of hypotheses about which subactions are being depicted where, and it ranks them according to probability. It can’t limit itself to a single hypothesis, as each new frame could require it to revise its probabilities. But it can eliminate hypotheses that don’t conform to its grammatical rules, which dramatically limits the number of possibilities it has to canvass.
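One way to picture that pruning step, again assuming a simple left-to-right grammar over subactions, is the beam-style sketch below. The allowed rule, the scoring and the beam width are illustrative stand-ins, not the actual algorithm.

```python
def allowed(prev_state, next_state):
    """Toy grammar rule: a subaction may repeat or advance by one, never skip
    ahead or go backwards. A learned grammar would supply these transitions."""
    return next_state in (prev_state, prev_state + 1)

def extend_hypotheses(hypotheses, frame_scores, beam=10):
    """hypotheses: list of (state_sequence, score); frame_scores[s] is the
    score of subaction s for the newly arrived frame."""
    new_hyps = []
    for states, score in hypotheses:
        for s, fs in enumerate(frame_scores):
            if not states or allowed(states[-1], s):   # grammar check
                new_hyps.append((states + [s], score + fs))
    # Keep only the most probable hypotheses -- this is what keeps the
    # search tractable as the video grows.
    new_hyps.sort(key=lambda h: h[1], reverse=True)
    return new_hyps[:beam]

# Usage: three frames, two subactions, starting from the empty hypothesis.
hyps = [([], 0.0)]
for frame_scores in ([0.9, 0.1], [0.6, 0.4], [0.1, 0.9]):
    hyps = extend_hypotheses(hyps, frame_scores)
print(hyps[0])   # best grammatical labelling of the frames seen so far
```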

The researchers tested their algorithm on eight different types of athletic activity with training videos taken from the internet. They found that, according to metrics standard in the field of computer vision, their algorithm identified new instances of the same activities more accurately than its predecessors.

Pirsiavash is particularly interested in possible medical applications of action detection. The proper execution of physical-therapy exercises, for instance, could have a grammar that’s distinct from improper execution; similarly, the return of motor function in patients with neurological damage could be identified by its unique grammar. Action-detection algorithms could also help determine whether, for instance, elderly patients remembered to take their medication — and issue alerts if they didn’t.
