Cutting through the crowd
There’s an old anecdote about image analysis that – while probably mostly urban myth, and certainly getting long in the tooth – highlights an important point.
So the story goes, many years ago the US military wanted to teach tank-based computers to recognise other tanks and speed up identification during a battle setting. They took 50 photos – the number changes according to who tells it – of tanks in a forest setting, then took the tanks away and took the same photos and fed them into the computer to have it identify what was different. They failed to control for one variable: the weather. During the switch the weather had changed from sunny to cloudy, and the military had spent millions teaching a computer to recognise a cloudy day.
Three of the key reasons for this failure are the low resolution of images, the small sample size and the variable nature of outside light – the dynamic range of light from a sunny day and a cloudy day is immense.
Camera technology and database sizes have changed massively since this supposedly happened, and so too have the capabilities of computers. A middling to basic smartphone has a faster processor and can hold more images at higher quality than the computers the military would have had at their disposal. And the scale of the image analysis on even the software used by these phones is astounding.
As for industrial systems, these integrate zoom lenses, high-power LED lighting, more powerful PCs, better white balance, and wide fields of view. In short, the image quality will be orders of magnitude higher, especially when moving to 4K resolution.
According to IHS’s report on ‘The top surveillance trends for 2016’, 4K isn’t yet set for mass market adoption. While the analyst house says the number of HD CCTV cameras sold will be ‘over 28 million units in 2016’, more than 40 per cent of the ‘66 million network cameras predicted to ship globally’, the volume of 4K cameras supplied will make up less than one per cent of the total. Indeed, the report says it is unlikely that more than a million 4K network cameras will be shipped globally in a calendar year until 2018.
Do we need it?
According to Mark Williamson, Stemmer Imaging’s director of corporate market development, HD, and particularly 4K, ‘lets you have a wider field of view, covering more area and still be able to run [algorithms for the] face recognition.
‘CCTV footage from five to 10 years ago, even if you had a camera pointed above a till and pointed at a face, you’d still struggle to recognise the face, certainly with a computer.’
Williamson said that a face can typically be recognised from an image with a resolution in the region of 300 x 300 to 500 x 500 pixels. ‘If you go back to original CCTV, that would be one face. If you go to HD, that would be eight faces. If you go to 4K, that would be 32 faces. So what that means is, as you get to a certain point, you’re not worried about getting extra resolution. Unlike for TV, you’re not worried about being able to see the finer details, you just want to be able to recognise a face.
‘What tends to happen is, as you go to 4K, where originally to capture faces they would have needed four cameras, now they only need one. Or, they would have stuck one camera in a place and be limited by cost. For CCTV, 4K gives you wider coverage and is still able to recognise a face.’
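Williamson’s rule of thumb can be checked with some rough arithmetic. The sketch below divides the pixel area of a frame by the area of one 500 x 500 face patch, the upper end of the range he quotes. The frame sizes (PAL standard definition, 1080p HD and 4K UHD) are assumptions, since the article does not say which formats he had in mind, and the function name is purely illustrative.

```python
# Assumed frame sizes; the article does not state which formats were meant.
FRAME_SIZES = {
    "SD (PAL, 720 x 576)": (720, 576),
    "HD (1920 x 1080)": (1920, 1080),
    "4K UHD (3840 x 2160)": (3840, 2160),
}


def face_patches_per_frame(width, height, patch_side=500):
    """Rough upper bound: frame area divided by the area of one square face patch."""
    return (width * height) // (patch_side * patch_side)


for name, (width, height) in FRAME_SIZES.items():
    print(f"{name}: about {face_patches_per_frame(width, height)} face-sized patches")
```

Under these assumptions the figures come out at 1, 8 and 33, close to the one, eight and 32 faces Williamson quotes. It is an area estimate only: real faces do not tile a frame neatly, so the usable number in practice will be lower.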
In 2013, three people died and 264 were injured during the Boston Marathon after bombs were detonated near the finish line. For this year’s event, which will take place on 18 April, the organisers were keen to stress in press communications that they were monitoring the area with HD CCTV.
The city has worked with Lan-Tel Communications to install a series of 30x zoom PTZ cameras from DVTel (now Flir) across the Marathon route. While no information has been given by Lan-Tel on the specific camera used, given the quoted zoom and the official images, the camera is likely to be the Quasar CP-4221-301, which uses a 1/2.8-inch progressive scan CMOS sensor and has a 4.3-129mm focal length.
At the time, Lan-Tel told the NPR news website the system ‘is programmed to turn automatically toward the sound of gunshots’, was of high-enough resolution that a camera over the Red Sox baseball field could tell if a ‘pitcher has thrown a strike or a ball’, and that the software system would soon be able to alert police when crowds form, or when a certain suspect is recognised by a camera.
Managing the data
A move to 4K gives a lot more data to work with, but this also means a lot more processing and a lot more storage. Compression is getting better, but one emerging trend to manage this data is a shift towards feature detection, rather than true facial recognition. This was pioneered in vehicle recognition, where part of a number plate or a particular make, model and colour of vehicle is tracked and recorded, with extraneous information deleted.
The technology has been developed by several companies and is in use across Europe, including at Gatwick international airport in England, where Stemmer Imaging’s technology is used. Williamson said: ‘Let’s say you have a tip-off on a person with glasses and a beard; you’d be able to go through [the video] and [tell the computer to show you just these]. You can then do this processing as you are acquiring the data, tagging it to say a person has these attributes: male or female, glasses, hat, no hat. And you can also get a lot more specific.
‘Before [you’d tell the system to] take me to the points where there’s motion. Now it’s: take me to the points where I see a male, with glasses and has black hair.’
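The kind of attribute search Williamson describes can be pictured as a filter over tagged detections. The following is a minimal sketch, not a description of Stemmer Imaging’s software: the `Detection` record, the `find_matches` function and the attribute labels are all hypothetical, and it assumes the tagging has already been done at acquisition time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One tagged sighting in the video (hypothetical record layout)."""
    timestamp_s: float      # position in the recording, in seconds
    attributes: frozenset   # tags assigned when the frame was acquired


def find_matches(detections, required):
    """Return detections that carry every one of the required attribute tags."""
    required = frozenset(required)
    return [d for d in detections if required <= d.attributes]


# Illustrative data only.
detections = [
    Detection(12.0, frozenset({"male", "glasses", "beard"})),
    Detection(47.5, frozenset({"female", "hat"})),
    Detection(90.2, frozenset({"male", "glasses", "black hair"})),
]

for hit in find_matches(detections, {"male", "glasses", "black hair"}):
    print(f"jump to {hit.timestamp_s} s")
```

Because only the tags and timestamps need to be searched, the operator can jump straight to the relevant points in the footage rather than scrubbing through it.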
Doing this gives a high level of accuracy. Williamson said: ‘You tend to set the algorithms to err on the side of safety, bringing up false positives rather than misses.
‘However, accuracy obviously depends on the image quality… and if it’s a poor image it’s less reliable. You can do gender detection up in the high 90 per cent range.’
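Erring ‘on the side of safety’ usually comes down to where a decision threshold is set. The sketch below uses made-up match scores, for illustration only, to show the trade-off: lowering the threshold removes misses but brings up more false positives. The function name and the scores are assumptions, not figures from Williamson.

```python
def count_errors(scored_samples, threshold):
    """Return (false_positives, misses) when every score >= threshold is flagged."""
    false_positives = sum(
        1 for score, is_match in scored_samples if score >= threshold and not is_match
    )
    misses = sum(
        1 for score, is_match in scored_samples if score < threshold and is_match
    )
    return false_positives, misses


# Made-up (score, is_true_match) pairs, for illustration only.
SAMPLES = [
    (0.95, True), (0.80, True), (0.62, True),
    (0.70, False), (0.55, False), (0.30, False),
]

for threshold in (0.75, 0.50):
    false_positives, misses = count_errors(SAMPLES, threshold)
    print(f"threshold {threshold}: {false_positives} false positives, {misses} misses")
```

With this toy data the stricter threshold misses one genuine match, while the looser one catches all three at the cost of two false alarms for an operator to dismiss.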
Another way of managing data is to look for suspicious behaviour. Among the first to launch such a system was the Dutch security vision systems manufacturer, Pebble Group.
Having detected suspicious behaviour – for example a person moving from car to car to car – a person in a control room is alerted and can then talk to the potential perpetrator via a VoIP link in order to dissuade them from continuing. However, the algorithms can also be used to identify potentially relevant video for long-term storage.
Matching CCTV footage to databases
‘What you’ll start to see with CCTV in time,’ Williamson noted, ‘is that, because of the mass of data, you’ll record for shorter periods. They’ll then tag and clip the data that’s interesting, so rather than just having a linear recording, eventually everything will be database driven.’
In our personal lives, we’re now used to seeing IT systems like Facebook and Google’s Picasa run facial recognition software on images, and it’s really impressively accurate (if occasionally slightly creepy).
However, these databases are huge; both Facebook and Google’s are several orders of magnitude bigger than the FBI’s, and a lot of the leg work in tagging these with information about who’s in it has been done by their users tagging friends and family in the first place, something that isn’t available for CCTV systems.
The other challenge that CCTV recognition systems face that Facebook and Google don’t is the angle at which a photo is taken: police and FBI mug shots are taken from straight on and from the side only, whereas the social media sites have people tagged from every angle.
For many scenarios this can be easily overcome. The Westfield shopping centre in the UK, for example, uses smaller area monitoring, rather than standard CCTV, to identify individuals that have previously been banned from the shopping centre, running facial recognition software against its database of known shoplifters and offenders in order to alert security.
The system uses a very high resolution camera to cover a massive entrance to capture the face wherever a person walks in. The cameras for this are, necessarily, discreet, and so the system combines higher-resolution cameras with longer lenses to make them less obvious.
However, discretion isn’t always a good thing. In airports, for example, there will typically be specific access points and, other than in the main duty-free shopping hall, the flow of people will be unidirectional. This makes it easy to have just a standard, static machine vision camera, typically between two and five megapixels, with Ethernet connectivity. In such a location, where the cameras are there to track people’s flow through the airport rather than to identify illicit behaviour, it gives better results if the cameras are prominent, and capture attention as people walk underneath.
Stemmer Imaging, for example, has developed a housing for airport security cameras that looks like a small jet engine and uses spinning LEDs to attract attention. According to Williamson it is ‘placed on the ceiling above where people are walking, and the whole idea is the LED moving round makes people look up, wondering what it is. When [the airport] implemented this, it increased catch rate... significantly.’
However, for standard CCTV, there is still the issue of the angle at which the image is taken. One potential solution about to be implemented by Tokyo’s Metropolitan Police Department (MPD) is 3D imaging. According to reports from the Japanese newspaper, Asahi Shimbun, ‘starting in April, the MPD will place a 3D camera in all of its 102 police stations, and its identification section will manage all the 3D mug shots taken with those cameras.’
The move follows 15 years of testing by the MPD’s research lab, comparing people in custody with those on CCTV.
Asahi Shimbun was told that ‘in photos taken with security cameras on the street and other locations, the faces are looking downward or sideways in many instances. Using the 3D facial images in the database, police can more accurately determine whether the suspect shown in the photo taken by the security camera is the one in custody,’ and that, in many cases, ‘it has been difficult to compare [the old-style side and front 2D] photos with images taken with security cameras’.
The MPD is creating a 3D image database of arrested suspects, using cameras to capture each image from three positions. The police force can then adjust the angle and size of the face using off-the-shelf image manipulation software to compare it with the CCTV image.
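The geometry behind adjusting ‘the angle and size of the face’ is straightforward once the face exists as 3D points. The sketch below is not the MPD’s method, which relies on off-the-shelf software the reports do not name. It simply assumes a handful of hypothetical facial landmarks, turns them about the vertical axis, scales them and flattens them to 2D for comparison with a camera image.

```python
import math


def rotate_yaw_and_project(points, yaw_degrees, scale=1.0):
    """Rotate 3D points about the vertical (y) axis, scale them, and drop depth.

    This is a simple orthographic projection; a real camera model would also
    account for perspective.
    """
    yaw = math.radians(yaw_degrees)
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    projected = []
    for x, y, z in points:
        x_rotated = cos_yaw * x + sin_yaw * z  # standard rotation about the y axis
        projected.append((scale * x_rotated, scale * y))
    return projected


# Hypothetical landmark positions in millimetres (x right, y up, z towards camera).
LANDMARKS = [(-30.0, 0.0, 0.0), (30.0, 0.0, 0.0), (0.0, -35.0, 25.0)]

# Turn the head 30 degrees and halve the size to mimic a more distant camera.
for x, y in rotate_yaw_and_project(LANDMARKS, yaw_degrees=30, scale=0.5):
    print(f"({x:.1f}, {y:.1f})")
```

The same 3D record can therefore be re-posed to match whatever angle the street camera happened to catch, which is not possible with front and side 2D mug shots alone.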
As one high-ranking officer of the MPD told Asahi Shimbun, ‘as we can identify the suspects more quickly and accurately, our arrest rate is expected to become greater.’