— Smart surveillance systems capable of automatically detecting violent crimes could soon be available.
A computer vision system developed at the University of Texas at Austin, US, can already tell the difference between friendly behaviour, such as shaking hands, and aggressive actions like punching or pushing.
The hope is that such systems will simplify the task of monitoring huge quantities of CCTV security footage, says Sangho Park, who worked on the project with colleague Jake Aggarwal.
Surveillance systems that detect statistically "unusual" behaviour have been developed in the past, and put to the test at subway stations, for example (see Smart statistics keep eye on CCTV).
However, these systems are limited to spotting unusual visual patterns and cannot pick out specific types of activity, says Park, now based at the University of California, San Diego, US. He reckons his new system could change all that.
He and Aggarwal developed software that analyses each frame of footage and identifies clusters of pixels matching a primitive model of the human body. It then examines the interplay of different clusters to classify interactions between individuals. Two videos show the system classifying a hug, and then a push (both .avi format).
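The paper itself gives the full method; as a rough illustration only, the per-frame step the article describes could be sketched as follows. This is not the authors' code: the binary frame, the flood-fill clustering and the centroid comparison are all invented stand-ins for the real body-model matching.

```python
# Illustrative sketch: find clusters of foreground pixels in one frame,
# then summarise each cluster so their interplay can be compared.
from collections import deque

def find_clusters(frame):
    """Label 4-connected clusters of foreground (1) pixels in a binary frame."""
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if frame[r][c] and not seen[r][c]:
                # Breadth-first flood fill to collect one cluster.
                queue, pixels = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and frame[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                clusters.append(pixels)
    return clusters

def centroid(pixels):
    """Mean (row, col) position of a cluster, used to compare clusters."""
    ys, xs = zip(*pixels)
    return (sum(ys) / len(ys), sum(xs) / len(xs))

# Two blobs of foreground pixels standing in for two people in one frame.
frame = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
clusters = find_clusters(frame)
print(len(clusters))          # -> 2: two separate pixel clusters
print(centroid(clusters[0]))  # -> (0.5, 0.5): centre of the left cluster
```

In the real system, each cluster would be matched against a body model rather than reduced to a centroid, but the same structure applies: segment the frame, then reason about how the resulting clusters relate to one another.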
Many interactions can be visually ambiguous, however. A person offering someone a stick of gum or a cigarette can look similar to someone being threatened with a knife, for example. To cope with this, Park and Aggarwal chose to build up a profile for each type of behaviour.
Park calls it a "semantic analysis" of the interaction. This means several different factors are considered. For example, when identifying two people shaking hands, their hands must not only be close, but must also move in synchrony.
They meticulously coded a description of these key characteristics, which the software searches for when analysing a scene. This allows it to assign a probability that a particular activity is being observed. At the moment, the system has to capture the interaction from side-on to make its evaluation.
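The published method is more elaborate, but the idea of scoring an interaction against a coded profile can be hedged into a toy example. Everything here is an assumption for illustration: the hand tracks, the distance threshold and the way the two factors are combined are not taken from the paper.

```python
# Toy "semantic analysis" of a handshake: the hands must be close AND
# move in synchrony, and the combined score acts as a probability.
def handshake_probability(hand_a, hand_b, near=2.0):
    """hand_a, hand_b: per-frame (x, y) positions of two people's hands."""
    n = len(hand_a)
    # Factor 1: fraction of frames in which the hands are close together.
    close = sum(
        1 for (ax, ay), (bx, by) in zip(hand_a, hand_b)
        if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= near
    ) / n
    # Factor 2: fraction of frame-to-frame steps in which the hands move
    # the same way (their displacement vectors point in the same direction).
    sync_steps = 0
    for i in range(1, n):
        dax, day = hand_a[i][0] - hand_a[i-1][0], hand_a[i][1] - hand_a[i-1][1]
        dbx, dby = hand_b[i][0] - hand_b[i-1][0], hand_b[i][1] - hand_b[i-1][1]
        if dax * dbx + day * dby > 0:  # positive dot product = same direction
            sync_steps += 1
    synchrony = sync_steps / (n - 1)
    # Both factors must hold for a handshake, so combine them multiplicatively.
    return close * synchrony

# Two hands gripped together, pumping up and down in unison.
a = [(5, 5), (5, 6), (5, 5), (5, 6)]
b = [(6, 5), (6, 6), (6, 5), (6, 6)]
print(handshake_probability(a, b))  # -> 1.0: close every frame, fully in sync
```

The multiplicative combination captures the article's point that proximity alone is not enough: hands that are near each other but moving independently score low, just as a gum offer and a knife threat differ in the dynamics rather than the distances.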
Hugging and punching
"The system works quite accurately," says Park. Tests were carried out on six pairs of people performing a total of 54 staged interactions, including hugging, punching, kicking and shaking hands. On average, the system identified these activities correctly 80% of the time.
According to Park, a commercial version of his system could be implemented within the next few years.
Mark Everingham, a computer vision researcher at the University of Leeds, UK, says the system needs some refinement: "The vision end of their work is very constrained."
Park accepts this is a limitation, but says he is refining the system to take advantage of multiple cameras and different angles.
Everingham notes that automatically identifying human behaviour correctly is a major challenge, and says applications need not be limited to CCTV. If loaded onto a smart TV it could, for example, make it possible to search for "fight scenes" and other types of activity in a movie.
Journal reference: Computer Vision and Image Understanding (DOI: 10.1016/j.cviu.2005.07.011)