The software was able to distinguish animals from non-animals in the same way that humans do (Image: Thomas Serre / MIT)
The human brain can tell the difference between a tiger and a moving branch in less than 20 milliseconds, an ability that can be crucial to survival. Now, a software model of the visual cortex is shedding light on how we do it.
As well as offering an explanation for how we make such snap decisions, the software may also provide new ways to build intelligent vision software for robots and security cameras.
The human brain shows evidence of distinguishing between animals and non-animals, and between faces and non-faces, even before a person is aware of having seen anything. In 1989, Simon Thorpe, now at CNRS in Toulouse, France, first suggested that this extraordinary ability might be the result of an initial "sweep" of neuronal activity occurring before "feedback loops" inside the brain have time to kick in.
Now Tomaso Poggio and colleagues at MIT in the US have built a computer model that appears to support this description of how humans achieve rapid visual recognition. "I am very excited about this," says Thorpe, who was not involved with the work. "It confirms the hypothesis that I made in 1989."
The model contains a simulation of groups of neurons found in the human visual cortex and mimics the response of these neurons to visual features. Signals are passed from one group of neurons to the next in the same hierarchical fashion as in the brain.
Lines and edges
The process starts with neurons associated with basic feature recognition and moves up to ones that perform more sophisticated recognition tasks. The first set of neurons identifies lines and edges, while the next identifies different ways in which lines and edges intersect. This escalation in complexity continues through to neurons that fire when a particular category of objects such as animals is recognised.
Crucially, unlike the human brain, the model does not have the ability to do "back projections", where a signal higher up the neuronal chain is fed backwards to an earlier neuron group for more detailed analysis.
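The bottom-up hierarchy described above can be illustrated with a toy sketch. This is not the MIT team's model: the 2x2 edge filters, the 2x2 max pooling, the "junction" layer and the summed top-level readout are all hypothetical simplifications chosen for illustration. What it does show is a purely feedforward sweep, in which each stage reads only from the stage below and nothing is fed back.

```python
from typing import List

Image = List[List[float]]

# Layer 1 ("simple cells"): hypothetical 2x2 oriented edge filters.
HORIZONTAL_EDGE = [[1.0, 1.0], [-1.0, -1.0]]  # intensity changes top-to-bottom
VERTICAL_EDGE = [[1.0, -1.0], [1.0, -1.0]]    # intensity changes left-to-right


def convolve(image: Image, kernel: Image) -> Image:
    """Valid 2-D cross-correlation; the absolute value is taken so that
    edges of either contrast polarity produce a response."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(abs(s))
        out.append(row)
    return out


def max_pool(fmap: Image, size: int = 2) -> Image:
    """Layer 2 ("complex cells"): max over non-overlapping size x size blocks,
    giving some tolerance to a feature's exact position. Leftover rows or
    columns that do not fill a whole block are dropped."""
    out = []
    for r in range(0, len(fmap) - size + 1, size):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out


def junctions(h: Image, v: Image) -> Image:
    """Layer 3: a unit responds only where horizontal and vertical edges
    co-occur, a crude stand-in for detectors of how lines intersect."""
    return [[min(a, b) for a, b in zip(hr, vr)] for hr, vr in zip(h, v)]


def feedforward(image: Image) -> float:
    """One bottom-up sweep. Each stage uses only the output of the stage
    below; there is no back projection to earlier stages."""
    h = max_pool(convolve(image, HORIZONTAL_EDGE))
    v = max_pool(convolve(image, VERTICAL_EDGE))
    j = junctions(h, v)
    # Hypothetical top-level "category" unit: summed junction evidence.
    # A real model would learn its readout weights from labelled examples.
    return sum(sum(row) for row in j)


if __name__ == "__main__":
    stripe = [[1.0 if r >= 4 else 0.0 for _ in range(8)] for r in range(8)]
    square = [[1.0 if 2 <= r <= 5 and 2 <= c <= 5 else 0.0
               for c in range(8)] for r in range(8)]
    print("stripe:", feedforward(stripe))
    print("square:", feedforward(square))
```

A horizontal stripe has edges but no intersections, so the top unit stays silent, while a square's corners drive it. Replacing the hand-set readout with learned weights is where a model of this kind would begin to separate categories such as "animal" and "non-animal".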
Nonetheless, when shown 150 animal images and 150 non-animal pictures, the software classified them with similar accuracy to human subjects. According to brain scans, the human brain performs this task in just 20 milliseconds; the software takes much longer.
The fact that the software can categorise the images without performing "back projections" suggests that humans rely on the same trick, as Thorpe originally suggested. "It confirms the conjecture that these very rapid categorisation tasks are done without the need for feedback," says Poggio.
The software even misclassified the same images that human participants got wrong, strengthening evidence that the computer model is doing rapid visual recognition in the same way, says team member Thomas Serre: "It's not proof, but it's very strong evidence."
The next stage of the project is to teach the software how to perform back projections, to probe how humans recognise objects over longer periods.
"Poggio's work is really outstanding," says Luis von Ahn, a computer vision expert at Carnegie Mellon University in Pittsburgh, Pennsylvania, "both in terms of advancing our understanding of how our brains work, and attempting to write a computer program that can see as well as we can."
The model may have practical applications. In a separate study published in February 2007 (IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 29, p 411), the team showed that such a model can efficiently recognise objects in street scenes.
Other computer vision systems do not closely mimic the neurons in the brain, so this offers a new direction for vision software. "This is the first time I have seen neuroscience able to teach something to computer science," Poggio says.
Journal reference: Proceedings of the National Academy of Sciences (DOI: 10.1073/pnas.0700622104)