Deep convolutional neural networks (DCNNs) don’t see objects the way humans do, through configural shape perception, and that difference could be dangerous in real-world AI applications, says Professor James Elder, co-author of a York University study published today.
“Deep learning models fail to capture the configural nature of human shape perception” is the title of the joint study by Nicholas Baker of Loyola College in Chicago, a former VISTA postdoctoral fellow at York, and Elder, who holds the York Research Chair in Human and Computer Vision and is Co-Director of York’s Centre for AI & Society.
The study used novel visual stimuli known as “Frankensteins” in order to investigate how the human brain and DCNNs process holistic, configural object properties.
Frankensteins, Elder explains, are simply objects that have been taken apart and put back together the wrong way. “As a result, they have all the right local features, but in the wrong places.”
The researchers found that Frankensteins confuse the human visual system but leave DCNNs unfazed, revealing that the networks are not sensitive to configural object properties.
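The study’s own stimuli and networks are not reproduced here, but the underlying probe can be sketched in a few lines of code: rearrange the parts of an image so that local features survive while the global configuration is destroyed, then check whether a pretrained classifier’s answer changes. The sketch below assumes a recent torchvision, a standard ImageNet-pretrained ResNet-50, and a placeholder image file (object.jpg); it is an illustrative analogue, not the experiment itself.

```python
# Hypothetical sketch (not the study's code): probe whether a pretrained
# classifier changes its prediction when an image's parts are rearranged.
# Shuffling patches preserves local features but destroys the global
# configuration -- a crude stand-in for the "Frankenstein" manipulation.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

def shuffle_patches(img_tensor, grid=4):
    """Split a CxHxW image into grid x grid patches and shuffle them."""
    c, h, w = img_tensor.shape
    ph, pw = h // grid, w // grid
    patches = [img_tensor[:, i*ph:(i+1)*ph, j*pw:(j+1)*pw]
               for i in range(grid) for j in range(grid)]
    order = torch.randperm(len(patches))
    rows = []
    for i in range(grid):
        row = torch.cat([patches[order[i*grid + j]] for j in range(grid)], dim=2)
        rows.append(row)
    return torch.cat(rows, dim=1)

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

img = preprocess(Image.open("object.jpg"))   # placeholder: any object photo
scrambled = shuffle_patches(img)

with torch.no_grad():
    p_intact = model(img.unsqueeze(0)).softmax(-1)
    p_scrambled = model(scrambled.unsqueeze(0)).softmax(-1)

# If the top-1 class (and much of its confidence) survives the scramble,
# the network is leaning on local features rather than global configuration.
print("intact:   ", p_intact.argmax().item(), p_intact.max().item())
print("scrambled:", p_scrambled.argmax().item(), p_scrambled.max().item())
```

A network whose prediction survives the scramble is, like the DCNNs in the study, relying on local features rather than on configural shape.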
According to Elder, the findings also highlight the need to consider tasks beyond object recognition to understand how the brain processes visual information. “Our results explain why deep AI models fail under certain conditions,” he says. “To solve challenging recognition tasks, these deep models frequently take ‘shortcuts.’ While those shortcuts may work in many situations, they can be dangerous in some of the real-world AI applications we are working on with our business and government partners.”
Traffic video safety systems are one such application. According to Elder, “the objects in a busy traffic scene—the vehicles, bicycles, and pedestrians—obstruct each other and arrive at the eye of a driver as a jumble of disconnected fragments. To determine the correct categories and locations of those objects, the brain must group the fragments properly. An AI traffic safety monitoring system that can only perceive the fragments individually will fail at this task, potentially misjudging risks to vulnerable road users.”
The researchers also found that attempts to make the networks more brain-like through modified training and architecture did not produce configural processing, and that none of the networks could accurately predict human object judgments trial by trial. “We think that networks need to be trained to solve a broader range of object tasks beyond category recognition if they are to match human configural sensitivity,” Elder says.
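The paper’s trial-by-trial analysis is statistical, but the basic question is easy to state in code: across trials, does the model choose the same category a human chose more often than a shuffled baseline would? The sketch below uses placeholder responses and a hypothetical trialwise_agreement helper purely to illustrate the idea; it is not the authors’ analysis.

```python
# Illustrative sketch (hypothetical data): one simple way to ask whether a
# model predicts human object judgments trial by trial, rather than just
# matching overall accuracy. Assumes one human choice and one model choice
# per trial over the same candidate categories.
import numpy as np

def trialwise_agreement(human_choices, model_choices):
    """Fraction of trials on which the model picks the same category as the human."""
    human_choices = np.asarray(human_choices)
    model_choices = np.asarray(model_choices)
    return float(np.mean(human_choices == model_choices))

# Placeholder responses for 10 trials; a real analysis would use the
# experiment's recorded responses.
human = ["cat", "dog", "bear", "cat", "horse", "dog", "cat", "bear", "dog", "cat"]
model = ["cat", "dog", "dog",  "cat", "horse", "cat", "cat", "bear", "dog", "dog"]

observed = trialwise_agreement(human, model)

# Compare against agreement under shuffled trial order: a model that only
# matches the overall distribution of responses, not trial-by-trial
# behaviour, will not beat this baseline by much.
rng = np.random.default_rng(0)
chance = np.mean([trialwise_agreement(human, rng.permutation(model))
                  for _ in range(1000)])

print(f"observed agreement: {observed:.2f}, shuffled baseline: {chance:.2f}")
```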