Comparing deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals a hierarchical correspondence

Radoslaw M. Cichy (MIT)
Aditya Khosla (MIT)
Dimitrios Pantazis (MIT)
Antonio Torralba (MIT)
Aude Oliva (MIT)

Construction of the object deep neural network model (object DNN) used in [1]: The DNN architecture comprised 8 layers. Each of layers 1-5 combined convolution, max-pooling and normalization stages, whereas the last three layers were fully connected. The DNN takes pixel values as input and propagates information feed-forward through the layers, activating the model neurons of each layer in turn.


Figure: Object DNN architecture
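As a rough illustration of how an 8-layer feed-forward layout transforms an image, the sketch below propagates the spatial size of an input through five convolutional layers and three fully connected layers. Only the 5 + 3 split comes from the text; the 227 x 227 input, filter counts, kernel sizes, strides and pooling settings are assumptions borrowed from the widely used AlexNet-style layout, and the 683-unit output layer assumes one unit per training category. The names `ConvLayer`, `FCLayer` and `propagate_shapes` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class ConvLayer:
    name: str
    filters: int
    kernel: int
    stride: int = 1
    pad: int = 0
    pool: Optional[Tuple[int, int]] = None  # (kernel, stride) of max-pooling, if any
    norm: bool = False  # normalization stage; does not change the output shape


@dataclass
class FCLayer:
    name: str
    units: int


def conv_out(size: int, kernel: int, stride: int, pad: int = 0) -> int:
    """Output width of a square convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


def propagate_shapes(input_size: int, layers: List[Union[ConvLayer, FCLayer]]):
    """Return (layer name, output shape) for each layer, in feed-forward order."""
    size = input_size
    shapes = []
    for layer in layers:
        if isinstance(layer, ConvLayer):
            size = conv_out(size, layer.kernel, layer.stride, layer.pad)
            if layer.pool is not None:
                pool_kernel, pool_stride = layer.pool
                size = conv_out(size, pool_kernel, pool_stride)
            shapes.append((layer.name, (size, size, layer.filters)))
        else:
            shapes.append((layer.name, (layer.units,)))
    return shapes


NUM_CATEGORIES = 683  # number of training categories stated in the text

# Assumed AlexNet-style hyperparameters (not specified in the text).
LAYERS = [
    ConvLayer("conv1", 96, 11, stride=4, pool=(3, 2), norm=True),
    ConvLayer("conv2", 256, 5, pad=2, pool=(3, 2), norm=True),
    ConvLayer("conv3", 384, 3, pad=1),
    ConvLayer("conv4", 384, 3, pad=1),
    ConvLayer("conv5", 256, 3, pad=1, pool=(3, 2)),
    FCLayer("fc6", 4096),
    FCLayer("fc7", 4096),
    FCLayer("fc8", NUM_CATEGORIES),
]

for name, shape in propagate_shapes(227, LAYERS):
    print(name, shape)
```

Under these assumed settings the last convolutional layer produces a 6 x 6 x 256 map, which the first fully connected layer receives as 9,216 flattened inputs; the final layer has one output per category.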

The object DNN was trained with backpropagation to categorize everyday objects: 683 categories from the ImageNet database [2], with about 1,300 images per category. The trained network is available for download.
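To make "trained with backpropagation" concrete, here is a toy sketch of the core update for a categorization output layer only: a linear softmax classifier on made-up feature vectors, trained by gradient descent on the cross-entropy loss. It is not the authors' training procedure; the function names, learning rate and sizes are hypothetical, and a full DNN would backpropagate the same error signal through all eight layers.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights, features, label, lr=0.1):
    """One gradient step for a linear softmax classifier (weights: classes x features).

    Returns the cross-entropy loss measured before the update.
    """
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    # dLoss/dlogit_k = p_k - 1[k == label]; the chain rule gives
    # dLoss/dW[k][j] = (p_k - y_k) * x_j.
    for k, row in enumerate(weights):
        delta = probs[k] - (1.0 if k == label else 0.0)
        for j, x in enumerate(features):
            row[j] -= lr * delta * x
    return loss


random.seed(0)
num_classes, num_features = 5, 8  # toy sizes, not the 683-category network
weights = [[0.0] * num_features for _ in range(num_classes)]
features = [random.uniform(-1, 1) for _ in range(num_features)]
losses = [sgd_step(weights, features, label=2) for _ in range(20)]
# The first loss equals ln(5) because all-zero weights give uniform probabilities.
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Starting from all-zero weights every class gets probability 1/5, so the first loss is ln 5 and then shrinks as the correct class's score is pushed up.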

To compare representations between the deep neural network and the human brain, we used a set of 118 images of natural objects on real-world backgrounds. To avoid circular inference, none of these 118 images were used to train the object DNN. The network reached 94% correct performance in a top-five categorization task on this 118-image set, a level comparable to human performance.
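A top-five score counts an image as correct when its true category is among the network's five highest-scoring categories. The sketch below computes this measure from per-image score vectors; the function names and the toy scores are hypothetical and use 7 categories rather than 683. For scale, 94% of 118 images corresponds to roughly 111 correctly categorized images.

```python
def top_k_correct(scores, label, k=5):
    """True if `label` is among the k highest-scoring categories."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


def top_k_accuracy(all_scores, labels, k=5):
    """Fraction of images whose true label falls in the top k predictions."""
    hits = sum(top_k_correct(s, y, k) for s, y in zip(all_scores, labels))
    return hits / len(labels)


# Toy scores for 3 images over 7 categories (made-up numbers).
scores = [
    [0.1, 0.9, 0.3, 0.2, 0.05, 0.4, 0.0],    # true label 1 ranked 1st
    [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01],   # true label 6 ranked last
    [0.2, 0.1, 0.6, 0.05, 0.3, 0.7, 0.0],    # true label 4 ranked 3rd
]
labels = [1, 6, 4]

print("top-1:", top_k_accuracy(scores, labels, k=1))
print("top-5:", top_k_accuracy(scores, labels, k=5))
```

Here only the first image is correct under top-1, while the third image also counts under top-5, so top-5 accuracy (2/3) exceeds top-1 accuracy (1/3).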

Acknowledgements

This work was funded by National Eye Institute grant EY020484 (to A.O.), National Science Foundation Award 1532591 (to A.O., A.T. and D.P.), a Google Research Faculty Award (to A.O.), a Feodor Lynen Scholarship of the Humboldt Foundation (to R.M.C.), and the McGovern Institute Neurotechnology Program (to A.O. and D.P.). The work was conducted at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research, Massachusetts Institute of Technology.

References

[1] Cichy RM, Khosla A, Pantazis D, Torralba A, Oliva A. Comparing deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals a hierarchical correspondence. Scientific Reports, 2016.

[2] Deng J, et al. ImageNet: A large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009. doi:10.1109/CVPR.2009.5206848.