Title: SUN: Top-down saliency using natural statistics
Abstract: When people try to find particular objects in natural scenes, they make extensive use of knowledge about how and where objects tend to appear in a scene. Although many forms of such "top-down" knowledge have been incorporated into saliency map models of visual search, the role of object appearance has, surprisingly, seldom been investigated. Here we present an appearance-based saliency model derived in a Bayesian framework. We compare our approach with both bottom-up saliency algorithms and the state-of-the-art Contextual Guidance model of Torralba et al. (2006) at predicting human fixations. Although the two top-down approaches use very different types of information, they achieve similar performance, each substantially better than the purely bottom-up models. Our experiments reveal that a simple model of object appearance can predict human fixations quite well, even making the same mistakes as people.

Keywords: Attention; Saliency; Eye movements; Visual search; Natural statistics

Acknowledgements: The authors would like to thank Antonio Torralba and his colleagues for sharing their dataset of human fixations, the LabelMe image database and toolbox, and a version of their bottom-up saliency algorithm. We would also like to thank Paul Ruvolo, Matus Telgarsky, and everyone in GURU (Gary's Unbelievable Research Unit) for their feedback and advice. This work was supported by the NIH (Grant No. MH57075 to GWC), the James S. McDonnell Foundation (Perceptual Expertise Network, I. Gauthier, PI), and the NSF (Grant No. SBE-0542013 to the Temporal Dynamics of Learning Center, GWC, PI, and IGERT Grant No. DGE-0333451 to GWC and V. R. de Sa). CK is also funded by a Eugene Cota-Robles fellowship.
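To make the abstract's central idea concrete: in a Bayesian framework, appearance-based top-down saliency can be expressed as a likelihood ratio, favoring locations whose features are common for the target object but rare in natural scenes overall. The following is only a minimal illustrative sketch of that ratio, not the paper's actual model; the feature discretization and both distributions here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretized appearance features at each image location
# (e.g., quantized filter responses), values in {0, ..., n_bins - 1}.
n_bins = 8
features = rng.integers(0, n_bins, size=(32, 32))

# p(F): background feature distribution, which would be learned from
# natural image statistics (here: an arbitrary distribution).
p_f = rng.dirichlet(np.ones(n_bins))

# p(F | target): appearance distribution of the sought object class
# (also arbitrary here; in practice estimated from labeled examples).
p_f_target = rng.dirichlet(np.ones(n_bins))

# Appearance-based top-down saliency as a likelihood ratio:
# high where the local features are diagnostic of the target.
saliency = p_f_target[features] / p_f[features]

print(saliency.shape)  # one saliency value per location: (32, 32)
```

In practice such a map would be combined with other terms (e.g., scene-context priors over location, as in the Contextual Guidance model) and compared against recorded human fixations.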