J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.1470-1477, 2003.
DOI : 10.1109/ICCV.2003.1238663

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, Workshop on statistical learning in computer vision, ECCV, pp.1-2, 2004.

S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in: Computer Vision and Pattern Recognition, pp.2169-2178, 2006.

F. Li and P. Perona, A Bayesian hierarchical model for learning natural scene categories, Computer Vision and Pattern Recognition, vol.2, pp.524-531, 2005.

S. Kim, X. Jin, and J. Han, DisIClass: discriminative frequent pattern-based image classification, Proceedings of the Tenth International Workshop on Multimedia Data Mining, MDMKDD '10, pp.1-10, 2010.
DOI : 10.1145/1814245.1814252

D. Liu, G. Hua, P. A. Viola, and T. Chen, Integrated feature selection and higher-order spatial feature extraction for object categorization, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587403

S. Savarese, J. Winn, and A. Criminisi, Discriminative Object Class Models of Appearance and Shape by Correlatons, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2033-2040, 2006.
DOI : 10.1109/CVPR.2006.102

L. Wu, M. Li, Z. Li, W.-Y. Ma, and N. Yu, Visual language modeling for image classification, Proceedings of the international workshop on multimedia information retrieval, MIR '07, pp.115-124, 2007.
DOI : 10.1145/1290082.1290101

J. Qin and N. H. Yung, Scene categorization via contextual visual words, Pattern Recognition, vol.43, issue.5, pp.1874-1888, 2010.
DOI : 10.1016/j.patcog.2009.11.009

G. Zhou, Z. Wang, J. Wang, and D. Feng, Spatial context for visual vocabulary construction, International Conference on Image Analysis and Signal Processing, pp.176-181, 2010.

N. Morioka and S. Satoh, Building Compact Local Pairwise Codebook with Joint Feature Space Clustering, Proceedings of the 11th European Conference on Computer Vision: Part I, ECCV'10, pp.692-705, 2010.
DOI : 10.1007/978-3-642-15549-9_50

J. Krapac, J. J. Verbeek, and F. Jurie, Modeling spatial layout with Fisher vectors for image categorization, 2011 International Conference on Computer Vision, pp.1487-1494, 2011.
DOI : 10.1109/ICCV.2011.6126406

URL : https://hal.archives-ouvertes.fr/inria-00612277

P. Tirilly, V. Claveau, and P. Gros, Language modeling for bag-of-visual words image categorization, Proceedings of the 2008 international conference on Content-based image and video retrieval, CIVR '08, pp.249-258, 2008.
DOI : 10.1145/1386352.1386388

URL : https://hal.archives-ouvertes.fr/hal-00811922

J. Yuan, Y. Wu, and M. Yang, Discovery of collocation patterns: from visual words to visual phrases, in: Computer Vision and Pattern Recognition, pp.1-8, 2007.

J. Yuan, Y. Wu, and M. Yang, From frequent itemsets to semantically meaningful visual patterns, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '07, pp.864-873, 2007.
DOI : 10.1145/1281192.1281284

E. Zhang and M. Mayo, Improving Bag-of-Words model with spatial information, 2010 25th International Conference of Image and Vision Computing New Zealand, 2010.
DOI : 10.1109/IVCNZ.2010.6148795

Y. Zheng, H. Lu, C. Jin, and X. Xue, Incorporating Spatial Correlogram into Bag-of-Features Model for Scene Categorization, Asian Conference on Computer Vision, pp.333-342, 2009.
DOI : 10.1007/978-3-642-12307-8_31

Y. Zheng, M. Zhao, S. Neo, T. Chua, and Q. Tian, Visual synset: Towards a higher-level visual representation, in: Computer Vision and Pattern Recognition, pp.1-8, 2008.

J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, Discovering objects and their localization in images, International Conference on Computer Vision, pp.370-377, 2005.

S. Zhang, Q. Tian, G. Hua, Q. Huang, and S. Li, Descriptive visual words and visual phrases for image applications, Proceedings of the seventeenth ACM international conference on Multimedia, MM '09, pp.75-84, 2009.
DOI : 10.1145/1631272.1631285

N. M. Elfiky, F. S. Khan, J. van de Weijer, and J. Gonzàlez, Discriminative compact pyramids for object and scene recognition, Pattern Recognition, vol.45, issue.4, pp.1627-1636, 2012.
DOI : 10.1016/j.patcog.2011.09.020

D. Parikh, Recognizing jumbled images: The role of local and global information in image classification, 2011 International Conference on Computer Vision, pp.519-526, 2011.
DOI : 10.1109/ICCV.2011.6126283

J. C. van Gemert, J. Geusebroek, C. J. Veenman, and A. W. Smeulders, Kernel Codebooks for Scene Categorization, European Conference on Computer Vision, pp.696-709, 2008.
DOI : 10.1007/978-3-540-88690-7_52

X. Zhou, N. Cui, Z. Li, F. Liang, and T. S. Huang, Hierarchical Gaussianization for image classification, International Conference on Computer Vision, pp.1971-1977, 2009.

A. Bosch, A. Zisserman, and X. Munoz, Representing shape with a spatial pyramid kernel, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, pp.401-408, 2007.
DOI : 10.1145/1282280.1282340

T. Harada, Y. Ushiku, Y. Yamashita, and Y. Kuniyoshi, Discriminative spatial pyramid, CVPR 2011, pp.1617-1624, 2011.
DOI : 10.1109/CVPR.2011.5995691

Y. Cao, C. Wang, Z. Li, L. Zhang, and L. Zhang, Spatial-bag-of-features, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3352-3359, 2010.
DOI : 10.1109/CVPR.2010.5540021

J. Huang, S. R. Kumar, M. Mitra, W. Zhu, and R. Zabih, Image indexing using color correlograms, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.762-768, 1997.
DOI : 10.1109/CVPR.1997.609412

Y. Zhang, Z. Jia, and T. Chen, Image retrieval with geometry-preserving visual phrases, CVPR 2011, pp.809-816, 2011.
DOI : 10.1109/CVPR.2011.5995528

Z. Wu, Q. Ke, M. Isard, and J. Sun, Bundling features for large scale partial-duplicate web image search, Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '09, pp.25-32, 2009.

J. Matas, O. Chum, M. Urban, and T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions, British Machine Vision Conference, pp.384-393, 2002.

A. Agarwal and B. Triggs, Hyperfeatures: multilevel local coding for visual recognition, Proceedings of the 9th European Conference on Computer Vision, Volume Part I, ECCV'06, pp.30-43, 2006.

R. Khan, C. Barat, D. Muselet, and C. Ducottet, Spatial orientations of visual word pairs to improve Bag-of-Visual-Words model, Proceedings of the British Machine Vision Conference 2012, 2012.
DOI : 10.5244/C.26.89

URL : https://hal.archives-ouvertes.fr/ujm-00738708

Y. Yang and S. Newsam, Spatial pyramid co-occurrence for image classification, International Conference on Computer Vision, 2011.

T. Harada, H. Nakayama, and Y. Kuniyoshi, Improving Local Descriptors by Embedding Global and Local Spatial Information, Proceedings of the 9th European Conference on Computer Vision, pp.736-749, 2010.
DOI : 10.1007/978-3-642-15561-1_53

D. G. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, pp.1150-1157, 1999.
DOI : 10.1109/ICCV.1999.790410

J. Yang, K. Yu, Y. Gong, and T. S. Huang, Linear spatial pyramid matching using sparse coding for image classification, in: Computer Vision and Pattern Recognition, pp.1794-1801, 2009.

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, p.511, 2001.
DOI : 10.1109/CVPR.2001.990517

T. Deselaers and V. Ferrari, Global and efficient self-similarity for object classification and detection, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.1633-1640, 2010.
DOI : 10.1109/CVPR.2010.5539775

E. Shechtman and M. Irani, Matching Local Self-Similarities across Images and Videos, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007.
DOI : 10.1109/CVPR.2007.383198

J. C. van Gemert, C. J. Veenman, A. W. Smeulders, and J. Geusebroek, Visual Word Ambiguity, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.7, pp.1271-1283, 2010.
DOI : 10.1109/TPAMI.2009.132

L. Liu, L. Wang, and X. Liu, In defense of soft-assignment coding, International Conference on Computer Vision, pp.2486-2493, 2011.

I. Dhillon, S. Mallela, and R. Kumar, A divisive information-theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, vol.3, pp.1265-1287, 2003.

J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, International Journal of Computer Vision, vol.73, issue.2, pp.213-238, 2007.
DOI : 10.1007/s11263-006-9794-4

URL : https://hal.archives-ouvertes.fr/inria-00548574

Y. Su and F. Jurie, Visual word disambiguation by semantic contexts, 2011 International Conference on Computer Vision, pp.311-318, 2011.
DOI : 10.1109/ICCV.2011.6126257

URL : https://hal.archives-ouvertes.fr/hal-00808655

A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724

L. Fei-Fei, R. Fergus, and P. Perona, Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories, Workshop on Generative-Model Based Vision, 2004.
DOI : 10.1016/j.cviu.2005.09.012

G. Griffin, A. Holub, and P. Perona, Caltech-256 object category dataset, 2007.

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

M. J. Swain and D. H. Ballard, Color indexing, International Journal of Computer Vision, vol.7, issue.1, pp.11-32, 1991.
DOI : 10.1007/BF00130487

J. Yang, Y. Jiang, A. G. Hauptmann, and C. Ngo, Evaluating bag-of-visual-words representations in scene classification, Proceedings of the international workshop on multimedia information retrieval, MIR '07, pp.197-206, 2007.
DOI : 10.1145/1290082.1290111

J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, Locality-constrained linear coding for image classification, Conference on Computer Vision and Pattern Recognition, pp.3360-3367, 2010.

A. Vedaldi and B. Fulkerson, VLFeat, Proceedings of the international conference on Multimedia, MM '10, 2008.
DOI : 10.1145/1873951.1874249

S. McCann and D. G. Lowe, Spatially Local Coding for Object Recognition, Proceedings of the 2012 Asian Conference on Computer Vision, pp.204-217, 2013.
DOI : 10.1007/978-3-642-37331-2_16

L. Seidenari, G. Serra, A. D. Bagdanov, and A. Del Bimbo, Local Pyramidal Descriptors for Image Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.5, pp.1033-1040, 2013.
DOI : 10.1109/TPAMI.2013.232

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang et al., Decaf: A deep convolutional activation feature for generic visual recognition, Proceedings of the 31st International

S. Avila, N. Thome, M. Cord, E. Valle, A. De et al., Pooling in image representation: The visual codeword point of view, Computer Vision and Image Understanding, vol.117, issue.5, pp.453-465, 2013.
DOI : 10.1016/j.cviu.2012.09.007

URL : https://hal.archives-ouvertes.fr/hal-01172709

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

M. Oquab, I. Laptev, L. Bottou, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.222

URL : https://hal.archives-ouvertes.fr/hal-00911179

M. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, arXiv:1311.2901, 2013.
DOI : 10.1007/978-3-319-10590-1_53