L. Ballan, M. Bertini, A. Del Bimbo, and G. Serra, Video event classification using string kernels, Multimedia Tools and Applications, pp.69-87, 2010.
DOI : 10.1007/s11042-009-0351-3

S. Battiato, G. Farinella, G. Gallo, and D. Ravì, Spatial hierarchy of textons distributions for scene classification, Proceedings of the 15th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM '09, pp.333-343, 2009.

Y. Boureau, F. Bach, Y. Lecun, and J. Ponce, Learning mid-level features for recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.2559-2566, 2010.
DOI : 10.1109/CVPR.2010.5539963

Y. Cao, C. Wang, Z. Li, L. Zhang, and L. Zhang, Spatial-bag-of-features, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3352-3359, 2010.
DOI : 10.1109/CVPR.2010.5540021

X. Chen, X. Hu, and X. Shen, Spatial weighting for bag-of-visual-words and its application in content-based image retrieval, Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD '09, pp.867-874, 2009.

M. Christodoulakis and G. Brey, Edit distance with combinations and splits and its applications in OCR name matching, International Journal of Foundations of Computer Science, vol.20, issue.06, pp.1047-1068, 2009.
DOI : 10.1142/S0129054109007030

S. E. de Avila, N. Thome, M. Cord, E. Valle, and A. de Albuquerque Araújo, Pooling in image representation: The visual codeword point of view, Computer Vision and Image Understanding, vol.117, issue.5, pp.453-465, 2013.
DOI : 10.1016/j.cviu.2012.09.007
URL : https://hal.archives-ouvertes.fr/hal-01172709

S. Gao, I. W. Tsang, and L.-T. Chia, Kernel Sparse Representation for Image Classification and Face Recognition, Computer Vision – ECCV 2010, pp.1-14, 2010.
DOI : 10.1007/978-3-642-15561-1_1

T. Harada, Y. Ushiku, Y. Yamashita, and Y. Kuniyoshi, Discriminative spatial pyramid, CVPR 2011, pp.1617-1624, 2011.
DOI : 10.1109/CVPR.2011.5995691
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.662.6786

J. He, S. Chang, and L. Xie, Fast kernel learning for spatial pyramid matching, CVPR, 2008.

C. Iovan, D. Picard, N. Thome, and M. Cord, Classification of Urban Scenes from Geo-referenced Images in Urban Street-View Context, 2012 11th International Conference on Machine Learning and Applications, pp.339-344, 2012.
DOI : 10.1109/ICMLA.2012.171
URL : https://hal.archives-ouvertes.fr/hal-00794980

K. Khurshid, C. Faure, and N. Vincent, A Novel Approach for Word Spotting Using Merge-Split Edit Distance, Computer Analysis of Images and Patterns, pp.213-220, 2009.
DOI : 10.1007/978-3-642-03767-2_26

P. N. Klein, T. B. Sebastian, and B. B. Kimia, Shape matching using edit-distance: an implementation, Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pp.781-790, 2001.

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2169-2178, 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

H. Li and T. Jiang, A Class of Edit Kernels for SVMs to Predict Translation Initiation Sites in Eukaryotic mRNAs, Journal of Computational Biology, vol.12, issue.6, pp.702-718, 2005.
DOI : 10.1089/cmb.2005.12.702

G. Seni, V. Kripasundar, and R. K. Srihari, Generalizing edit distance to incorporate domain information: Handwritten text recognition as a case study, Pattern Recognition, vol.29, issue.3, pp.405-414, 1996.
DOI : 10.1016/0031-3203(95)00102-6

G. Sharma and F. Jurie, Learning discriminative spatial representation for image classification, Proceedings of the British Machine Vision Conference 2011, pp.6-7, 2011.
DOI : 10.5244/C.25.6
URL : https://hal.archives-ouvertes.fr/hal-00722820

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.1470-1477, 2003.
DOI : 10.1109/ICCV.2003.1238663

P. Tirilly, V. Claveau, and P. Gros, Language modeling for bag-of-visual words image categorization, Proceedings of the 2008 international conference on Content-based image and video retrieval, CIVR '08, pp.249-258, 2008.
DOI : 10.1145/1386352.1386388
URL : https://hal.archives-ouvertes.fr/hal-00811922

V. Viitaniemi and J. Laaksonen, Spatial extensions to bag of visual words, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, 2009.
DOI : 10.1145/1646396.1646441

R. Wagner and M. Fischer, The String-to-String Correction Problem, Journal of the ACM, vol.21, issue.1, pp.168-173, 1974.
DOI : 10.1145/321796.321811

J. Yang, K. Yu, Y. Gong, and T. Huang, Linear spatial pyramid matching using sparse coding for image classification, 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1794-1801, 2009.

Y. Yang and S. Newsam, Spatial pyramid co-occurrence for image classification, 2011 IEEE International Conference on Computer Vision (ICCV), pp.1465-1472, 2011.

M. Yeh and K. Cheng, Fast visual retrieval using accelerated sequence matching, IEEE Transactions on Multimedia, vol.13, issue.2, pp.320-329, 2011.