UJM at INEX 2008 XML mining Track

Abstract : This paper reports the experiments we carried out for the INEX XML Mining track, which consists of developing categorization (or classification) and clustering methods for XML documents. We represent XML documents as vectors of index terms. For this first participation, our experiments have two purposes. First, we aim to set up a text-only categorization approach that can serve as a baseline for further work that will take the structure of the XML documents into account. Second, we define two criteria, based on term distribution, for reducing the size of the index. The baseline gives good results. With the two criteria, we improve on these results when the index is reduced slightly; the results are slightly worse when the index is reduced sharply.
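
As a rough illustration of the idea in the abstract, the sketch below represents each document as a vector of index-term counts and then shrinks the index using a criterion based on how terms are distributed across documents. The abstract does not state the two criteria used in the paper, so the minimum document frequency threshold used here (`min_df`) is only an assumed stand-in, and all function names are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace,
    # keep purely alphabetic tokens.
    return [tok for tok in text.lower().split() if tok.isalpha()]


def build_vectors(docs):
    # One term-count vector per document, text only (structure is ignored).
    return [Counter(tokenize(doc)) for doc in docs]


def document_frequency(vectors):
    # Number of documents in which each term occurs at least once.
    df = Counter()
    for vec in vectors:
        df.update(vec.keys())
    return df


def reduce_index(vectors, min_df=2):
    # Stand-in criterion (NOT the paper's): keep only terms that occur
    # in at least `min_df` documents, then project every vector onto
    # the reduced index.
    df = document_frequency(vectors)
    kept = {term for term, n in df.items() if n >= min_df}
    reduced = [
        Counter({term: count for term, count in vec.items() if term in kept})
        for vec in vectors
    ]
    return reduced, kept


docs = [
    "xml mining track",
    "xml retrieval track",
    "clustering of xml documents",
]
vectors = build_vectors(docs)
reduced, kept = reduce_index(vectors, min_df=2)
print(sorted(kept))
```

Raising `min_df` reduces the index more sharply. The abstract reports that a sharp reduction costs slightly in quality while a slight one improves on the baseline.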
Document type :
Conference papers

https://hal-ujm.archives-ouvertes.fr/ujm-00366436
Contributor : Christine Largeron
Submitted on : Tuesday, May 26, 2009 - 3:49:40 PM
Last modification on : Wednesday, November 20, 2019 - 3:04:33 AM
Long-term archiving on: Tuesday, June 8, 2010 - 11:11:50 PM

File

paper61_xmlmining.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : ujm-00366436, version 1

Citation

Mathias Géry, Christine Largeron, Christophe Moulin. UJM at INEX 2008 XML mining Track. Workshop INitiative for Evaluation of XML Retrieval (INEX 2008), Jan 2008, Dagstuhl, Germany. pp.446-452. ⟨ujm-00366436⟩
