Learning Balls of Strings from Edit Corrections

Leonor Becerra Bonache; Colin de La Higuera; Jean-Christophe Janodet; Frédéric Tantini

Article Dans Une Revue Journal of Machine Learning Research Année : 2008

Learning Balls of Strings from Edit Corrections

(1) , (2) , (2) , (2)

1
2

Leonor Becerra Bonache

Fonction : Auteur

Department of Computer Science

Colin de La Higuera

Fonction : Auteur
PersonId : 18019
IdHAL : colin-de-la-higuera
IdRef : 053503503

Laboratoire Hubert Curien

Jean-Christophe Janodet

Fonction : Auteur

Laboratoire Hubert Curien

Frédéric Tantini

Fonction : Auteur
PersonId : 836491

Laboratoire Hubert Curien

Résumé

When facing the question of learning languages in realistic settings, one has to tackle several problems that do not admit simple solutions. On the one hand, languages are usually defined by complex grammatical mechanisms for which the learning results are predominantly negative, as the few algorithms are not really able to cope with noise. On the other hand, the learning settings themselves rely either on too simple information (text) or on unattainable one (query systems that do not exist in practice, nor can be simulated). We consider simple but sound classes of languages defined via the widely used edit distance: the balls of strings. We propose to learn them with the help of a new sort of queries, called the correction queries: when a string is submitted to the Oracle, either she accepts it if it belongs to the target language, or she proposes a correction, that is, a string of the language close to the query with respect to the edit distance. We show that even if the good balls are not learnable in Angluin's MAT model, they can be learned from a polynomial number of correction queries. Moreover, experimental evidence simulating a human Expert shows that this algorithm is resistant to approximate answers.

Mots clés

grammatical inference oracle learning correction queries edit distance balls of strings

Domaines

Apprentissage [cs.LG]

Frédéric Tantini : Connectez-vous pour contacter le contributeur

https://ujm.hal.science/ujm-00314590

Soumis le : mercredi 27 août 2008-16:06:42

Dernière modification le : vendredi 24 mars 2023-14:52:50

Dates et versions

ujm-00314590 , version 1 (27-08-2008)

Identifiants

HAL Id : ujm-00314590 , version 1

Citer

Leonor Becerra Bonache, Colin de La Higuera, Jean-Christophe Janodet, Frédéric Tantini. Learning Balls of Strings from Edit Corrections. Journal of Machine Learning Research, 2008, 9, pp.1823-1852. ⟨ujm-00314590⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

UNIV-ST-ETIENNE IOGS CNRS LINA-TALN LAHC PARISTECH UDL

302 Consultations

0 Téléchargements

Learning Balls of Strings from Edit Corrections

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager