A Discriminative Model of Stochastic Edit Distance in the form of a Conditional Transducer - Université Jean-Monnet-Saint-Étienne Accéder directement au contenu
Communication Dans Un Congrès Année : 2006

A Discriminative Model of Stochastic Edit Distance in the form of a Conditional Transducer

Marc Bernard
  • Fonction : Auteur
  • PersonId : 836237
Jean-Christophe Janodet
  • Fonction : Auteur
  • PersonId : 836238
Marc Sebban

Résumé

Many real-world applications such as spell-checking or DNA analysis use the Levenshtein edit-distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilitic transducer, computed by using grammatical inference techniques, that allows us to learn both the structure and the probabilities of the model. Beyond the fact that the learned transducers are neither deterministic nor stochastic in the standard terminology, they are conditional, thus independant from the distributions of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context where the edit operations are used. In other words, we get kinds of \textit{context-sensitive} edit distances.
Fichier principal
Vignette du fichier
Bernard_Janodet_Sebban_ICGI_2006.pdf (221.47 Ko) Télécharger le fichier
Loading...

Dates et versions

ujm-00109698 , version 1 (17-11-2006)

Identifiants

  • HAL Id : ujm-00109698 , version 1

Citer

Marc Bernard, Jean-Christophe Janodet, Marc Sebban. A Discriminative Model of Stochastic Edit Distance in the form of a Conditional Transducer. 8th International Colloquium on Grammatical Inference, Sep 2006, Tokyo, Japan. pp.240-252. ⟨ujm-00109698⟩
175 Consultations
298 Téléchargements

Partager

Gmail Facebook X LinkedIn More