Dict2vec : Learning Word Embeddings using Lexical Dictionaries

Julien Tissier; Christophe Gravier; Amaury Habrard

Conference paper. Year: 2017

Julien Tissier

Role: Author

Laboratoire Hubert Curien

Christophe Gravier

Role: Author
PersonId : 3235
IdHAL : cgravier
ORCID : 0000-0001-8586-6302
IdRef : 12599396X

Laboratoire Hubert Curien

Amaury Habrard

Role: Author
PersonId : 439
IdHAL : amaury-habrard
ORCID : 0000-0003-3038-9347
IdRef : 084103655

Laboratoire Hubert Curien

Abstract

Learning word embeddings on large unlabeled corpora has been shown to improve many natural language tasks. The most efficient and popular approaches learn or retrofit such representations using additional external data. The resulting embeddings are generally better than their corpus-only counterparts, although such resources cover only a fraction of the words in the vocabulary. In this paper, we propose a new approach, Dict2vec, based on one of the largest yet most refined data sources for describing words: natural language dictionaries. Dict2vec builds new word pairs from dictionary entries so that semantically related words are moved closer together, and negative sampling filters out pairs whose words are unrelated in dictionaries. We evaluate the word representations obtained using Dict2vec on eleven datasets for the word similarity task and on four datasets for a text classification task.
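The pair-building idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy dictionary, the function names `build_pairs` and `sample_negatives`, and the split into mutual and one-way pairs are all assumptions made for illustration. The negative-sampling helper reflects one reading of the abstract, in which words that the dictionary relates to the target are not drawn as negatives.

```python
from itertools import combinations
import random

# Toy dictionary: headword -> tokens of its definition.
# These entries are hypothetical and exist only for illustration.
TOY_DICT = {
    "car": ["vehicle", "with", "wheels", "and", "engine"],
    "vehicle": ["machine", "such", "as", "car", "used", "for", "transport"],
    "engine": ["machine", "that", "produces", "motion"],
    "banana": ["long", "yellow", "fruit"],
}


def build_pairs(dictionary):
    """Return (mutual, one_way) sets of alphabetically ordered headword pairs.

    mutual:  each word appears in the other's definition.
    one_way: only one of the two words appears in the other's definition.
    Only pairs of headwords are considered in this sketch.
    """
    mutual, one_way = set(), set()
    for a, b in combinations(sorted(dictionary), 2):
        a_in_b = a in dictionary[b]
        b_in_a = b in dictionary[a]
        if a_in_b and b_in_a:
            mutual.add((a, b))
        elif a_in_b or b_in_a:
            one_way.add((a, b))
    return mutual, one_way


def sample_negatives(word, vocabulary, related, k, rng):
    """Draw up to k negative words for `word`, skipping dictionary-related ones."""
    candidates = [
        w for w in vocabulary
        if w != word and tuple(sorted((word, w))) not in related
    ]
    return rng.sample(candidates, min(k, len(candidates)))


mutual, one_way = build_pairs(TOY_DICT)
negatives = sample_negatives(
    "car", sorted(TOY_DICT), mutual | one_way, k=2, rng=random.Random(0)
)
```

In this toy setting, "car" and "vehicle" define each other, "engine" appears in the definition of "car" but not the reverse, and "banana" is the only word left as a negative candidate for "car". A training loop would then pull the related pairs closer in the embedding space; that part is omitted here.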

Domains

Artificial Intelligence [cs.AI]

Main file

emnlp2017.pdf (212.73 KB)

Origin: Files produced by the author(s)

https://ujm.hal.science/ujm-01613953

Submitted on: Thursday, October 12, 2017, 10:26:56

Last modified on: Thursday, April 20, 2023, 03:20:00

Dates and versions

ujm-01613953 , version 1 (12-10-2017)

Identifiers

HAL Id : ujm-01613953 , version 1

Cite

Julien Tissier, Christophe Gravier, Amaury Habrard. Dict2vec : Learning Word Embeddings using Lexical Dictionaries. Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Sep 2017, Copenhagen, Denmark. pp.254-263. ⟨ujm-01613953⟩

Collections

UNIV-ST-ETIENNE IOGS CNRS LAHC PARISTECH UDL

622 views

791 downloads
