Dict2vec : Learning Word Embeddings using Lexical Dictionaries

Abstract: Learning word embeddings on large unlabeled corpora has been shown to be successful in improving many natural language tasks. The most efficient and popular approaches learn or retrofit such representations using additional external data. The resulting embeddings are generally better than their corpus-only counterparts, although such resources cover only a fraction of the words in the vocabulary. In this paper, we propose a new approach, Dict2vec, based on one of the largest yet refined data sources for describing words: natural language dictionaries. Dict2vec builds new word pairs from dictionary entries so that semantically related words are moved closer, and negative sampling filters out pairs whose words are unrelated in dictionaries. We evaluate the word representations obtained using Dict2vec on eleven datasets for the word similarity task and on four datasets for a text classification task.
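The pair-building idea in the abstract can be illustrated with a minimal sketch. This is a simplification made under stated assumptions, not the authors' implementation: the toy dictionary, the function names `related_pairs` and `filter_negatives`, and the rule that two words are "related" when one appears in the other's definition are all hypothetical choices made for illustration.

```python
def related_pairs(definitions):
    """Collect unordered pairs (a, b) where b occurs in the definition of a.

    `definitions` maps each headword to the set of words in its definition.
    Only words that are themselves headwords are paired (an assumption).
    """
    pairs = set()
    for word, def_words in definitions.items():
        for other in def_words:
            if other != word and other in definitions:
                pairs.add(frozenset((word, other)))
    return pairs


def filter_negatives(target, candidates, pairs):
    """Keep only negative-sample candidates not related to `target`."""
    return [
        c for c in candidates
        if c != target and frozenset((target, c)) not in pairs
    ]


# Toy dictionary (hypothetical data, for illustration only).
toy = {
    "car": {"vehicle", "road", "wheel"},
    "vehicle": {"car", "transport"},
    "wheel": {"round", "axle"},
    "banana": {"fruit", "yellow"},
}

pairs = related_pairs(toy)
negatives = filter_negatives("car", ["vehicle", "banana", "wheel", "car"], pairs)
```

Here "car" is paired with "vehicle" and "wheel", so only "banana" survives as a negative sample for "car". In a full training loop, the related pairs would be pulled closer in the embedding space while the surviving negatives are pushed apart.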
Document type:
Conference paper
Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Sep 2017, Copenhagen, Denmark. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp.254-263, 〈http://emnlp2017.net/〉
Cited literature [34 references]

https://hal-ujm.archives-ouvertes.fr/ujm-01613953
Contributor: Christophe Gravier
Submitted on: Thursday, October 12, 2017 - 10:26:56
Last modified on: Thursday, January 11, 2018 - 06:26:19

File

emnlp2017.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : ujm-01613953, version 1

Citation

Julien Tissier, Christophe Gravier, Amaury Habrard. Dict2vec : Learning Word Embeddings using Lexical Dictionaries. Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Sep 2017, Copenhagen, Denmark. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp.254-263, 〈http://emnlp2017.net/〉. 〈ujm-01613953〉

Metrics

Record views

130

File downloads

101