Sequence graphs realizations and ambiguity in language models - Department of Computer Science
Conference paper. Year: 2021

Abstract

Several natural language models rely on an assumption that models the context of each word as a bag of words. We study the combinatorial implications of such an assumption for the corresponding word or sentence representations. In particular, we present theoretical results concerning the family of sequence graphs, whose realizations yield equivalent representations under this assumption. Several combinatorial problems are presented, depending on three levels of generalization (window size, graph orientation, and weights), and the NP-completeness of some of them is left open. Based on these results, we also establish different algorithms, including a dynamic programming formulation, to count and explicitly enumerate the different realizations of a sequence graph. This allows us to show that the bag-of-words assumption can cause a large number of sentences to share the same representation, even for relatively short context window sizes.
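To make the ambiguity concrete, here is a minimal brute-force sketch in Python. It assumes a definition that may differ in detail from the paper's: a window of size w links every pair of positions i < j with j - i < w, the weighted graph counts how often each word pair is linked, and the undirected variant forgets the order within a pair. The helper names `sequence_graph` and `realizations` are hypothetical, and the exhaustive search over rearrangements of the same words is only an illustration, not the dynamic programming algorithm described in the paper.

```python
from collections import Counter
from itertools import permutations


def sequence_graph(words, w, directed=True):
    """Weighted edges between words that are fewer than w positions apart.

    Assumption (illustrative definition): positions i < j are linked
    whenever j - i < w, and each linked pair adds 1 to the edge weight.
    """
    edges = Counter()
    for i in range(len(words)):
        for j in range(i + 1, min(i + w, len(words))):
            u, v = words[i], words[j]
            # Undirected variant: store the pair in a canonical (sorted) order.
            edge = (u, v) if directed else tuple(sorted((u, v)))
            edges[edge] += 1
    return edges


def realizations(words, w, directed=True):
    """Brute force: distinct rearrangements of `words` with the same graph.

    Only rearrangements of the same multiset of words are considered,
    and the search is exponential in the sentence length.
    """
    target = sequence_graph(words, w, directed)
    found = set()
    for perm in set(permutations(words)):
        if sequence_graph(perm, w, directed) == target:
            found.add(perm)
    return sorted(found)


if __name__ == "__main__":
    sentence = "the cat saw the dog".split()
    for directed in (True, False):
        sols = realizations(sentence, 2, directed)
        print("directed" if directed else "undirected", len(sols))
        for s in sols:
            print("  ", " ".join(s))
```

On the toy sentence "the cat saw the dog" with w = 2, the directed graph admits only the original sentence, while the undirected graph admits four orderings, among them "dog the saw cat the". Because every rearrangement is enumerated, this sketch is only practical for very short inputs.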
Main file
Sequence_graphs_ambiguity_language_models_HAL_v1.pdf (623.16 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02495333, version 1 (01-03-2020)
hal-02495333, version 2 (04-03-2020)
hal-02495333, version 3 (13-01-2021)
hal-02495333, version 4 (18-05-2023)

Identifiers

  • HAL Id: hal-02495333, version 3

Cite

Sammy Khalife, Yann Ponty, Laurent Bulteau. Sequence graphs realizations and ambiguity in language models. COCOON 2021 - 27th International Computing and Combinatorics Conference, Oct 2021, Tainan, Taiwan. ⟨hal-02495333v3⟩

Collections

LIGM_MOA
