
[Paper Introduction] A Survey on Emergent Language

Mikako Ochiai
June 12, 2025


2025/06/12
Paper introduction @TanichuLab
https://sites.google.com/view/tanichu-lab-ku/


Transcript

  1. Grounding Metrics: Divergence
     • Proposed by Havrylov and Titov [15]
     • A metric for evaluating the alignment between EL and NL.
     • It ensures that the statistical properties of EL messages resemble those of NL.
     • It allows the same word to correspond to completely different concepts in the induced EL and NL (weak grounding).
     • G_div: grounding divergence; k: sample; m: message; P_NL: true NL distribution (can be approximated by a trained language model).
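The slide does not reproduce the full formula, but the idea can be sketched as scoring EL messages under an approximation of P_NL. Below is a minimal sketch assuming P_NL is approximated by a toy add-alpha-smoothed bigram model; the bigram model and all function names are illustrative, not from the paper:

```python
import math
from collections import Counter

def train_bigram_lm(corpus, alpha=1.0):
    """Toy stand-in for P_NL: an add-alpha-smoothed bigram model
    trained on NL sentences (token lists). Returns a log-probability
    function over token sequences."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for a, b in zip(toks, toks[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    vsize = len(vocab)

    def logprob(sent):
        toks = ["<s>"] + sent + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + alpha)
                            / (unigrams[a] + alpha * vsize))
                   for a, b in zip(toks, toks[1:]))
    return logprob

def grounding_divergence(el_messages, nl_logprob):
    """Sketch of G_div: mean negative log-likelihood of sampled EL
    messages m_k under the approximated NL distribution. Lower values
    mean the EL's statistics are closer to NL."""
    return -sum(nl_logprob(m) for m in el_messages) / len(el_messages)
```

An EL message that mimics NL word statistics scores a lower divergence than one made of unseen tokens, matching the weak-grounding intuition: only distributional shape matters, not word-to-concept mapping.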
  2. Grounding Metrics: Purity
     • Proposed by Lazaridou et al. [14]
     • A metric for evaluating the alignment between predefined semantic categories and those observed in an EL.
     • It requires predefined ground-truth labels.
       {C_k}: a set of clusters; c_k: majority ground-truth label for the samples in cluster C_k.
     1. Form clusters by grouping samples according to the most frequently activated word used to describe them.
     2. Evaluate the quality of these clusters using the purity metric.
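The second step is the standard cluster-purity computation; a minimal sketch (the interface is my own — in the survey's setting each cluster collects the samples described by the same most-frequent EL word):

```python
from collections import Counter

def purity(clusters):
    """Cluster purity: the fraction of samples whose ground-truth
    label matches the majority label of their cluster.

    clusters: list of clusters, each a list of ground-truth labels
    for the samples grouped under the same most-frequent EL word."""
    total = sum(len(c) for c in clusters)
    majority = sum(Counter(c).most_common(1)[0][1] for c in clusters)
    return majority / total
```

A purity of 1.0 means every cluster is label-homogeneous, i.e. each EL word picks out exactly one semantic category.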
  3. Grounding Metrics: Representational Similarity Analysis (RSA)
     • Proposed by Kriegeskorte et al.
     • Measures the global agreement between spaces, independent of their dimensionality.
     • Has been employed to compare the similarity of embedding-space structures between input, sender, and receiver in a referential game.
     • o_ξ: agent ξ's observation; e(o_ξ): ground-truth structured embedding; φ_ξ: internal meaning representation of agent ξ; R: conversion to rank vectors.
     (+) Applicable to heterogeneous agents and arbitrary input spaces.
     (-) Requires an embedding (not directly applicable to the language itself, particularly for discrete languages).
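The comparison can be sketched as a Spearman correlation between the pairwise-distance structures of two spaces — this is a minimal sketch with Euclidean distances and a plain rank transform (no tie correction), not the paper's exact pipeline:

```python
import numpy as np

def rsa(reps_a, reps_b):
    """RSA sketch: Spearman correlation between the pairwise-distance
    structures of two representation spaces for the same k inputs.
    The two spaces may have different dimensionality."""
    def upper_distances(reps):
        r = np.asarray(reps, dtype=float)
        d = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
        iu = np.triu_indices(len(r), k=1)
        return d[iu]

    def ranks(v):
        # R(.): conversion to rank vectors
        return np.argsort(np.argsort(v)).astype(float)

    ra = ranks(upper_distances(reps_a))
    rb = ranks(upper_distances(reps_b))
    return float(np.corrcoef(ra, rb)[0, 1])
```

Because only the rank order of pairwise distances is compared, two spaces with the same geometry up to scaling score a perfect 1.0 even when their dimensionalities differ.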
  4. Compositionality
     • In EL research, achieving compositionality often requires deliberate guidance; it does not naturally arise without specific interventions.
     • When a language is truly compositional, its components can be systematically rearranged or substituted with conceptually equivalent components without altering the overall meaning.
     • A language is compositional if L_prod and L_comp act as a homomorphism.
       L_comp: comprehension function; L_prod: production function.
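The substitution property can be illustrated with a toy hand-built language (the vocabularies and function names below are illustrative, not from the paper):

```python
# Toy compositional language: meanings are (color, shape) tuples and
# production maps each attribute to a fixed word independently.
color_words = {"red": "ba", "blue": "gu"}
shape_words = {"circle": "mi", "square": "zo"}

def l_prod(meaning):
    """Production L_prod: meaning -> message (one word per attribute)."""
    color, shape = meaning
    return color_words[color] + " " + shape_words[shape]

def l_comp(message):
    """Comprehension L_comp: message -> meaning (inverse word lookup)."""
    w1, w2 = message.split()
    inv_color = {v: k for k, v in color_words.items()}
    inv_shape = {v: k for k, v in shape_words.items()}
    return (inv_color[w1], inv_shape[w2])
```

Substituting "red" for "blue" changes only the color word and leaves the shape word intact, and comprehension inverts production exactly — the substitutability the homomorphism condition captures.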
  5. Compositionality Metrics: Topographic Similarity (topsim)
     • Proposed by Brighton and Kirby; first applied to EL by Lazaridou et al.
     • Measures the Spearman correlation between pairwise distances in the input and message spaces (internal alignment within an agent's meaning and message spaces).
     • The intuition behind this measure: semantically similar objects should have similar messages.
     1. Sample k meaning representations from Φ.
        Δ_L, Δ_Φ: distance functions for the language and meaning spaces (discrete spaces: Hamming or Levenshtein distance; continuous spaces: cosine or Euclidean distance).
     2. Generate corresponding messages for each sample.
     3. Calculate ρ.
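The three steps above can be sketched for the discrete case, assuming equal-length sequences and Hamming distance for both Δ_Φ and Δ_L (the helper names are my own, and the rank transform omits tie correction):

```python
import itertools
import numpy as np

def hamming(a, b):
    """Distance for equal-length discrete sequences (Δ_Φ / Δ_L here)."""
    return sum(x != y for x, y in zip(a, b))

def spearman(x, y):
    """Rank correlation via Pearson on rank vectors (no tie correction)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def topsim(meanings, messages):
    """Topographic similarity: Spearman ρ between pairwise meaning
    distances and pairwise message distances over all sample pairs."""
    pairs = list(itertools.combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [hamming(messages[i], messages[j]) for i, j in pairs]
    return spearman(d_meaning, d_message)
```

A perfectly compositional one-word-per-attribute language yields ρ = 1: nearby meanings always produce nearby messages.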
  6. Compositionality Metrics: Positional Disentanglement (posdis)
     • Proposed by Chaabouni et al.
     • A metric for evaluating the extent to which words in specific positions within a message uniquely correspond to particular attributes of the input.
     • "posdis assumes a message whose length equals the number of attributes in the input object, and where each message token, in a specific position, represents a single attribute"
     • Positional encoding is a characteristic feature of NL structures and is essential for the emergence of sophisticated syntactic patterns.
     • m: message; w_p: word at position p; f: feature vector of the ground truth.
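The metric is commonly computed from mutual information between each message position and each input attribute; a minimal sketch using plug-in entropy estimates (helper and parameter names are my own):

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in entropy estimate (nats) of an empirical sample."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def posdis(messages, attributes):
    """Positional disentanglement (sketch): for each position p, take
    the gap between the two attributes most informative about w_p,
    normalized by the entropy of w_p; zero-entropy positions are
    skipped, and the per-position scores are averaged."""
    n_attr = len(attributes[0])
    scores = []
    for p in range(len(messages[0])):
        col = [m[p] for m in messages]
        h = entropy(col)
        if h == 0:
            continue
        mis = sorted((mutual_info(col, [a[j] for a in attributes])
                      for j in range(n_attr)), reverse=True)
        scores.append((mis[0] - mis[1]) / h)
    return sum(scores) / len(scores) if scores else 0.0
```

When each position encodes exactly one attribute, the top-MI gap at every position equals that position's entropy, so posdis reaches 1.0.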
  7. Compositionality Metrics: Bag-of-Symbols Disentanglement (bosdis)
     • Proposed by Chaabouni et al.
     • Relaxes the positional assumption of posdis by introducing permutation invariance.
     • The order of words is irrelevant; only the frequency of words carries meaning.
     • Maintains the requirement that each symbol uniquely refers to a distinct meaning, but shifts the focus to symbol counts as the primary informative element.
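The permutation-invariant variant replaces per-position symbols with per-symbol counts; a minimal sketch in the same style (helper names are my own):

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in entropy estimate (nats) of an empirical sample."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def bosdis(messages, attributes):
    """Bag-of-symbols disentanglement (sketch): like posdis, but the
    informative element is each symbol's occurrence count in the
    message, making the score invariant to word order."""
    vocab = sorted({s for m in messages for s in m})
    n_attr = len(attributes[0])
    scores = []
    for sym in vocab:
        counts = [list(m).count(sym) for m in messages]
        h = entropy(counts)
        if h == 0:
            continue
        mis = sorted((mutual_info(counts, [a[j] for a in attributes])
                      for j in range(n_attr)), reverse=True)
        scores.append((mis[0] - mis[1]) / h)
    return sum(scores) / len(scores) if scores else 0.0
```

A language where the count of each symbol encodes one attribute scores 1.0 regardless of how the symbols are ordered within a message.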
  8. Compositionality Metrics: Tree Reconstruction Error (TRE) (1/3)
     • Quantifies the discrepancy between a compositional approximation and the actual structure, using a composition function and a distance metric.
     • Measures the accuracy with which a given communication protocol can be reconstructed while adhering to the compositional structure of the derivation or embedding of the input e ∈ E.
     • Assumes prior knowledge of the compositional structure within the input data (an oracle setting).
     (+) Flexible across different settings, whether discrete or continuous.
     (+) Allows for various choices of compositionality functions, distance metrics, and other parameters.
     (-) Requires an oracle-provided ground truth.
     (-) Requires pre-trained continuous embeddings.
  9. Compositionality Metrics: Tree Reconstruction Error (TRE) (2/3)
     • Components: a pre-trained ground-truth oracle; a learned language speaker; a learnable approximation function of TRE that satisfies embedding consistency and compositionality.
     • k (k ∈ K): sample from the dataset; m (m ∈ M): corresponding message.
     • Requirements — δ: distance function; η: learnable parameters of the distance function; ◦: compositionality function; e ∈ E: pre-trained embeddings of the ground truth.
  10. Compositionality Metrics: Tree Reconstruction Error (TRE) (3/3)
     • Minimize the distance between the output of the learned language speaker and the approximation function, based on the ground truth.
     • TRE can be calculated at two levels: the datum level and the dataset level.
     • A TRE value of zero indicates perfect reproduction of compositionality.
  11. Compositionality Metrics: Conflict Count
     • Proposed in Kuciński et al.
     • Quantifies the extent to which the assignment of features to words in a language deviates from each word's principal meaning.
     • Useful in scenarios where the language employs synonyms.
     • (Details skipped.)
  12. Consistency
     • The meaning of each word must be consistent across different contexts.
     • Inconsistent word meanings can render a language practically useless, even if the language is semantically grounded and exhibits compositional properties.
     • In dialogue settings, particularly in the absence of explicit regularization mechanisms, words often fail to maintain consistent groundings across different instances, leading to ambiguity and reduced communicative effectiveness.
  13. Consistency Metrics: Mutual Information
     • Mutual information between messages and their corresponding input features.
     • Ideally, a consistent language will exhibit a high degree of overlap between messages and features, leading to a high mutual information value.
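The quantity can be estimated directly from message–feature co-occurrence counts; a minimal sketch (function names are my own):

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in entropy estimate (nats) of an empirical sample."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())

def consistency_mi(messages, features):
    """I(M;F) = H(M) + H(F) - H(M,F), estimated from paired samples.
    Higher values mean messages are more predictive of features."""
    return (entropy(messages) + entropy(features)
            - entropy(list(zip(messages, features))))
```

A language that always emits the same message for the same feature reaches I(M;F) = H(F), while a degenerate language with one message for everything scores 0.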
  14. Consistency Metrics: Coherence
     • Proposed by Bogin et al.
     • A metric for evaluating whether words within a language maintain consistent semantics across varying contexts.
     • May be considered restrictive, particularly in languages where synonyms are prevalent.
     • Ranges from 0 to 1, with 1 indicating perfect alignment (each word retains its meaning consistently across different contexts and is thus used coherently).
     • P(w | f): probability that word w is used when feature f is present; P(f | w): probability that feature f appears when word w is used.
  15. Consistency Metrics: Entropy
     • Example use in Ohmer et al.
     • A high NI score indicates a strong predictive relationship between messages and features, reflecting high consistency.
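The NI score here appears to be a normalized information measure; below is a minimal sketch assuming NI is mutual information normalized by the feature entropy — the normalization choice and all names are my assumptions, not taken from the paper:

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in entropy estimate (nats) of an empirical sample."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())

def normalized_mi(messages, features):
    """Mutual information between messages and features, normalized by
    the feature entropy, so 1.0 means the features are fully
    predictable from the messages."""
    h_f = entropy(features)
    if h_f == 0:
        return 0.0
    mi = entropy(messages) + h_f - entropy(list(zip(messages, features)))
    return mi / h_f
```

Normalization makes scores comparable across tasks with different numbers of feature values, unlike raw mutual information.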
  16. Consistency Metrics: Similarity (Jaccard Similarity Coefficient)
     • Quantifies the similarity between two sets by comparing the size of their intersection to the size of their union.
     • M_ξi and M_ξj represent sets of messages generated by different agents from the same input.
     • The similarity ranges from 0 to 1, with 1 indicating complete overlap and thus perfect similarity.
     (-) Only applicable to scenarios where multiple agents generate messages about the same set of objects.
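The coefficient is a one-liner over the two agents' message sets; a minimal sketch (the function name is my own):

```python
def jaccard_similarity(msgs_i, msgs_j):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the message sets
    M_ξi and M_ξj that two agents produce for the same inputs.
    Two empty sets are treated as identical (similarity 1.0)."""
    a, b = set(msgs_i), set(msgs_j)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```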
  17. Generalization
     • Generalization in ELs reflects their ability to extend beyond specific training instances to novel situations.
  18. Generalization Metrics: Zero-Shot Evaluation
     • Unseen input:
       • Evaluate with a test set consisting of samples with feature combinations not encountered during training.
       • Evaluate with objects that resemble the training data but have unseen properties or entirely novel combinations of features.
       • Evaluate with entirely new input scenarios, such as testing the ability of agents to generalize across different game types.
       (-) Requires a ground-truth oracle to withhold feature combinations.
     • Unseen partner (cross-play or zero-shot coordination):
       • Evaluate models by pairing agents that did not communicate during training.
       (-) Can introduce inefficiencies by requiring additional resources to train novel communication partners for testing.
  19. Generalization Metrics: Ease of Transfer Learning (ETL)
     • Proposed by Chaabouni et al. [25]
     • Evaluates how easily new listeners can adapt to an EL on distinct tasks.
     • Assesses how effectively a deterministic language, developed by a fixed set of speakers, can be transferred to new listeners trained on tasks different from the one for which the language was originally optimized.
  20. Summary
     • Grounding: Divergence, Purity, RSA
     • Compositionality: TopSim, Posdis, Bosdis, TRE, Conflict Count
     • Consistency: MI, Correlation, Coherence, Entropy (Normalized MI), Jaccard Similarity
     • Generalization: Zero-shot evaluation, ETL