
Complexity Bias

mllewis
April 21, 2016


Transcript

  1. The role of communicative pressures in shaping the lexicon
    Molly Lewis, Stanford University, 15 December 2015
  2. "The linguistic sign is arbitrary" – Saussure (1916)
    Words for "horse" across languages: kalë, حصان, ձի, at, zaldi, конь, ঘোড়া, konj, кон, cavall, kabayo, konj, kůň, hest, paard, ĉevalo, hobune, kabayo, hevonen, cheval, cabalo, Pferd, άλογο, ઘોડો, chwal, doki, סוס, घोड़ा, nees, ló, hestur, anyịnya, kuda, capall, cavallo, jaran, សេះ, ມ້າ, equo, zirgs, arklys, коњ, kuda, hoiho, घोडा, адуу, घोडा, hest, اسب, koń, cavalo, ਘੋੜਾ, cal, лошадь, коњ, kôň, konj, faras, caballo, farasi, häst, குதிரை, గుర్రం, ม้า, кінь, گھوڑا, ngựa, ceffyl
    However, there are limits to arbitrariness (Köhler, 1929; Maurer et al., 2006; Ramachandran & Hubbard, 2001; Farmer, Christiansen, & Monaghan, 2006; Zipf, 1936; Piantadosi, Tily, & Gibson, 2011)
  3. Complexity Bias
    A bias to map longer words (in terms of phonemes, morphemes, syllables) to more complex referents
    Example label: "tupabugorn"
  4. Complexity Bias
    Theories of communication predict a tradeoff between length and predictability:
    – Horn implicatures (Horn, 1984): "I turned on the car." (typical) vs. "I got the car to turn on." (atypical)
    – Uniform Information Density (Aylett & Turk, 2004; A. Frank & Jaeger, 2008)
  5. Outline
    I. Do participants have a productive complexity bias?
      – Novel real objects (Study 1)
      – Artificial objects (Study 2)
    II. What is complexity? (Study 3)
    III. Is there a complexity bias in the lexicon?
      – English (Study 4)
      – Cross-linguistically (Study 5)
    IV. Where does the lexical bias come from? (Study 6)
  6. Study 1b: Design
    Referent complexity × word length (within subject)
    Linguistic stimuli:
      – short words (e.g., "bugorn," "ratum," "lopus")
      – long words (e.g., "tupabugorn," "gaburatum," "fepolopus")
    Referent stimuli:
      – Divided objects into quintiles, based on explicit complexity norms
      – Tested every pairing of quintiles (15 conditions): 1/1, 1/2, 1/3, 1/4, 1/5, 2/2, 2/3, etc.
    Procedure: 8 trials/participant
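The count of 15 conditions follows from pairing five quintiles without regard to order and allowing same-quintile pairs. A minimal Python sketch of that enumeration is below; the function name `quintile_pairings` and the label format are illustrative choices, not taken from the study materials.

```python
from itertools import combinations_with_replacement

# Quintiles are numbered 1 (least complex) to 5 (most complex).
QUINTILES = range(1, 6)


def quintile_pairings():
    """Return every unordered pairing of quintiles, same-quintile pairs included."""
    return [f"{a}/{b}" for a, b in combinations_with_replacement(QUINTILES, 2)]


pairings = quintile_pairings()
print(len(pairings))  # number of conditions
print(pairings)       # labels in the same style as the slide: 1/1, 1/2, ...
```

Using combinations with replacement (rather than permutations) treats 1/2 and 2/1 as the same condition, which matches the list on the slide.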
  7. Study 1b: Results
    [Figure: effect size (Cohen's d) against complexity rating ratio for each quintile pairing (1/1 to 5/5); r = −0.7; N = 1500. Annotations: "Target biased to have long label", "Target less complex".]
  8. Is there a productive complexity bias?
    Evidence for a productive complexity bias in an online mapping task
    But: complexity was manipulated correlationally (causation is difficult to interpret)
    Study 2: Direct manipulation of complexity
  9. Study 2: Results
    [Figure: effect size (Cohen's d) against complexity rating ratio for each quintile pairing (1/1 to 5/5); r = −0.87; N = 750. Annotations: "Target biased to have long label", "Target less complex".]
  10. What is complexity?
    Evidence for a productive complexity bias in an online mapping task
      – Manipulating complexity both correlationally and directly
    Complexity quantified in terms of visual complexity
    But: What is the underlying complexity construct?
  11. What is complexity?
    In visual cognition, processing time is used as an index of information load (Alvarez & Cavanagh, 2004)
      – more information requires more processing time
      – not a perfect measure, but expect a monotonic relationship
      – search rate task
  12. Study 3: Implicit complexity judgment
    Recognition memory task: measure study time per object (30 objects) (60 objects)
  13. Novel object complexity norms
    [Figure: Log RT (ms) against object complexity norms; r = 0.52; N_RT = 494, N_C = 60.]
  14. Study 3a: Novel Real Objects
    [Figure, two panels, effect size (Cohen's d) for each quintile pairing (1/1 to 5/5):
      – Complexity Norms: against complexity rating ratio; r = −0.7
      – RT Norms: against RT ratio; r = −0.71]
  15. Study 3b: Artificial Objects
    [Figure, two panels, effect size (Cohen's d) for each quintile pairing (1/1 to 5/5):
      – Complexity Norms: against complexity rating ratio; r = −0.87
      – RT Norms: against RT ratio; r = −0.8]
  16. Is this bias in natural language?
    Exp. 1-2: suggest a productive complexity bias with novel words
    Exp. 3: Complexity bias related to processing time
    Next: Is this bias present in natural languages?
    Study 4: Explicit complexity norms for English words
  17. Complexity norms
    – Normed 499 English words
    – 30 words/participant
    – N = 250 participants
    [Figure: Word Lengths. Histogram of frequency by word length (characters).]
  18. Study 4: Results
    N = 250
    – Characters: r_CL·F = .60
    – Phonemes: r_CL = .69, r_CL·F = .61
    – Syllables: r_CL = .67, r_CL·F = .58
    Reliable controlling for concreteness, familiarity and imageability
    [Figure: complexity rating (1 to 7) against word length (characters, 1 to 14); r = 0.69.]
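The r values on this slide that carry a dotted subscript (CL·F) read like first-order partial correlations: the correlation between two variables after removing what each shares with a third. The slide does not define C, L or F, so that reading is an assumption, as are the function name `partial_corr` and the input values below, which are placeholders rather than the study's numbers. The standard formula can be sketched in Python as:

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z.

    Computed from the three pairwise Pearson correlations:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Placeholder inputs for illustration only (not values from the slides).
print(partial_corr(r_xy=0.6, r_xz=0.5, r_yz=0.5))
```

When the control variable is uncorrelated with both of the others, the partial correlation equals the raw correlation, which is a quick sanity check on the formula.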
  19. Study 5: Cross-linguistic
    Evidence that complexity is related to length in English (controlling for other semantic variables)
    But: does this extend to other languages?
    Examined the relationship between complexity norms and word lengths for the normed words in 80 languages
    Translations from Google Translate
      – Native speakers hand-checked 12 languages
      – Accuracy: 92%
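The cross-linguistic analysis comes down to one Pearson correlation per language between a word's complexity norm and the length of its translation. The Python sketch below shows that computation under stated assumptions: the norms and "translations" are made-up placeholder strings, length is counted in characters, and the names `pearson_r` and `bias_by_language` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def bias_by_language(norms, translations):
    """Correlate complexity norms with translation length for each language."""
    return {
        language: pearson_r(norms, [len(word) for word in words])
        for language, words in translations.items()
    }


# Toy data for illustration only: four concepts with invented norms,
# and invented word forms for two invented languages.
toy_norms = [1.0, 2.0, 3.0, 4.0]
toy_translations = {
    "lang_a": ["ba", "bada", "badaga", "badagama"],
    "lang_b": ["badagama", "badaga", "bada", "ba"],
}
print(bias_by_language(toy_norms, toy_translations))
```

Character counts depend on the writing system, so a comparison across scripts would need a deliberate choice of length measure; this sketch does not address that.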
  20. Correlation between complexity norm and word length
    [Figure: Pearson's r (axis 0.0 to 0.6) by language; # open class words = 453. Languages on the axis, in the order listed: english, afrikaans, maltese, danish, norwegian, macedonian, yiddish, dutch, russian, serbian, croatian, portuguese, esperanto, galician, basque, bosnian, welsh, armenian, italian, swedish, georgian, belarusian, icelandic, estonian, bulgarian, german, hungarian, latvian, ukrainian, spanish, thai, french, nepali, polish, chinese, czech, hmong, slovenian, slovak, mongolian, hindi, zulu, vietnamese, finnish, swahili, irish, lao, hausa, filipino, lithuanian, haitian creole, romanian, khmer, punjabi, catalan, gujarati, indonesian, greek, hebrew, azerbaijani, malay, cebuano, javanese, albanian, kannada, turkish, yoruba, maori, somali, korean, telugu, urdu, tamil, bengali, arabic, latin, japanese, igbo, persian, marathi.]
  21. Cross-Linguistic Complexity Bias
    [Figure: Geographical distribution of complexity bias]
  22. Complexity bias by language family
    [Figure: Pearson's r (axis 0.0 to 0.4) by language family. Families on the axis, in the order listed: Basque, Kartvelian, Indo-European, Uralic, Sino-Tibetan, Tai-Kadai, Hmong-Mien, Austro-Asiatic, Creoles and Pidgins, Afro-Asiatic, Altaic, Austronesian, Niger-Congo, Korean, Dravidian, Japanese.]
  23. Where does the bias in language come from?
    Complexity bias in individual speakers over time leads to the same regularity emerging in the structure of the lexicon
    Labels: Productive complexity bias; Lexical complexity bias
    [Figure: Pearson's r (axis 0.0 to 0.6) by language, english through marathi]
  24. Where does the bias in language come from?
    A mechanism over multiple timescales (Griffiths & Kalish, 2007)
      – Conversational timescale (minutes)
      – Language change timescale (many years)
  25. Study 6a: Lexical learning
    Generated random lexicon
      – Words: 3, 5, 7, 9, 11 characters (CV syllables)
      – Objects: 2 from each complexity quintile
    Predictions:
      – Word forms: Become more stable
      – Complexity bias: Shorten words for simple objects, lengthen words for complex objects
    Example: "ninop" → "nin"; "ninop" → "ninopen"
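A random lexicon of this shape can be sketched in a few lines of Python. Everything specific here is an assumption: the consonant and vowel inventories, two words per length (chosen only so that ten words match ten objects), and the reading of odd-length "CV syllable" words as consonant-vowel alternation ending in a consonant, as in "ninop". The names `make_word` and `make_lexicon` are hypothetical.

```python
import random

# Hypothetical phoneme inventories; the slides do not list the ones used.
CONSONANTS = "bdfgklmnprst"
VOWELS = "aeiou"


def make_word(length, rng):
    """Build a word that alternates consonant and vowel, starting with a consonant."""
    return "".join(
        rng.choice(CONSONANTS if i % 2 == 0 else VOWELS) for i in range(length)
    )


def make_lexicon(lengths=(3, 5, 7, 9, 11), words_per_length=2, seed=0):
    """Generate words_per_length random words for each requested length."""
    rng = random.Random(seed)  # seeded so the lexicon is reproducible
    return [make_word(n, rng) for n in lengths for _ in range(words_per_length)]


lexicon = make_lexicon()
for word in lexicon:
    print(len(word), word)
```

Seeding the generator keeps the lexicon fixed across runs, which matters if the same starting words are to be shown to every participant.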
  26. Study 6a: Results
    Words are shortened for simple objects
    [Figure: number of characters removed against complexity quintile (1 to 5); N = 50. Annotations: "Shorter words", "Complexity".]
  27. Study 6b: Iterated lexical learning
    Iterated learning paradigm – a method for simulating language change
      – Gave the labels generated by participants to a new set of participants
      – Iterated for a total of 7 generations
      – 50 participants/generation
  28. Study 6b: Iterated lexical learning
    Word forms become more stable
    Across generations, words tended to:
      ① become easier to remember
      ② shorten
      ③ increase in bigram transitional probability
      ④ decrease in variability
      ⑤ decrease in word change (Levenshtein edit distance)
    [Figure: Mean accuracy. Proportion correct by generation (1 to 7).]
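Levenshtein edit distance, the word-change measure named above, counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal Python sketch of the standard dynamic-programming version follows; the example words come from the Study 6a slide, and the function name is an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete ch_a
                    current[j - 1] + 1,                   # insert ch_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Distance between a learned label and two possible reproductions of it.
print(levenshtein("ninop", "nin"))
print(levenshtein("ninop", "ninopen"))
```

Keeping only two rows of the table at a time is enough here because only the final distance is needed, not the edit sequence itself.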
  29. Study 6b: Iterated lexical learning
    Complexity bias persists across time
    [Figure: Complexity bias across generations. Cumulative characters removed against complexity quintile (1 to 5), one panel per generation (1 to 7). Annotations: "Shorter words", "complexity".]
  30. But, why doesn't it strengthen?
    Pressure to simplify suppresses complexity bias
    Two competing communicative pressures (Horn, 1984):
      – Speaker/learner pressure → compression
      – Listener pressure → differentiation
    Task not communicative
      – Presence of a listener pressure reduces compression?
      – Currently running version with interlocutor, and version with binary feedback
  31. Conclusion
    Evidence for:
      – a complexity bias in the lexicon
      – productive
      – related to a basic cognitive process
      – can emerge from learning biases
    Suggests:
      – complexity as constraint on arbitrariness in language
      – cognitive biases are reflected in the structure of the lexicon
      – communicative biases may shape the lexicon
  32. The learnability pressure at the language-change timescale
    Learnability pressure as one factor influencing the morphological complexity of a language (Lupyan & Dale, 2010)
    Languages adapt to their particular social context → simpler if acquired by a diverse population
    Predicts: Languages with more speakers should have a smaller complexity bias.
  33. Evidence for a relationship between learnability pressure and complexity bias
    [Figure: Complexity Bias (Pearson's r) against Log Population (million); r = −0.34. Points are keyed by language family: Afro-Asiatic, Altaic, Austro-Asiatic, Austronesian, Basque, Creoles and Pidgins, Dravidian, Hmong-Mien, Indo-European, Japanese, Kartvelian, Korean, Niger-Congo, Sino-Tibetan, Tai-Kadai, Uralic.]
  34. [Figure: proportion of complex-object selections (0.00 to 1.00) by number of syllables (1 to 5)]
  35. [Figure: proportion of complex-object selections (0.00 to 1.00) by number of syllables (1 to 5)]
    glm(responseValue ~ len, data = dc, family = "binomial")