Word Recognition

Psycholinguistic Computational/Connectionist Models for [Bilingual] Word Recognition

Morteza Ansarinia

December 15, 2016

Transcript

  1. Anticonstitutionnellement ηλεκτροεγκεφαλογραφήματος Antidisestablishmentarianism Pneumonoultramicroscopicsilicovolcanoconiosis Floccinaucinihilipilification Intergouvernementalisations Donaudampfschiffahrtselektrizitätenhauptbetriebswerk- bauunterbeamtengesellschaft וניתוידפולקיצנאלשכו ﺎﻫﻮﻤﻛﺎﻨﻴﻘﺴﺘﺳﺎﻓأ

    Արևաճաճանչաերկրափայլատակություն ηλεκτροεγκεφαλογράφημα Bundespräsidentenstichwahlwiederholungsverschiebung Morteza Ansarinia Institute for Cognitive Science Studies December 15, 2016 Word Recognition نارﺎﻤﺷراﻮﺨﻧﺎﯾاﺮﮕﯾروﺎﺑﺎﻨﺘﺴﻴﻨﻧادﺰﻳ
  2. Outline
    ✦ Intro
    ✦ Computational/Connectionist Models
      - Spatial Coding Model
      - Letters in Time and Retinotopic Space
      - Bayesian Reader Model
      - Word2vec
      - Megastudies
      - Evidence from Neuroscience
    ✦ Bilingual Word Recognition
      - Influencing Factors and Interactions
      - In Isolation, and in Sentence Context
    ✦ MEG/EEG Studies
  3. Word Recognition
    ✦ Goal: to understand the capacities that underlie the rapid and almost effortless comprehension of words in reading, how we acquire words, and the associated impairments.
    ✦ Two modeling approaches: symbolic (modular, or box-and-arrow) versus connectionist.
    ✦ Recognition speed depends on the type of text, attention, and the reader's skill.
    ✦ Perceptual span: from about 4 letters to the left of the fixation point to about 15 letters to the right.
    ✦ Dual-route theory: separate routes handle rule-governed words and exceptions.
  4. Word Recognition
    ✦ Sub-word units (e.g. syllables, morphemes) are represented in connectionist models as intermediate-level morphological representations.
    ✦ meaning = word + context
      - CAT + PETTING = FUR
      - CAT + SCRATCHING = CLAWS
  5. Dyslexia
    ✦ Developmental dyslexia impairs word recognition, or delays the phonological mapping between written and spoken content.
    ✦ Phonological dyslexia: difficulty pronouncing novel words.
    ✦ Surface dyslexia: difficulty reading irregular words.
    ✦ Deep dyslexia: semantic substitution errors (semantic paralexias).
  6. Spatial Coding Model
    ✦ Similar to the interactive activation (IA) model (identity, order, and learned codes), but it processes words of varying lengths and simulates masked priming.
    ✦ Spatial coding: the order of letters is represented by an activation gradient over letter positions.
    ✦ The superposition matching rule is relatively insensitive to exactly where words begin in the input, and tolerates minor changes in the relative position of letters.
    [Figure: letter order in STOP. Source: Norris (2013)]
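The two properties on this slide (insensitivity to where the word begins, tolerance of small changes in relative letter position) can be illustrated with a toy sketch. This is a deliberately simplified stand-in, not the model's actual equations: the function names, the Gaussian width `sigma`, and the restriction to words without repeated letters are all assumptions made for illustration.

```python
import math


def spatial_code(word):
    """Assign each letter a position value that rises from left to right.

    Assumption: no repeated letters, so one value per letter is enough.
    """
    return {letter: position for position, letter in enumerate(word, start=1)}


def superposition_match(stimulus, template, sigma=1.0):
    """Toy match score in [0, 1] between a stimulus and a stored word."""
    stim_code = spatial_code(stimulus)
    temp_code = spatial_code(template)
    # Position difference for every template letter found in the stimulus.
    diffs = [stim_code[ch] - temp_code[ch] for ch in temp_code if ch in stim_code]
    if not diffs:
        return 0.0
    # Superpose one Gaussian bump per letter and take the highest value
    # over the integer shifts spanned by the observed differences.
    best = max(
        sum(math.exp(-((d - shift) ** 2) / (2 * sigma ** 2)) for d in diffs)
        for shift in range(min(diffs), max(diffs) + 1)
    )
    return best / len(temp_code)


for stimulus in ("STOP", "XSTOP", "SOTP", "SXYP"):
    print(stimulus, round(superposition_match(stimulus, "STOP"), 3))
```

In this sketch a shifted input (XSTOP) still matches STOP fully, a transposition (SOTP) scores lower, and a double substitution (SXYP) scores lowest.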
  7. Letters in Time and Retinotopic Space
    ✦ Information about letter identity and order accumulates stochastically over time.
    ✦ Developed to account for perceptual identification and masked priming.
    ✦ No specific assumptions about the precise form of representations (words, bigrams, trigrams, letters).
    ✦ Open Bigrams: WO, WR, WD, OR, … for WORD.
    ✦ JU*GE primes JUDGE, but not JUDPE.
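The open-bigram example for WORD can be reproduced with a short sketch. It assumes an unconstrained scheme in which every ordered letter pair counts regardless of the gap between the letters (which is what WD in the slide's list implies); `open_bigrams` and `bigram_overlap` are illustrative names, not part of any published model code.

```python
from itertools import combinations


def open_bigrams(word):
    """Return every ordered letter pair of the word, contiguous or not."""
    return [first + second for first, second in combinations(word, 2)]


def bigram_overlap(stimulus, target):
    """Fraction of the target's open bigrams that also occur in the stimulus."""
    target_bigrams = set(open_bigrams(target))
    if not target_bigrams:
        return 0.0
    shared = target_bigrams & set(open_bigrams(stimulus))
    return len(shared) / len(target_bigrams)


print(open_bigrams("WORD"))            # ['WO', 'WR', 'WD', 'OR', 'OD', 'RD']
print(bigram_overlap("WROD", "WORD"))  # transposed letters keep 5 of 6 bigrams
```

Because a transposition disturbs only one ordered pair, a code of this kind is tolerant of small letter-order changes.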
  8. Bayesian Reader Model
    ✦ Asks how much can be explained simply by assuming that readers make near-optimal decisions based on the accumulation of noisy evidence.
    ✦ Identifies a word using the fewest possible samples.
    ✦ Letters are represented as vectors describing coordinates in a multidimensional space. The dimensions could be considered to correspond to letter features and positional information.
    ✦ The model computes P(word|evidence) from the prior P(word) and the likelihood P(evidence|word). A word can be identified when P(w|e) exceeds some predetermined threshold.
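The threshold rule above can be sketched as a sequential Bayesian update. Everything concrete here is assumed for illustration: the three-word lexicon, the 2-D coordinates, the priors, the Gaussian noise model with a shared `sigma`, and the hand-written samples are toy values, not parameters of the published model.

```python
import math


def gaussian_likelihood(sample, mean, sigma):
    """Unnormalised P(evidence | word) for one noisy sample.

    The normalising constant is the same for every word and cancels out.
    """
    squared_distance = sum((s - m) ** 2 for s, m in zip(sample, mean))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def update_posterior(posterior, sample, lexicon, sigma=1.0):
    """One Bayes step: P(w | e) is proportional to P(w) * P(e | w)."""
    unnormalised = {
        word: posterior[word] * gaussian_likelihood(sample, lexicon[word], sigma)
        for word in posterior
    }
    total = sum(unnormalised.values())
    return {word: value / total for word, value in unnormalised.items()}


def recognize(samples, lexicon, priors, threshold=0.95, sigma=1.0):
    """Accumulate samples until one word's posterior reaches the threshold."""
    posterior = dict(priors)
    for count, sample in enumerate(samples, start=1):
        posterior = update_posterior(posterior, sample, lexicon, sigma)
        best = max(posterior, key=posterior.get)
        if posterior[best] >= threshold:
            return best, count
    return None, len(samples)


# Hypothetical lexicon: each word is a point in a 2-D feature space.
TOY_LEXICON = {"cat": (0.0, 0.0), "cot": (1.0, 0.0), "dog": (4.0, 4.0)}
TOY_PRIORS = {"cat": 0.6, "cot": 0.3, "dog": 0.1}  # stand-in for word frequency
TOY_SAMPLES = [(0.2, -0.1), (-0.3, 0.1), (0.1, 0.3), (0.0, -0.2),
               (-0.1, 0.0), (0.3, 0.1), (-0.2, -0.3), (0.1, 0.0)]

print(recognize(TOY_SAMPLES, TOY_LEXICON, TOY_PRIORS))
```

Raising the prior of a word (a more frequent word) lets it reach the threshold with the same or fewer samples, which is how a model of this kind can produce a frequency effect.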
  9. Word2vec
    ✦ Unsupervised, generic two-layer neural net that takes a text corpus as input and outputs feature vectors for the words in that corpus.
    ✦ Its purpose is to group the vectors of similar words in a 500-dimensional vector space.
    ✦ Results:
      - king:queen::man:woman
      - Trump:Republican::Obama:Democratic
      - monkey:human::dinosaur:fossil
      - knee:leg::elbow:forearm
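Analogies such as king:queen::man:woman are usually read off the vector space by simple arithmetic. The sketch below shows that arithmetic on hand-made 2-D vectors; the vectors, the `TOY_VECTORS` name, and the helper functions are invented for illustration and are not trained embeddings.

```python
import math

# Made-up 2-D vectors (dimension 0: "royalty", dimension 1: "gender").
TOY_VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors):
    """Solve a:b::c:? by finding the word closest to b - a + c."""
    target = tuple(
        vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    # The three query words are excluded from the candidate answers.
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


print(analogy("king", "queen", "man", TOY_VECTORS))  # woman
```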
  10. Other Models
    ✦ PDP Model: two routes for translating orthography to phonology, one via meaning and the other via hidden units. Information is represented in distributed patterns of activation over groups of processing units.
    ✦ CDP Model: two non-semantic routes, a sublexical assembly route (2-layer net) and a lexical route (3-layer).
    ✦ LEX Model: retrieves words from semantic memory with a single routine and three components: letter identification, retrieval, and response generation.
    ✦ DRC Model: two non-semantic routes, lexical and non-lexical. The non-lexical route converts graphemes to phonemes by a set of rules.
    Source: Roberts et al. (2003)
  11. Other Models

    Model                                   Style                      Task                     Phenomena
    Models of visual word recognition
    IA                                      IA                         PI                       Word-superiority effect
    Multiple read-out                       IA                         PI, LD                   Word-superiority effect
    SCM                                     IA                         LD, MP                   Letter order
    BR                                      Math/comp                  LD, MP                   Word frequency, letter order, RT distribution
    LTRS                                    Math/comp                  MP, PI                   Letter order
    Overlap                                 Math/comp                  PI                       Letter order
    Diffusion model                         Math/comp                  LD                       RT distribution, word frequency
    SERIOL                                  Math/comp                  LD, MP                   Letter order
    Models of reading aloud
    CDP++                                   Localist/symbolic          RA                       Reading aloud
    DRC                                     IA                         RA, LD                   Reading aloud
    Triangle                                Distributed connectionist  RA                       Reading aloud
    Sequence encoder                        Distributed connectionist  RA                       Reading aloud
    Junction model                          Distributed connectionist  RA                       Reading aloud
    Models of eye-movement control in reading
    E-Z reader                              Symbolic                   R                        Eye movements
    SWIFT                                   Symbolic                   R                        Eye movements
    Model of morphology
    Amorphous discriminative learning [16]  Symbolic network           Self-paced reading, LD   Morphology

    Source: Norris (2013)
  12. Orthographic Neighbors
    ✦ The reader must accumulate enough evidence to distinguish the word from perceptually similar words (lexical neighbors).
    ✦ Lexical competition: perceptually similar words must compete with each other for recognition.
    ✦ Coltheart's N: the number of same-length words that differ from the target by a single letter substitution.
    ✦ Levenshtein distance: the number of edits (insertions, deletions, and substitutions) needed to turn one word into another.
    ✦ OLD20: the average Levenshtein distance to the 20 nearest neighbors.
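The three neighborhood measures can be computed directly from a word list. The sketch below is a minimal version under stated assumptions: `TOY_LEXICON` is an invented eight-word list, and `old_n` averages over however many neighbors are available when the lexicon holds fewer than `n` other words.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def coltheart_n(word, lexicon):
    """Count same-length words that differ from `word` in exactly one letter."""
    return sum(
        1 for other in lexicon
        if len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def old_n(word, lexicon, n=20):
    """Mean Levenshtein distance to the n closest other words (OLD20 for n=20)."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    nearest = distances[:n]
    if not nearest:
        raise ValueError("lexicon contains no other words")
    return sum(nearest) / len(nearest)


# Hypothetical mini-lexicon, for illustration only.
TOY_LEXICON = ["cat", "cot", "cut", "car", "cart", "coat", "dog", "at"]

print(coltheart_n("cat", TOY_LEXICON))  # 3 (cot, cut, car)
print(old_n("cat", TOY_LEXICON, n=3))   # 1.0
```

Unlike Coltheart's N, the Levenshtein-based measure also counts words of different length (cart, coat, at) as close neighbors of cat.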
  13. Megastudies
    ✦ Large-scale databases containing behavioral responses (e.g. naming, lexical decision, and eye movements) to thousands of words in English, Dutch, French, and British English.
    ✦ Effects examined: word frequency, regularity, feedforward consistency, age of acquisition, polysemy, and the facilitatory effect of neighborhood density.
    ✦ 61% of the variance can be explained by frequency, letter and syllable length, neighborhood density, and spelling-to-sound consistency.
    ✦ Megastudy data correlate 0.7 with earlier small-scale human studies (for current models the correlation is 0.6).
    ✦ The most powerful determinant of lexical decision and naming speed is the logarithm of the word's frequency of occurrence in the language.
  14. Neuroscience: When & Where
    ✦ MEG and EEG are time-sensitive methods with millisecond resolution; they reveal the temporal order of neural processes and give a continuous measure of the intermediate events.
    ✦ Left-lateralized (LL) N150/N170: differentiates words and pseudowords from other orthographic stimuli (e.g. symbols).
      - Arises in the left inferior occipitotemporal cortex (VWFA).
      - Responds to the frequency of letter combinations; lexical/phonological effects come into play much later.
    ✦ N250: modulated by orthographic similarity and lexical factors (letter identity and consonant/vowel status).
    Source: Carreiras et al. (2014)
  15. Bilingual Word Recognition: In Isolation
    ✦ Language-non-selective access: both languages are activated when reading in one of them. Lexical representations from both languages are activated in parallel.
    ✦ Cognate facilitation effect: bilinguals respond more quickly to cognates than to non-cognates.
    ✦ L2 cognate words also activate L1 lexical representations with similar semantics.
  16. Bilingual Word Recognition: Factors
    ✦ Language proficiency modulates the baseline level of activation for both languages.
    ✦ Age of acquisition affects the strength of connections between the two languages.
    ✦ Overlap in form and meaning increases activation.
  17. Bilingual Word Recognition: In Sentence Context
    ✦ High-constraint sentences reduce the number of possible word candidates, and consequently eliminate cross-language competition and restrict parallel activation via top-down influence.
    ✦ Context in the visual world paradigm (VWP) greatly reduces fixations to between-language cognates.
    ✦ Semantically incongruent words enhance the N400.
    ✦ Words that are initially congruent but turn out to be semantically incongruent delay the N400.
    ✦ Bidirectional feedback from L1 and L2 in the BIA+ model takes context into consideration.
    [Figure: BLINCS model. Source: Gaskell & Mirkovic (2017)]
  18. Bilingual Word Recognition: Phonological/Lexical Similarity
    ✦ Cognates share similar form and meaning in L1 and L2.
    ✦ Phonological similarity means that words sound similar in L1 and L2.
    ✦ Both kinds of ambiguity increase non-target language activation.
    ✦ Homographs and homophones share visual/auditory form, but do not share meaning (bottom-up ambiguity).
    ✦ Homographs lead to longer reaction times in L1/L2 lexical decision tasks (bilinguals access both meanings).
  19. Bilingual Word Recognition: BIA+ Model
    ✦ The bilingual lexicon is integrated across languages and is accessed in a language-non-selective way.
    ✦ Bilingual word recognition is affected not only by cross-linguistic orthographic similarity effects, but also by cross-linguistic phonological and semantic overlap.
  20. Summary
    ✦ Connectionist models of word recognition accumulate evidence over time and make words compete with their neighbors; the recognition result is distributed over a network of nodes.
    ✦ A word's frequency is by far the most important factor in recognizing it.
    ✦ In bilinguals, both languages are activated simultaneously, but proficiency and age of acquisition affect the activation level.
    ✦ Semantic context restricts L1/L2 interaction (via top-down control), but does not stop language-non-selective access.
    Thank You :-)