Slide 1

Word Recognition
Morteza Ansarinia
Institute for Cognitive Science Studies
December 15, 2016
(Title-slide decoration: very long words in many languages, e.g. Anticonstitutionnellement, Antidisestablishmentarianism, Pneumonoultramicroscopicsilicovolcanoconiosis, Floccinaucinihilipilification, Intergouvernementalisations, Donaudampfschiffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft, ηλεκτροεγκεφαλογραφήματος, Արևաճաճանչաերկրափայլատակություն, Bundespräsidentenstichwahlwiederholungsverschiebung.)

Slide 2

Outline
✦ Intro
✦ Computational/Connectionist Models
- Spatial Coding Model
- Letters in Time and Retinotopic Space
- Bayesian Reader Model
- Word2vec
- Megastudies
- Evidence from Neuroscience
✦ Bilingual Word Recognition
- Influencing Factors and Interactions
- In Isolation, and in Sentence Context
✦ MEG/EEG Studies

Slide 3

Word Recognition
✦ Goal: to understand the capacities that underlie the rapid, almost effortless comprehension of words in reading, how we acquire words, and the impairments that disrupt them.
✦ Symbolic (modular, or box-and-arrow) versus connectionist approaches.
✦ Recognition speed depends on the type of text, attention, and the reader's skill.
✦ Perceptual span: from about 4 letters to the left of fixation to about 15 letters to its right.
✦ Dual-route theory: rule-governed words versus exceptions.

Slide 4

Word Recognition
✦ Subword units (e.g. syllables, morphemes) in connectionist models are represented as an inter-level morphological representation.
✦ meaning = word + context
- CAT + PETTING = FUR
- CAT + SCRATCHING = CLAWS

Slide 5

Dyslexia
✦ Developmental dyslexia impairs word recognition, or delays the phonological mapping between written and spoken content.
✦ Phonological dyslexia: difficulty pronouncing novel words.
✦ Surface dyslexia: difficulty reading irregular words.
✦ Deep dyslexia: semantic errors (paralexias), e.g. reading "forest" for "tree".

Slide 6

Spatial Coding Model
✦ Similar to IA (identity, order, and learned codes), but handles words of varying lengths and simulates masked priming.
✦ Spatial coding: the order of letters is represented by an activation gradient over letter positions.
✦ The superposition matching rule is relatively insensitive to exactly where a word begins in the input, and tolerates minor changes in the relative position of letters (superposition).
(Figure: letter order in STOP. Norris, 2013)
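The gradient-plus-superposition idea can be sketched in a few lines. This is a toy illustration of the principle, not Davis's or Norris's actual equations; the linear gradient and tolerance value are invented, and each word is assumed to contain each letter at most once.

```python
def spatial_code(word):
    # Toy spatial code: each letter carries a position signal that
    # decreases linearly across the string (an activation gradient).
    # Assumes each letter occurs only once in the word.
    return {c: 1.0 - i / len(word) for i, c in enumerate(word)}

def superposition_match(a, b, tolerance=1.0):
    # Compare the position signals of shared letters; slightly
    # displaced signals still overlap, so small changes in relative
    # letter position are tolerated rather than punished outright.
    ca, cb = spatial_code(a), spatial_code(b)
    shared = set(ca) & set(cb)
    score = sum(max(0.0, 1 - abs(ca[c] - cb[c]) / tolerance) for c in shared)
    return score / max(len(ca), len(cb))

print(superposition_match("stop", "stop"))  # identical codes: 1.0
print(superposition_match("stop", "spot"))  # transposition: still fairly high
print(superposition_match("stop", "plod"))  # little positional overlap: low
```

The key property the sketch preserves is graded matching: a transposed-letter neighbor scores well above an unrelated word, even though a strict position-by-position comparison would reject it.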

Slide 7

Letters in Time and Retinotopic Space
✦ Information about letter identity and order accumulates stochastically over time.
✦ Developed to account for perceptual identification and masked priming.
✦ No specific assumptions about the precise form of representations (words, bigrams, trigrams, letters).
✦ Open bigrams: WO, WR, WD, OR, … for WORD.
✦ JU*GE primes JUDGE, but JUDPE does not.
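The open-bigram scheme is easy to make concrete: every ordered pair of letters in a word, adjacent or not, is a bigram. A minimal sketch (my own toy code, not the model's implementation; the overlap measure is a simple set ratio chosen for illustration):

```python
from itertools import combinations

def open_bigrams(word, max_gap=None):
    """All ordered letter pairs (i < j) of a word; 'open' because
    non-adjacent letters may also form a bigram."""
    pairs = []
    for i, j in combinations(range(len(word)), 2):
        if max_gap is None or j - i <= max_gap:
            pairs.append(word[i] + word[j])
    return pairs

def bigram_overlap(prime, target):
    """Prime-target similarity: shared open bigrams / target bigrams."""
    sp, st = set(open_bigrams(prime)), set(open_bigrams(target))
    return len(sp & st) / len(st)

print(open_bigrams("word"))               # ['wo', 'wr', 'wd', 'or', 'od', 'rd']
print(bigram_overlap("wrod", "word"))     # transposed prime keeps most bigrams
```

This is why transposition primes work: WROD preserves five of WORD's six open bigrams, whereas a position-strict letter code would treat it as badly mismatched.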

Slide 8

Bayesian Reader Model
✦ Asks how much can be explained simply by assuming that readers make near-optimal decisions based on the accumulation of noisy evidence.
✦ Identifies a word based on the fewest number of samples.
✦ Letters are represented as vectors of coordinates in a multidimensional space; the dimensions can be taken to correspond to letter features and positional information.
✦ The model computes P(word|evidence) from P(word) and P(evidence|word) via Bayes' rule; a word is identified when P(word|evidence) exceeds a predetermined threshold.
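A minimal simulation of that decision rule, under loudly invented assumptions: a three-word lexicon with made-up priors, a crude one-number-per-letter feature code, and Gaussian sampling noise. None of this is the model's actual representation; only the accumulate-until-threshold logic is the point.

```python
import math
import random

# Toy lexicon with invented priors (stand-ins for word frequencies).
LEXICON = {"word": 0.6, "ward": 0.3, "cord": 0.1}

def letter_vec(word):
    # Crude one-dimensional "feature" per letter position: a=0.0 ... z=1.0.
    return [(ord(c) - ord("a")) / 25 for c in word]

def identify(stimulus, noise=0.3, threshold=0.95, seed=0):
    rng = random.Random(seed)
    true_vec = letter_vec(stimulus)
    score = {w: math.log(p) for w, p in LEXICON.items()}  # log prior P(word)
    for n in range(1, 1001):
        # Draw one noisy evidence sample of the stimulus.
        sample = [x + rng.gauss(0, noise) for x in true_vec]
        for w in LEXICON:  # accumulate Gaussian log-likelihood log P(evidence|word)
            score[w] += sum(-((s - v) ** 2) / (2 * noise ** 2)
                            for s, v in zip(sample, letter_vec(w)))
        # Normalize to get the posterior P(word|evidence so far).
        z = max(score.values())
        post = {w: math.exp(s - z) for w, s in score.items()}
        total = sum(post.values())
        best = max(post, key=post.get)
        if post[best] / total > threshold:
            return best, n  # identified after n samples
    return best, n

print(identify("word"))  # identifies "word" after a handful of samples
```

High-frequency words start with a higher prior, so they cross the threshold on fewer samples, which is how the model derives the word-frequency effect rather than stipulating it.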

Slide 9

Word2vec
✦ Unsupervised, generic two-layer neural net that takes a text corpus as input and produces feature vectors for the words in that corpus.
✦ Its purpose is to group vectors of similar words in a high-dimensional vector space (e.g. 500 dimensions).
✦ Results:
- king : queen :: man : woman
- Trump : Republican :: Obama : Democratic
- monkey : human :: dinosaur : fossil
- knee : leg :: elbow : forearm
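The analogy results above come from simple vector arithmetic plus cosine similarity: vec(queen) − vec(king) + vec(man) lands nearest vec(woman). A sketch with tiny hand-made 4-dimensional vectors (real word2vec embeddings are learned from a corpus and have hundreds of dimensions; these numbers are invented for illustration):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(vecs, a, b, c):
    """Solve a : b :: c : ?  via  vec(b) - vec(a) + vec(c),
    excluding the three input words from the candidates."""
    target = vecs[b] - vecs[a] + vecs[c]
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

# Hypothetical embeddings: dim 2 ~ "royalty-ish", dim 3 ~ "female-ish", etc.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.8, 0.9, 0.0]),
    "man":   np.array([0.1, 0.2, 0.1, 0.0]),
    "woman": np.array([0.1, 0.2, 0.9, 0.0]),
    "paris": np.array([0.5, 0.1, 0.3, 0.9]),
}
print(analogy(vecs, "king", "queen", "man"))  # → woman
```

The offset vec(queen) − vec(king) isolates the "female" direction, so adding it to vec(man) points at vec(woman); that is the whole trick behind the slide's analogy examples.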

Slide 10

Word2vec (figure)

Slide 11

Other Models
✦ PDP Model: two routes for translating orthography to phonology, one via meaning and the other via hidden units. Information is represented as distributed patterns of activation over groups of processing units.
✦ CDP Model: two non-semantic routes, a sublexical assembly route (two-layer net) and a lexical route (three-layer net).
✦ LEX Model: retrieval from semantic memory via a single routine with three components: letter identification, retrieval, and response generation.
✦ DRC Model: two non-semantic routes, lexical and non-lexical; the non-lexical route converts graphemes to phonemes by a set of rules.
Roberts et al. (2003)
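The non-lexical route's rule-based assembly can be illustrated with a toy grapheme-to-phoneme converter. The rule table below is a tiny hypothetical sample in the spirit of DRC's longest-grapheme-first matching, not the model's real rule set:

```python
# Toy grapheme-to-phoneme rules; multi-letter graphemes are listed
# first so they win over single letters (e.g. "igh" before "i").
RULES = [
    ("igh", "aɪ"), ("sh", "ʃ"), ("th", "θ"), ("ee", "iː"),
    ("s", "s"), ("t", "t"), ("n", "n"), ("i", "ɪ"), ("p", "p"), ("h", "h"),
]

def assemble_phonology(word):
    out, i = [], 0
    while i < len(word):
        for grapheme, phoneme in RULES:
            if word.startswith(grapheme, i):
                out.append(phoneme)
                i += len(grapheme)
                break
        else:  # no rule matched: pass the letter through unchanged
            out.append(word[i])
            i += 1
    return "".join(out)

print(assemble_phonology("night"))  # naɪt
print(assemble_phonology("pint"))   # pɪnt — the rules regularize it
```

Note how the rules misread the exception word PINT (regularized to rhyme with MINT): that failure on irregular words is exactly why DRC also needs a lexical route.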

Slide 12

Other Models

Model                              Style                      Task                    Phenomena

Models of visual word recognition
IA                                 IA                         PI                      Word-superiority effect
Multiple read-out                  IA                         PI, LD                  Word-superiority effect
SCM                                IA                         LD, MP                  Letter order
BR                                 Math/comp                  LD, MP                  Word frequency, letter order, RT distribution
LTRS                               Math/comp                  MP, PI                  Letter order
Overlap                            Math/comp                  PI                      Letter order
Diffusion model                    Math/comp                  LD                      RT distribution, word frequency
SERIOL                             Math/comp                  LD, MP                  Letter order

Models of reading aloud
CDP++                              Localist/symbolic          RA                      Reading aloud
DRC                                IA                         RA, LD                  Reading aloud
Triangle                           Distributed connectionist  RA                      Reading aloud
Sequence encoder                   Distributed connectionist  RA                      Reading aloud
Junction model                     Distributed connectionist  RA                      Reading aloud

Models of eye-movement control in reading
E-Z reader                         Symbolic                   R                       Eye movements
SWIFT                              Symbolic                   R                       Eye movements

Model of morphology
Amorphous discriminative learning  Symbolic network           Self-paced reading, LD  Morphology

(Tasks: PI = perceptual identification, LD = lexical decision, MP = masked priming, RA = reading aloud, R = reading.)
Norris (2013)

Slide 13

Orthographic Neighbors
✦ The reader must accumulate enough evidence to distinguish a word from perceptually similar words (lexical neighbors).
✦ Lexical competition: perceptually similar words compete with each other for recognition.
✦ Coltheart's N: words of equal length are neighbors; distance is the number of substitutions.
✦ Levenshtein distance: number of edits (insertions, deletions, and substitutions).
✦ OLD20: average edit distance to the 20 nearest neighbors.
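All three metrics are straightforward to compute. A sketch over a made-up six-word mini-lexicon (real OLD20 values are computed against a full lexicon of tens of thousands of words, so `old_n` here uses the 3 nearest neighbors instead of 20):

```python
def levenshtein(a, b):
    """Edit distance: insertions, deletions, substitutions (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def coltheart_n(word, lexicon):
    """Coltheart's N: same-length words exactly one substitution away."""
    return sum(1 for w in lexicon
               if w != word and len(w) == len(word)
               and sum(a != b for a, b in zip(word, w)) == 1)

def old_n(word, lexicon, n=20):
    """OLDn: mean Levenshtein distance to the n closest words."""
    d = sorted(levenshtein(word, w) for w in lexicon if w != word)
    return sum(d[:n]) / min(n, len(d))

lexicon = ["cat", "bat", "hat", "cut", "cart", "card"]
print(coltheart_n("cat", lexicon))   # bat, hat, cut → 3
print(old_n("cat", lexicon, n=3))    # mean distance to 3 nearest → 1.0
```

Note the difference the slide draws: CART is not a Coltheart neighbor of CAT (different length), but it is only one Levenshtein edit away, which is why OLD20 captures similarity that N misses.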

Slide 14

Megastudies
✦ Large-scale databases containing thousands of words and linguistic decisions (e.g. naming, lexical decision, and eye movements) in English, Dutch, French, and British English.
✦ Variables studied: word frequency, regularity, feedforward consistency, age of acquisition, polysemy, and the facilitatory effect of neighborhood density.
✦ 61% of the variance can be explained by frequency, letter and syllable length, neighborhood density, and spelling-to-sound consistency.
✦ 0.7 correlation between megastudies and earlier small-scale human studies (current models reach about 0.6).
✦ The most powerful determinant of lexical decision and naming speed is the logarithm of the word's frequency of occurrence in the language.
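What "logarithm of frequency" means in practice: response time falls off roughly linearly in log frequency, not in raw counts. A sketch on synthetic data (the frequencies, the 700 ms baseline, and the 30 ms-per-log-unit slope are all invented for illustration, not megastudy estimates):

```python
import math
import random

import numpy as np

random.seed(1)
# Invented occurrence counts per million, and RTs generated to be
# linear in log frequency plus a little Gaussian noise.
freq = [1, 5, 20, 100, 500, 2000, 10000]
rt = [700 - 30 * math.log(f) + random.gauss(0, 5) for f in freq]

# Regressing RT on log frequency recovers the underlying line;
# regressing on raw frequency would not fit these data well.
slope, intercept = np.polyfit([math.log(f) for f in freq], rt, 1)
print(round(slope, 1), round(intercept, 1))  # slope near -30, intercept near 700
```

The practical consequence is that a jump from 1 to 10 occurrences per million speeds responses about as much as a jump from 1,000 to 10,000 does.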

Slide 15

Neuroscience: When & Where
✦ MEG and EEG are time-sensitive methods with millisecond resolution; they reveal the temporal order of neural processes and give a continuous measure of intermediate events.
✦ Left-lateralized N150/N170: differentiates words and pseudowords from other orthographic stimuli (e.g. symbols).
- Localized to left inferior occipitotemporal cortex (VWFA).
- Responds to the frequency of letter combinations; lexical/phonological effects come into play much later.
✦ N250: modulated by orthographic similarity and lexical factors (letter identity and consonant/vowel status).
Carreiras et al. (2014)

Slide 16

Bilingual Word Recognition: In Isolation
✦ Language-non-selective access: both languages are activated when reading in one of them; lexical representations from both languages are activated in parallel.
✦ Cognate facilitation effect: bilinguals respond more quickly to cognates than to non-cognates.
✦ L2 cognate words also activate L1 lexical representations (with similar semantics).

Slide 17

Bilingual Word Recognition: Factors
✦ Language proficiency modulates the baseline level of activation for both languages.
✦ Age of acquisition affects the strength of connections between the two languages.
✦ Form and meaning overlaps increase activation.

Slide 18

Bilingual Word Recognition: Visual World Paradigm (figure)
Gaskell & Mirkovic (2017)

Slide 19

Bilingual Word Recognition: In Sentence Context
✦ High-constraint sentences reduce the number of possible word candidates and consequently eliminate cross-language competition, restricting parallel activation via top-down influence.
✦ Sentence context in the visual world paradigm greatly reduces fixations to between-language cognates.
✦ Semantically incongruent words enhance the N400.
✦ Words that are initially congruent but ultimately semantically incongruent delay the N400.
✦ Bidirectional feedback from L1 and L2 in the BIA+ model takes context into consideration.
(Figure: BLINCS Model. Gaskell & Mirkovic, 2017)

Slide 20

Bilingual Word Recognition: Phonological/Lexical Similarity
✦ Cognates share similar form and meaning in L1 and L2.
✦ Phonological similarity: words sound similar in L1 and L2.
✦ Both kinds of ambiguity increase non-target language activation.
✦ Homographs and homophones share visual/auditory form but do not share meaning (bottom-up ambiguity).
✦ Homographs cause longer reaction times in L1/L2 lexical decision tasks (bilinguals access both meanings).

Slide 21

Bilingual Word Recognition: BIA+ Model
✦ The bilingual lexicon is integrated across languages and is accessed in a language-non-selective way.
✦ Bilingual word recognition is affected not only by cross-linguistic orthographic similarity effects, but also by cross-linguistic phonological and semantic overlap.

Slide 22

Summary
✦ Connectionist models of word recognition accumulate evidence over time, make words compete with their neighbors, and distribute the recognition result over a network of nodes.
✦ A word's frequency is by far the most important factor in recognizing it.
✦ In bilinguals, both languages are activated simultaneously, but proficiency and age of acquisition affect activation levels.
✦ Semantic context restricts L1/L2 interaction (via top-down control) but does not stop language-non-selective access.
Thank You :-)