Slide 1

Slide 1 text

A Quantitative Synthesis of Language Development Using Meta-Analysis
Molly Lewis
In collaboration with Mika Braginsky, Christina Bergmann, Sho Tsuji, Page Piccinini, Alex Cristia, and Michael C. Frank
1 June 2016

Slide 2

Slide 2 text

Our goal: Build predictive, explanatory theories
[Figure labels: Limited data; “aaaaa”, “oooooo”; “dog”, “dog”; /pragmatics/]

Slide 3

Slide 3 text

Ideally, what would our data look like?
Veridical description of behavior.
– e.g., If we think kids of a certain age can discriminate vowels, they actually can.
High fidelity.
– e.g., How good are kids at discriminating vowels?
– How does this skill change across development?
– How does the difficulty of this skill compare to other skills?

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Sources of bias
– Low power (failure to detect real effect) (e.g., Button et al., 2013)
– Publication bias (“file drawer problem”) (e.g., Rosenthal, 1979)
– Analytical flexibility (“p-hacking”) (e.g., Simmons, Nelson, & Simonsohn, 2011)
Particularly problematic for language development research (small Ns and effects)

Slide 6

Slide 6 text

Current descriptions of behavior
[Figure: Kuhl (2004)]

Slide 7

Slide 7 text

Limited fidelity of current descriptions
1) Categorical description
2) Lack of variability
3) Cross-domain comparisons difficult
[Figure: success by time (0 to 6)]

Slide 8

Slide 8 text

Meta-analysis as a solution
Effect size as unit of analysis: Quantitative, scale-free measure of “success”
[Figure: proportion of trials fixating novel object (0.00 to 1.00), showing the observed mean vs. chance and the effect size d]

Slide 9

Slide 9 text

Meta-analysis as a solution
To combine studies:
– Treat each study as a sample effect size from a population of studies
– Aggregate using quantitative methods (e.g., averaging)
– Get a point estimate of the true effect size with a measure of certainty
More precise estimate of effect size than from a single study.

Slide 10

Slide 10 text

Meta-analysis supports theory building
Veracity: Method for identifying signatures of bias and improving replicability
Fidelity: Method for obtaining quantitative descriptions and comparing across phenomena

Slide 11

Slide 11 text

MetaLab
– Aggregates meta-analyses across phenomena in language development
– Publicly available [metalab.stanford.edu]
– Summary visualizations of effect sizes, and power calculator
– Estimate effect sizes for particular phenomena, age, and method

Slide 12

Slide 12 text

Outline
I. The MetaLab dataset
II. Assessing bias in the language development literature
III. Toward a theoretical synthesis

Slide 13

Slide 13 text

Conducting a meta-analysis
1. Select phenomenon of interest
2. Select papers via sampling strategy
3. Code statistics reported in papers
4. Calculate effect sizes
5. Pool effect sizes across studies, weighting by sample size

Slide 14

Slide 14 text

Example: Mutual exclusivity meta-analysis
Where’s the dofa?
Bion et al. (2013): For 24-month-olds, mean proportion of trials fixating on novel object = .65 (SD = .13)
[Figure: proportion of trials fixating novel object (0.00 to 1.00), with chance, the mean (.65), the SD (.13), and d marked]
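As a rough worked example of the effect size on this slide, the sketch below computes a one-sample standardized mean difference d from the reported mean (.65) and SD (.13). The chance level of .50 is an assumption (two objects on screen); the slide marks chance but does not state its value. The function name is illustrative only.

```python
def cohens_d_vs_chance(mean, sd, chance):
    """One-sample standardized mean difference: (mean - chance) / SD."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean - chance) / sd


# Mean and SD as reported on the slide for 24-month-olds (Bion et al., 2013).
# ASSUMPTION: chance = 0.50, i.e. two objects on screen.
d = cohens_d_vs_chance(mean=0.65, sd=0.13, chance=0.50)
print(round(d, 2))  # prints 1.15
```

Under that assumption, the observed mean sits a little more than one standard deviation above chance.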

Slide 15

Slide 15 text

Example: Mutual exclusivity meta-analysis
Pool effect sizes across studies, weighting by sample size

#   First author  Year  Age (m.)  N
8.  spiegel       2011  30        72
7.  markman       1988  45        10
6.  grassman      2010  48        12
5.  grassman      2010  24        12
4.  byers         2009  17        16
3.  bion          2013  30        20
2.  bion          2013  24        25
1.  bion          2013  18        22

[Figure: forest plot of effect size estimates (axis -1.00 to 3.00) per study, with the grand effect size estimate]
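The pooling step can be sketched as a weighted mean in which each study counts in proportion to its sample size. The effect sizes and sample sizes below are hypothetical placeholders, not values from the studies listed above, and the function name is illustrative.

```python
def pooled_effect_size(effect_sizes, sample_sizes):
    """Weighted mean of effect sizes, with each study weighted by its N."""
    if len(effect_sizes) != len(sample_sizes) or not effect_sizes:
        raise ValueError("need one sample size per effect size")
    total_n = sum(sample_sizes)
    return sum(d * n for d, n in zip(effect_sizes, sample_sizes)) / total_n


# HYPOTHETICAL values for illustration only.
example_d = [0.5, 1.0, 1.5]
example_n = [10, 20, 10]
print(pooled_effect_size(example_d, example_n))  # prints 1.0
```

In practice, meta-analytic models usually weight by inverse variance rather than raw N; larger samples still receive more weight, since their estimates have smaller variance.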

Slide 16

Slide 16 text

Phenomena in MetaLab
Prosody
Communication
Sounds
Words

Slide 17

Slide 17 text

Overall effect sizes
Random-effects models using the metafor R package (Viechtbauer, 2010)
[Figure: effect size (0 to 3) by phenomenon: Pointing and vocabulary, Gaze following, Online word recognition, Concept-label advantage, Mutual exclusivity, Word segmentation, Statistical sound learning, Vowel discrimination (non-native), Vowel discrimination (native), Phonotactic learning, IDS preference]
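The slide reports random-effects models fit with metafor in R. As an illustration of what such a model estimates, the sketch below implements the DerSimonian-Laird estimator in plain Python. This is a simpler, swapped-in estimator chosen for illustration, not necessarily the one used for these analyses, and the function name and inputs are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled effect, standard error, tau^2).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with one variance each")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# HYPOTHETICAL effect sizes and sampling variances.
print(random_effects_dl([0.0, 1.0], [0.1, 0.1]))
```

When the studies agree closely, tau^2 shrinks to zero and the result reduces to the ordinary inverse-variance weighted mean.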

Slide 18

Slide 18 text

[Figure: effect size (d, -1 to 3) by age (years, 0 to 3), one panel per phenomenon: IDS preference, Phonotactic learning, Vowel discrimination (native), Vowel discrimination (non-native), Statistical sound learning, Word segmentation, Mutual exclusivity, Concept-label advantage, Online word recognition, Gaze following, Pointing and vocabulary]

Slide 19

Slide 19 text

Outline
I. The MetaLab dataset
II. Assessing bias in the language development literature
III. Toward a theoretical synthesis

Slide 20

Slide 20 text

Detecting bias through meta-analysis
Some expected variability in effect size due to sample size; need to distinguish this from bias
[Figure: observed means for Study 1, Study 2, and Study 3 relative to chance and d, with sampling distributions for N = 100 and N = 12]
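To make the expected variability concrete, the sketch below compares the approximate 95% confidence interval half-width of a sample mean at the two sample sizes on the slide. The SD of 0.13 is an illustrative assumption, and the normal quantile 1.96 is an approximation (a t quantile would give a somewhat wider interval at N = 12).

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Approximate 95% CI half-width for a sample mean: z * SD / sqrt(N)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * sd / math.sqrt(n)


# ASSUMPTION: SD = 0.13 is illustrative; N = 100 and N = 12 are the slide's values.
for n in (100, 12):
    print(n, round(ci_half_width(0.13, n), 3))
```

With the same underlying effect, the smaller study's estimate can land much farther from the true value by chance alone, which is why scatter among small studies is not by itself evidence of bias.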

Slide 21

Slide 21 text

Assessing publication bias
How many “missing” studies would have to exist in order for the overall effect size to be zero?
Fail-Safe-N (Orwin, 1983)
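A minimal sketch of Orwin's fail-safe N follows. If the missing studies are assumed to average d = 0, the pooled mean can only approach zero, never reach it, so the formula needs a small criterion effect size; the criterion and all inputs below are hypothetical, as is the function name.

```python
def orwin_fail_safe_n(k, mean_d, criterion_d, missing_d=0.0):
    """Orwin's fail-safe N: k * (mean_d - criterion_d) / (criterion_d - missing_d).

    k           -- number of observed studies
    mean_d      -- mean effect size of the observed studies
    criterion_d -- effect size regarded as negligible
    missing_d   -- assumed mean effect size of the missing studies
    """
    if criterion_d == missing_d:
        raise ValueError("criterion_d must differ from missing_d")
    return k * (mean_d - criterion_d) / (criterion_d - missing_d)


# HYPOTHETICAL inputs: 10 studies averaging d = 0.5, criterion d = 0.25.
print(orwin_fail_safe_n(k=10, mean_d=0.5, criterion_d=0.25))  # prints 10.0
```

In this example, 10 unpublished null studies would halve the average of 10 published studies from 0.5 to 0.25.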

Slide 22

Slide 22 text

Fail-Safe-N
M = 3914
[Figure: Fail-Safe-N (0 to 10000) by phenomenon: Pointing and vocabulary, Gaze following, Online word recognition, Concept-label advantage, Mutual exclusivity, Word segmentation, Vowel discrimination (non-native), Vowel discrimination (native), IDS preference]

Slide 23

Slide 23 text

Assessing analytical bias
“P-curve”: Distribution of p-values of a test statistic across a literature (Simonsohn, Nelson, & Simmons, 2014; Simonsohn et al., 2014; Simonsohn, Simmons, & Nelson, 2015)
[Figure: proportion of p-values (0.0 to 0.6) by p-value (0.01 to 0.05), three panels: Baseline (null is true); Observed – Evidential value; Observed – p-hacked]
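The tabulation behind a p-curve plot can be sketched as follows: keep only the significant p-values and compute the share falling in each bin below .05. The p-values are hypothetical and the function name is illustrative; the formal p-curve tests in the cited papers involve more than this binning.

```python
def p_curve(p_values, alpha=0.05, n_bins=5):
    """Proportion of significant p-values in each bin of width alpha / n_bins."""
    significant = [p for p in p_values if 0 < p <= alpha]
    if not significant:
        raise ValueError("no significant p-values to bin")
    counts = [0] * n_bins
    width = alpha / n_bins
    for p in significant:
        # min() keeps p == alpha in the last bin.
        index = min(int(p / width), n_bins - 1)
        counts[index] += 1
    return [c / len(significant) for c in counts]


# HYPOTHETICAL p-values; 0.2 is not significant and is dropped.
example_p = [0.004, 0.012, 0.026, 0.049, 0.2]
print(p_curve(example_p))  # prints [0.25, 0.25, 0.25, 0.0, 0.25]
```

A flat profile matches the baseline expected when the null is true, a profile piled up at small p-values suggests evidential value, and one piled up just under .05 is the signature associated with p-hacking.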

Slide 24

Slide 24 text

P-curves
[Figure: proportion of p-values (0.0 to 0.6) by p-value (0.01 to 0.05), Baseline (Null) vs. Observed, one panel per phenomenon: Phonotactic learning, Vowel discrimination (native), Vowel discrimination (non-native), Word segmentation, Mutual exclusivity, Concept-label advantage]

Slide 25

Slide 25 text

Meta-analysis helps maximize power for prospective studies
Power = probability of rejecting a false null hypothesis
[Figure: null (H0) and H1 distributions with the critical value, α, β, and 1-β marked]

Slide 26

Slide 26 text

Improving power
– Increase N
– Increase d
(Krzywinski & Altman, 2013)
[Figure: null (H0) and H1 distributions]
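Both levers can be illustrated with a normal-approximation power calculation for a two-sided one-sample test. This is a sketch under stated assumptions, not the MetaLab power calculator: the function names are hypothetical, and an exact t-test calculation would require slightly larger samples.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sample(d, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test for effect size d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5  # how far H1 sits from H0, in standard errors
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def required_n(d, power=0.80, alpha=0.05):
    """Smallest n whose approximate power reaches the target."""
    if d == 0:
        raise ValueError("d must be nonzero")
    n = 2
    while power_one_sample(d, n, alpha) < power:
        n += 1
    return n


# Power rises with N (first two lines) and with d (last two lines).
for d, n in [(0.5, 12), (0.5, 40), (0.8, 12)]:
    print(d, n, round(power_one_sample(d, n), 2))
print(required_n(0.5))
```

Given a meta-analytic estimate of d for a phenomenon, `required_n` gives a rough prospective sample size for a chosen power level.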

Slide 27

Slide 27 text

Increasing d through method choice
Method choices:
– Conditioned head-turn
– Forced-choice by pointing
– High-amplitude sucking
– Head-turn preference procedure
– Central fixation
– Looking-while-listening
– Anticipatory eye movements
[Figure labels: “behavior”, “eye-tracking”]

Slide 28

Slide 28 text

Increasing d through method choice
[Figure: residualized effect size (-0.25 to 0.75) by response mode (behavior vs. eye-tracking) for: IDS preference, Gaze following, Word recognition, Mutual exclusivity, Sound category learning, Vowel discrimination (native), Vowel discrimination (non-native), Word segmentation, Concept-label advantage]

Slide 29

Slide 29 text

Discussion
– Fail-Safe-N suggests that in most cases there would have to be a large number of missing studies for effects to be 0
– P-curves do not suggest any evidence of p-hacking
– In sum: Literature is veridical and thus should form the basis for theory-building
– Use effect sizes to plan sample sizes prospectively, in order to increase power

Slide 30

Slide 30 text

Outline
I. The MetaLab dataset
II. Assessing bias in the language development literature
III. Toward a theoretical synthesis

Slide 31

Slide 31 text

Theories of language development
Stages hypothesis
Continuous, synergistic hypothesis
– Infants learn phonetic contrasts when supported by word context (Feldman et al., 2013)
– Infants learn word mappings when supported by prosody (Shukla, White, & Aslin, 2011)
[Figure label: Linguistic Hierarchy]

Slide 32

Slide 32 text

Theories of language acquisition: Hypothesis space
[Figure: schematic panels of effect size by age under the “Stages” Hypothesis and the “Synergistic” Hypothesis]

Slide 33

Slide 33 text

[Figure: effect size (d, -1 to 3) by age (years, 0 to 3), point size by n (25, 50, 75), for: Gaze following, Infant directed speech preference, Label advantage in concept learning, Mutual exclusivity, Online word recognition, Phonotactic learning, Statistical sound category learning, Vowel discrimination (native), Vowel discrimination (non-native), Word segmentation]

Slide 34

Slide 34 text

Evidence for continuous development across the language hierarchy
[Figure: effect size (d, 0.0 to 2.0) by age (years, 0 to 5), panels: prosody, sounds, words, communication; datasets: Gaze following, Infant directed speech preference, Label advantage in concept learning, Mutual exclusivity, Online word recognition, Phonotactic learning, Statistical sound category learning, Vowel discrimination (native), Vowel discrimination (non-native), Word segmentation]

Slide 35

Slide 35 text

Evidence for continuous development across the language hierarchy
[Figure: effect size (d, 0 to 3) by age (years, 0 to 5), panels: prosody, sounds, words, communication; response_mode: behavior, EEG, eye-tracking, NIRS, other; datasets: Gaze following, Infant directed speech preference, Label advantage in concept learning, Mutual exclusivity, Online word recognition, Phonotactic learning, Statistical sound category learning, Vowel discrimination (native), Vowel discrimination (non-native), Word segmentation]

Slide 36

Slide 36 text

Limitations
– Publication bias
– Magnitude of effect may be related to method
– Non-representativeness of participant populations
– Limited number of similar studies in some domains

Slide 37

Slide 37 text

Conclusion
Veracity:
– Some bias, but evidential value
– Use effect sizes (ES) to calculate power, reduce bias prospectively
Fidelity:
– Comparisons across phenomena
– Build synthetic, precise theories

Slide 38

Slide 38 text

Toward a quantitative synthesis
[Figure: effect size (d, -1 to 3) by age (years, 0 to 3), point size by n (25, 50, 75), for: Gaze following, Infant directed speech preference, Label advantage in concept learning, Mutual exclusivity, Online word recognition, Phonotactic learning, Statistical sound category learning, Vowel discrimination (native), Vowel discrimination (non-native), Word segmentation]

Slide 39

Slide 39 text

Thanks!
metalab.stanford.edu
Kyle MacDonald, Bria Long (Harvard University)