Metalab

A Quantitative Synthesis of Language Development Using Meta-Analysis
Molly Lewis, 1 June 2016
In collaboration with Mika Braginsky, Christina Bergmann, Sho Tsuji, Page Piccinini, Alex Cristia, and Michael C. Frank

Our goal: Build predictive, explanatory theories from limited data
[Figure: example input at several levels: vowel sounds ("aaaaa", "oooooo"), words ("dog"), and pragmatics]

Ideally, what would our data look like?
– Veridical description of behavior. E.g., if we think kids of a certain age can discriminate vowels, they actually can.
– High fidelity. E.g., how good are kids at discriminating vowels? How does this skill change across development? How does the difficulty of this skill compare to that of other skills?

Sources of bias
– Low power (failure to detect a real effect) (e.g., Button et al., 2013)
– Publication bias ("file drawer problem") (e.g., Rosenthal, 1979)
– Analytical flexibility ("p-hacking") (e.g., Simmons, Nelson, & Simonsohn, 2011)
These are particularly problematic for language development research, where sample sizes and effects are small.

Current descriptions of behavior
[Figure from Kuhl (2004)]

Limited fidelity of current descriptions
1) Categorical description
2) Lack of variability
3) Cross-domain comparisons are difficult
[Figure: "success" plotted against time (0 to 6)]

Meta-analysis as a solution
Effect size as unit of analysis: a quantitative, scale-free measure of "success"
[Figure: proportion of trials fixating the novel object (0.00 to 1.00), with d marking the distance between the observed mean and chance]
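For a design like the one in the figure, where looking is compared with a chance level, one common choice of effect size is a one-sample standardized mean difference. A sketch, assuming the chance level is known in advance (here written as $\mu_0$, e.g. 0.5 when two objects are on screen), with $\bar{x}$ the observed mean and $s$ its standard deviation:

```latex
d = \frac{\bar{x} - \mu_0}{s}
```

Because $d$ is expressed in standard-deviation units, it can be compared across measures with different raw scales.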

Meta-analysis as a solution
To combine studies:
– Treat each study as a sampled effect size from a population of studies
– Aggregate using quantitative methods (e.g., averaging)
– Get a point estimate of the true effect size with a measure of certainty
Result: a more precise estimate of effect size than from any single study.
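The aggregation steps above can be sketched as a fixed-effect, inverse-variance weighted average with a 95% confidence interval. This is one standard way to pool, shown here as an illustration; the function name and the toy numbers are hypothetical, and the models used later in the deck are random-effects models, which also allow for variability between studies.

```python
import math

def fixed_effect_pool(effects, variances, z_crit=1.959964):
    """Inverse-variance weighted mean effect size with a 95% confidence interval.

    Each study's weight is 1 / variance, so more precise studies count for more.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)  # standard error of the pooled estimate
    return estimate, (estimate - z_crit * se, estimate + z_crit * se)

# Hypothetical toy inputs (not MetaLab data): three studies' d and variances.
est, (lo, hi) = fixed_effect_pool([0.8, 1.2, 0.5], [0.10, 0.05, 0.20])
print(f"pooled d = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The pooled interval is narrower than the interval from any single study, which is the "more precise estimate" the slide refers to.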

Meta-analysis supports theory building
– Veracity: a method for identifying signatures of bias and improving replicability
– Fidelity: a method for obtaining quantitative descriptions and comparing across phenomena

MetaLab
– Aggregates meta-analyses across phenomena in language development
– Publicly available [metalab.stanford.edu]
– Summary visualizations of effect sizes, and a power calculator
– Estimate effect sizes for a particular phenomenon, age, and method

Outline
I. The MetaLab dataset
II. Assessing bias in the language development literature
III. Toward a theoretical synthesis

Conducting a meta-analysis
1. Select phenomenon of interest
2. Select papers via a sampling strategy
3. Code statistics reported in papers
4. Calculate effect sizes
5. Pool effect sizes across studies, weighting by sample size

Example: Mutual exclusivity meta-analysis
"Where's the dofa?" (Bion et al., 2013)
For 24-month-olds, mean proportion of trials fixating the novel object = .65 (SD = .13)
[Figure: proportion of trials fixating the novel object (0.00 to 1.00), with d marking the distance between the observed mean (.65) and chance]
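Applying the one-sample effect size to the Bion et al. (2013) numbers gives a concrete d for this study. A minimal sketch, assuming chance = 0.5 (two objects on screen) and taking n = 25, the N listed for the 24-month Bion (2013) study in the forest plot. The variance formula is a common large-sample approximation for a one-sample d, not necessarily the one MetaLab uses.

```python
import math

def one_sample_d(mean, sd, chance):
    """Standardized distance between an observed mean and the chance level."""
    return (mean - chance) / sd

def one_sample_d_variance(d, n):
    """Large-sample approximation to the sampling variance of a one-sample d."""
    return 1.0 / n + d ** 2 / (2.0 * n)

# Values from the slide: 24-month-olds, mean = .65, SD = .13.
# Assumptions: chance = 0.5; n = 25 as listed for this study in the forest plot.
d = one_sample_d(0.65, 0.13, chance=0.5)
v = one_sample_d_variance(d, n=25)
print(f"d = {d:.2f}, variance = {v:.3f}, SE = {math.sqrt(v):.3f}")
```

The variance is what later lets a study be weighted by its precision when effect sizes are pooled.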

Example: Mutual exclusivity meta-analysis
Pool effect sizes across studies, weighting by sample size
[Forest plot: effect size estimate for each study (x-axis −1.00 to 3.00) and the grand effect size estimate]

#   First author   Year   Age (m.)   N
1   Bion           2013   18         22
2   Bion           2013   24         25
3   Bion           2013   30         20
4   Byers          2009   17         16
5   Grassman       2010   24         12
6   Grassman       2010   48         12
7   Markman        1988   45         10
8   Spiegel        2011   30         72
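The per-study effect sizes in this forest plot are only shown graphically, so this sketch computes just the relative weights implied by "weighting by sample size", using the authors, years, ages, and Ns from the forest plot. The helper name is hypothetical, and in practice meta-analyses usually weight by inverse variance, which for similar d values is roughly proportional to N.

```python
# Studies in the mutual exclusivity forest plot: (first author, year, age in months, N).
studies = [
    ("Bion", 2013, 18, 22),
    ("Bion", 2013, 24, 25),
    ("Bion", 2013, 30, 20),
    ("Byers", 2009, 17, 16),
    ("Grassman", 2010, 24, 12),
    ("Grassman", 2010, 48, 12),
    ("Markman", 1988, 45, 10),
    ("Spiegel", 2011, 30, 72),
]

def sample_size_weights(ns):
    """Relative weight of each study when weighting by N (weights sum to 1)."""
    total = sum(ns)
    return [n / total for n in ns]

weights = sample_size_weights([n for *_, n in studies])
for (author, year, age, n), w in zip(studies, weights):
    print(f"{author} ({year}), {age} mo, N = {n}: weight {w:.3f}")
```

With N-based weights, the single large Spiegel (2011) sample carries more than a third of the total weight, so it pulls the grand estimate strongly toward its own value.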

Phenomena in MetaLab, grouped by level: prosody, sounds, words, and communication

Overall effect sizes
Random-effects models fit using the metafor R package (Viechtbauer, 2010)
[Figure: overall effect size (0 to 3) for each phenomenon: Pointing and vocabulary, Gaze following, Online word recognition, Concept-label advantage, Mutual exclusivity, Word segmentation, Statistical sound learning, Vowel discrimination (non-native), Vowel discrimination (native), Phonotactic learning, IDS preference]
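A random-effects model treats each study's true effect as drawn from a distribution whose spread (τ²) is estimated from the data. The sketch below uses the DerSimonian–Laird estimator because it has a closed form. Note that this is a swapped-in technique: metafor's `rma()` defaults to REML, so its estimates will generally differ somewhat. The function name and toy inputs are hypothetical.

```python
import math

def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis.

    Returns (pooled estimate, its standard error, tau^2), where tau^2 is the
    estimated between-study variance in true effect sizes.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    # Random-effects weights add tau^2 to every study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    estimate = sum(wi * d for wi, d in zip(w_re, effects)) / sum(w_re)
    return estimate, math.sqrt(1.0 / sum(w_re)), tau2

# Hypothetical toy inputs, not values from any MetaLab dataset.
est, se, tau2 = random_effects_dl([0.2, 0.9, 1.4, 0.6], [0.04, 0.05, 0.08, 0.03])
print(f"RE estimate = {est:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```

When the studies disagree more than sampling error alone predicts, τ² is positive and the pooled standard error widens relative to a fixed-effect analysis.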

[Figure: effect size (d) vs. age (years), one panel per phenomenon: IDS preference, Phonotactic learning, Vowel discrimination (native), Vowel discrimination (non-native), Statistical sound learning, Word segmentation, Mutual exclusivity, Concept-label advantage, Online word recognition, Gaze following, Pointing and vocabulary]

Outline
I. The MetaLab dataset
II. Assessing bias in the language development literature
III. Toward a theoretical synthesis

Detecting bias through meta-analysis
Some variability in effect size is expected from sample size alone, since small samples give noisier estimates; we need to distinguish this from bias.
[Figure: effect size estimates (d) relative to chance for three studies, at N = 100 and at N = 12]
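The expected spread can be made concrete with the approximate standard error of a one-sample d. A sketch, assuming a hypothetical true effect of d = 0.5 and the two sample sizes from the figure; the formula is the same large-sample approximation used for one-sample designs, and the function name is illustrative.

```python
import math

def se_of_d(d, n):
    """Approximate standard error of a one-sample Cohen's d with n participants."""
    return math.sqrt(1.0 / n + d ** 2 / (2.0 * n))

# Same true effect, two sample sizes from the slide (N = 12 vs. N = 100).
true_d = 0.5  # hypothetical true effect size
for n in (12, 100):
    se = se_of_d(true_d, n)
    # About 95% of unbiased studies should land within 1.96 SE of the true d.
    print(f"N = {n:3d}: SE = {se:.2f}, 95% range "
          f"{true_d - 1.96 * se:.2f} to {true_d + 1.96 * se:.2f}")
```

Small studies scattering widely around the true value is expected; bias shows up as a systematic pattern, such as small studies reporting consistently larger effects.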

Assessing publication bias
How many "missing" studies would have to exist in order for the overall effect size to be zero? Fail-Safe-N (Orwin, 1983)
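Orwin's fail-safe N solves for the number N of unpublished studies, with some assumed mean effect, that would pull the unweighted mean of k observed effects down to a criterion value: (k · mean + N · missing) / (k + N) = criterion. If the missing studies average exactly zero, the mean can only approach zero, never reach it, so the formula needs a small nonzero criterion. The criterion used for the deck's results is not stated here; the values below are hypothetical.

```python
def orwin_fail_safe_n(k, mean_es, criterion_es, missing_es=0.0):
    """Orwin's (1983) fail-safe N.

    Number of additional studies averaging `missing_es` needed to pull the
    unweighted mean of `k` observed effect sizes down to `criterion_es`.
    """
    if criterion_es <= missing_es:
        raise ValueError("criterion must exceed the assumed mean of missing studies")
    return k * (mean_es - criterion_es) / (criterion_es - missing_es)

# Hypothetical example: 20 studies averaging d = 0.6; how many null (d = 0)
# studies would bring the mean down to a trivial d = 0.05?
print(orwin_fail_safe_n(20, 0.6, 0.05))
```

Large values mean the file drawer would have to be implausibly full to explain the effect away.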

[Figure: Fail-Safe-N (0 to 10,000) by phenomenon: Pointing and vocabulary, Gaze following, Online word recognition, Concept-label advantage, Mutual exclusivity, Word segmentation, Vowel discrimination (non-native), Vowel discrimination (native), IDS preference]
Mean Fail-Safe-N = 3914

Assessing analytical bias
"P-curve": the distribution of statistically significant (p < .05) p-values for a test statistic across a literature (Simonsohn, Nelson, & Simmons, 2014; Simonsohn et al., 2014; Simonsohn, Simmons, & Nelson, 2015). If the null is true, the curve is flat. A real effect produces a right-skewed curve with more very small p-values. P-hacking produces a pile-up of p-values just below .05.
[Figure: three example p-curves (proportion of p-values in bins from .01 to .05): baseline (null is true), observed with evidential value, and observed when p-hacked]
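The flat versus right-skewed shapes can be reproduced by simulation. This sketch simulates two-sided z-tests with a hypothetical true standardized shift (0 for the null, 2.5 for a real effect), keeps only p < .05, and bins them by .01. It illustrates the shape of a p-curve only; it is not the full p-curve test of Simonsohn and colleagues, and it does not simulate p-hacking.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
BIN_EDGES = [0.01, 0.02, 0.03, 0.04, 0.05]

def simulate_p_curve(true_z, n_studies=20000, seed=1):
    """Proportion of significant p-values falling in each .01-wide bin.

    Each simulated study is a two-sided z-test whose statistic is drawn from
    N(true_z, 1); only studies with p < .05 enter the p-curve.
    """
    rng = random.Random(seed)
    counts = [0] * len(BIN_EDGES)
    for _ in range(n_studies):
        z = rng.gauss(true_z, 1.0)
        p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
        if p < 0.05:
            for i, edge in enumerate(BIN_EDGES):
                if p <= edge:
                    counts[i] += 1
                    break
    total = sum(counts)
    return [c / total for c in counts]

null_curve = simulate_p_curve(true_z=0.0)    # flat: roughly 0.2 per bin
effect_curve = simulate_p_curve(true_z=2.5)  # right-skewed: most p < .01
print("null:  ", [round(x, 2) for x in null_curve])
print("effect:", [round(x, 2) for x in effect_curve])
```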

P-curves
[Figure: observed p-curves against the null baseline (proportion of p-values in bins from .01 to .05) for Phonotactic learning, Vowel discrimination (native), Vowel discrimination (non-native), Word segmentation, Mutual exclusivity, and Concept-label advantage]

[Figure: null (H0) and alternative (H1) distributions, with the critical value, α, β, and power 1 − β]
Power (1 − β) = probability of rejecting a false null hypothesis.
Meta-analysis helps maximize power for prospective studies.
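Power can be written in closed form under a normal approximation for a two-sided one-sample test with effect size $d$ and $N$ participants. This is a sketch: exact calculations use the noncentral $t$ distribution. Here $\Phi$ is the standard normal CDF and $z_{1-\alpha/2}$ is the critical value.

```latex
1 - \beta \approx \Phi\left(d\sqrt{N} - z_{1-\alpha/2}\right) + \Phi\left(-d\sqrt{N} - z_{1-\alpha/2}\right)
```

For $d > 0$ the second term is negligible. Power depends on $d$ and $N$ only through $d\sqrt{N}$.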

Improving power
– Increase N
– Increase d
(Krzywinski & Altman, 2013)
[Figure: null (H0) and alternative (H1) distributions]
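Both levers can be seen in a small power table. A sketch using the normal approximation for a two-sided one-sample test; the function name and the grid of N and d values are illustrative, not taken from the deck.

```python
from statistics import NormalDist

Z = NormalDist()

def power_one_sample(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample test."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

# Power rises both with sample size (rows) and with effect size (columns).
for n in (12, 24, 48):
    row = ", ".join(f"d={d}: {power_one_sample(d, n):.2f}" for d in (0.3, 0.5, 0.8))
    print(f"N = {n}: {row}")
```

With d = 0 the function returns α, as it should: the test then rejects only at the nominal false-positive rate.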

Increasing d through method choice
Method choices: conditioned head-turn, forced-choice by pointing, high-amplitude sucking, head-turn preference procedure, central fixation, looking-while-listening, anticipatory eye movements; grouped by response mode into "behavior" and "eye-tracking"

Increasing d through method choice
[Figure: residualized effect size (−0.25 to 0.75) by response mode (behavior vs. eye-tracking) for IDS preference, Gaze following, Word recognition, Mutual exclusivity, Sound category learning, Vowel discrimination (native), Vowel discrimination (non-native), Word segmentation, and Concept-label advantage]
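A residualized effect size removes variance explained by another predictor before methods are compared. The slide does not say what the effect sizes were residualized against, so this sketch assumes age within a single phenomenon. It fits ordinary least squares of d on age, then averages the residuals by response mode. The data are made-up toy numbers, not MetaLab values.

```python
from statistics import mean

def residualize(y, x):
    """Residuals from an ordinary least-squares fit of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical toy data (not MetaLab values): age in years, effect size d,
# and response mode for six experiments.
ages = [1, 2, 3, 1, 2, 3]
effects = [0.5, 0.8, 1.1, 0.9, 1.2, 1.5]
modes = ["behavior"] * 3 + ["eye-tracking"] * 3

resid = residualize(effects, ages)
by_mode = {}
for mode, r in zip(modes, resid):
    by_mode.setdefault(mode, []).append(r)
print({mode: round(mean(rs), 2) for mode, rs in by_mode.items()})
```

A real analysis would more likely fit a meta-regression with age and response mode as moderators, weighting each study by its precision.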

Discussion
– Fail-Safe-N suggests that, in most cases, a large number of missing studies would have to exist for effects to be zero
– P-curves do not suggest any evidence of p-hacking
– In sum: the literature is veridical and thus should form the basis for theory-building
– Use effect sizes to plan sample sizes prospectively, in order to increase power
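Planning a sample size prospectively amounts to inverting the power calculation. The input is a meta-analytic estimate of d; the output is the smallest N that reaches a target power. A sketch, assuming a two-sided one-sample test and the normal approximation; the function name is illustrative, and MetaLab's own power calculator may use a different method.

```python
import math
from statistics import NormalDist

def n_for_power(d, power=0.80, alpha=0.05):
    """Smallest N giving the target power for a two-sided one-sample test.

    Normal approximation: N = ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up.
    Exact noncentral-t calculations give a slightly larger N for small samples.
    """
    if d <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist()
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return math.ceil(n)

# e.g., planning a study around a meta-analytic estimate of d = 0.5
print(n_for_power(0.5))
```

Because N scales with 1/d², halving the expected effect size roughly quadruples the required sample.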

Outline
I. The MetaLab dataset
II. Assessing bias in the language development literature
III. Toward a theoretical synthesis

Theories of language development
– Stages hypothesis
– Continuous, synergistic hypothesis
  – Infants learn phonetic contrasts when supported by word context (Feldman et al., 2013)
  – Infants learn word mappings when supported by prosody (Shukla, White, & Aslin, 2011)
[Figure: the linguistic hierarchy]

Theories of language acquisition: hypothesis space
[Figure: schematic curves of effect size vs. age predicted by the "Stages" hypothesis and by the "Synergistic" hypothesis]

[Figure: effect size (d) vs. age (years), points scaled by sample size n, for Gaze following, Infant directed speech preference, Label advantage in concept learning, Mutual exclusivity, Online word recognition, Phonotactic learning, Statistical sound category learning, Vowel discrimination (native), Vowel discrimination (non-native), and Word segmentation]

Evidence for continuous development across the language hierarchy
[Figure: effect size (d) vs. age (years) for each phenomenon, in panels for prosody, sounds, words, and communication]

Evidence for continuous development across the language hierarchy (by response mode)
[Figure: effect size (d) vs. age (years) for each phenomenon, in panels for prosody, sounds, words, and communication, colored by response mode (behavior, EEG, eye-tracking, NIRS, other)]

Limitations
– Publication bias
– Magnitude of effect may be related to method
– Non-representativeness of participant populations
– Limited number of similar studies in some domains

Conclusion
Veracity:
– Some bias, but evidential value
– Use effect sizes to calculate power and reduce bias prospectively
Fidelity:
– Comparisons across phenomena
– Build synthetic, precise theories

Toward a quantitative synthesis
[Figure: effect size (d) vs. age (years) for all phenomena, points scaled by sample size n]

Thanks!
metalab.stanford.edu
Kyle MacDonald, Bria Long (Harvard University)
