Intro to the Meta-Analytic Method

mllewis
December 07, 2020

Transcript

  1. Intro to the Meta-Analytic Method 7 December 2020 Graduate Research

    Methods Molly Lewis Carnegie Mellon University
  2. Do infants prefer IDS to ADS? Cooper & Aslin (1990). Dependent measure: looking time to checkerboard. Independent variable: ADS vs. IDS, played in pairs of trials within subjects. (Source: Moll & Tomasello, 2010)
  3. ManyBabies (2020) • Multi-lab effort to replicate IDS preference •

    Each lab conducted their own replication of Cooper & Aslin (1990), with standardization of the paradigm across labs • 68 labs, 2773 babies!
  4. Many estimates of the size of an effect across many

    repeated experiments. Preference for IDS
  5. How do we summarize this pattern? ”The Madison Lab replicated

    the finding that infants prefer infant directed speech, while the other five labs did not.” That throws out a lot of information!!
  6. Summarizing literatures is a more general challenge in psychology Psychological

    literatures are almost always conflicting Qualitative literature reviews are: • not very precise • difficult when there are many studies (Tsuji, et al. 2014)
  7. History of meta-analysis • Mid-1970s: many studies had accumulated that were relevant to social policy decisions • e.g., do students learn more when class sizes are smaller? • Research findings were conflicting and their implications unclear -> difficult to get funding • Glass (1976): research findings were not as conflicting as they appeared • Using meta-analysis reveals cumulative patterns • The first “big data” (Gurevitch, et al. 2018)
  8. Why do a meta-analysis? 1. Summarize what has been done in a literature 2. Theory development – compare the strength of different effects and moderating factors 3. Evaluate bias in a literature (e.g., publication bias) 4. Estimate an effect size so you can determine the sample size (N = ?) for a new study (Kuhl, 2004)
  9. How are meta-analyses presented? 1. Stand-alone publications (~ literature review) 2. As part of an empirical paper (meta-analysis + new experiments) 3. Within-paper meta-analysis (“mini meta-analysis”) (Lewis & Frank, 2016)
  10. Plan for today An example meta-analysis: mutual exclusivity (Lewis, et al., 2020) Metalab Conducting your own meta-analysis
  11. Open questions • How big is the mutual exclusivity effect? • Does this effect have evidential value? • How robust is the effect to methodological variability? • What does the developmental trajectory of the effect look like? • What leads to developmental change?
  12. Extracting effect sizes from a paper Bion, et al. (2013): for 24-month-olds, the mean proportion of trials fixating on the novel object was .65 (SD = .13). [Figure: “Where’s the dofa?” task; proportion of trials fixating the novel object, relative to chance, with the .65 and .13 values used to compute d]
  13. Grand effect size Pool effect sizes across studies, weighting by sample size. [“Forest plot” of a (mini) mutual exclusivity meta-analysis: eight study-level effect sizes (Bion 2013 x3; Byers 2009; Grassman 2010 x2; Markman 1988; Spiegel 2011), each listed with first author, year, age in months, and N, plus the grand effect size estimate]
  14. Moderators = anything you think might influence the effect size

    • Age • Vocabulary size • Population type • Design type • (stimuli type)
  15. Results • Coded 146 effect sizes from 48 papers •

    Aggregate effect size was d = 1.27 [0.99, 1.55]
  16. Lewis et al. Summary • Effect is robust and large • Evidence for developmental change, and some evidence it is related to experience • Difficult to make causal claims about the source of this developmental change – that’s the goal of the subsequent experiments in the paper • How does this effect compare to other effects in language acquisition and cognitive development?
  17. • Open-source aggregation of meta-analyses of 29 different phenomena in cognitive development (with a focus on language acquisition) • Interactive visualizations • An R package for accessing the data (metalabR) is in development • http://metalab.stanford.edu/ (Lewis et al., 2016; Bergmann, et al., 2018)
  18. [MetaLab panel plot: effect size (d) as a function of age (years) for 12 phenomena: concept-label advantage, online word recognition, gaze following, pointing and vocabulary, statistical sound learning, word segmentation, mutual exclusivity, sound symbolism, IDS preference, phonotactic learning, vowel discrimination (native), and vowel discrimination (non-native)] (Lewis et al., 2016)
  19. Theories of language development “Stages” hypothesis “Interactive” hypothesis • Infants

    learn phonetic contrasts when supported by word context (Feldman, et al., 2013) • Infants learn word mappings when supported by prosody (Shukla, White, & Aslin, 2011) Linguistic Hierarchy (Lewis et al., 2016)
  20. Theories of language acquisition [Panels comparing the effect size patterns predicted by the Stages, Interactive, and Ad hoc hypotheses with the observed data: method-residualized effect size by age (years) for WS, GF, IDS, LA, ME, WR, PV, SS, SSL, VD-N, and VD-NN] (Lewis et al., 2016)
  21. Plan for today An example meta-analysis: mutual exclusivity (Lewis, et al., 2020) Metalab Conducting your own meta-analysis
  22. Steps for conducting a meta-analysis 1. Identify phenomenon of interest

    2. Literature search 3. Code data reported in papers 4. Calculate study-level effect sizes 5. Pool effect sizes across studies, weighting by sample size
  23. Identifying the phenomenon • Tradeoff between breadth and specificity • Too broad -> comparing apples and oranges • Too narrow -> doesn’t answer the question you care about, and there aren’t many studies • Can be defined by a paradigm (as in mutual exclusivity) • Can start with a seminal study • How many studies do you need? • Answer: at least two • Aggregated evidence is more precise than individual studies • Within-paper meta-analyses sometimes contain only a few studies (~5; Lewis & Frank, 2016)
  24. Define inclusion criteria • What studies are you going to

    include in your MA? • Every MA is unique • These might change later on as you get to know your topic more • Criteria • Document type (e.g., All literature, journal papers, theses, proceedings papers) • Participants (e.g., adults vs. children) • Method (e.g., eye-tracking vs. pointing) • Stimuli (e.g., objects vs. pictures) • Reasons for exclusion: • not relevant • not empirical (no data) • doesn't satisfy inclusion criteria X
  25. Define search protocol • Database search • Google Scholar • PubMed • … • Scanning references • Recent paper: Who does it cite? • Seminal paper: Who cites it? • Expert list • Direct request • Review paper (can be biased)
  26. Enter results into a spreadsheet • Read title and abstract • Make the inclusion/exclusion decision • Process should be reproducible (see the sketch below) • [template]
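To keep the screening step reproducible, the inclusion/exclusion decisions can live in a plain spreadsheet that is then read and summarized in R. A minimal sketch, assuming a hypothetical file search_results.csv with made-up column names (include, exclude_reason):

    # Read the screening spreadsheet (file name and column names are hypothetical)
    screening <- read.csv("search_results.csv", stringsAsFactors = FALSE)
    table(screening$include)            # how many papers were kept vs. dropped
    table(screening$exclude_reason)     # documented reasons for exclusion
    included <- subset(screening, include == "yes")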
  27. The PRISMA statement • Standardized diagram for reporting the paper selection process for a meta-analytic review • Describes 4 stages: Identification, Screening, Eligibility, Included
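The PRISMAstatement package listed at the end of the deck can draw this diagram directly from the counts at each stage. A sketch with invented counts; the argument names are as I recall them from the package documentation, so treat them as an assumption and check ?prisma:

    library(PRISMAstatement)
    # All counts below are made up for illustration
    prisma(found = 350, found_other = 12, no_dupes = 300,
           screened = 300, screen_exclusions = 200,
           full_text = 100, full_text_exclusions = 60,
           qualitative = 40, quantitative = 40)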
  28. Steps for conducting a meta-analysis 1. Identify phenomenon of interest

    2. Literature search 3. Code data reported in papers
  29. Moderators • = anything you think might influence the effect

    size (continuous or discrete) • Can be of theoretical or methodological interest • Specific to each MA (e.g., age, design, etc.) • Make a codebook for how you will enter each moderator
  30. Steps for conducting a meta-analysis 1. Identify phenomenon of interest

    2. Literature search 3. Code data reported in papers 4. Calculate study-level effect sizes
  31. Effect size: a standardized measure of the size of an effect, encoding both its magnitude and direction. Cohen’s d = (difference between group means) / (pooled standard deviation). (See the sketch below.)
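As a concrete illustration of the formula on this slide, here is a minimal base-R sketch. The numbers are hypothetical (loosely modeled on the Bion et al. values from slide 12), and the pooled-SD version shown is the standard two-group formula:

    # Cohen's d = (difference between means) / (pooled standard deviation)
    m1 <- 0.65; sd1 <- 0.13; n1 <- 24   # hypothetical "novel object" group
    m2 <- 0.50; sd2 <- 0.15; n2 <- 24   # hypothetical comparison group
    sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
    d <- (m1 - m2) / sd_pooled
    d
    # For a single group compared against chance (as in slide 12), the analogous
    # calculation is (mean - 0.5) / sd, e.g. (0.65 - 0.5) / 0.13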
  32. Interpreting Cohen’s d (Cohen, 1969) • .2 (“small”): Cohen’s intuition: the difference between the heights of 15- and 16-year-old girls in the US; psychological example: the bouba-kiki effect in kids (~.15; Lammertink, et al. 2016) • .5 (“medium”): the difference between the heights of 14- and 18-year-old girls; cognitive behavioral therapy on anxiety (~.4; Belleville, et al., 2004); sex difference in implicit math attitudes (~.5; Klein, et al., 2013) • .8 (“large”): the difference between the heights of 13- and 18-year-old girls; syntactic priming (~.9; Mahowald, et al., under review); mutual exclusivity (~1.0; Lewis & Frank, 2020) • Explore Cohen’s d: https://rpsychologist.com/d3/cohend/
  33. Interpreting Cohen’s d Relatively “large” effects are reported in cognitive psychology. [Histograms of absolute Cohen’s d from Estimating the Replicability of Psychological Science (OSF, Science, 2015; N = 97), by area: Cognitive (JEP:LMC), Social (JPSP), and Psych. Science, with panel means of .69, .19, and .78]
  34. Effect size variance and CI [Two effect size distributions with confidence intervals, n1 = 24, n2 = 24]
  35. Effect size measures • Cohen’s d is just one (prototypical) measure • The appropriate effect size measure depends on aspects of the design (e.g., within vs. between subjects) and the types of variables (e.g., qualitative vs. quantitative) • In principle, you can compute an effect size for any statistical test you conduct • the difference between groups (t-test, d) • the relationship between variables (correlation, r) • the amount of variance accounted for by a factor (ANOVA, regression, f) • … • Can convert between ES metrics (see the sketch below)
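Converting between effect size metrics can be done with packages such as compute.es or esc (listed at the end of the deck), or by hand with the standard formulas. A small sketch of the d <-> r conversion, which assumes roughly equal group sizes:

    d_to_r <- function(d) d / sqrt(d^2 + 4)       # standardized mean difference -> correlation
    r_to_d <- function(r) 2 * r / sqrt(1 - r^2)   # correlation -> standardized mean difference
    d_to_r(1.27)   # e.g., the aggregate mutual exclusivity estimate, purely as an illustration
    r_to_d(0.30)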
  36. Steps for conducting a meta-analysis 1. Identify phenomenon of interest

    2. Literature search 3. Code data reported in papers 4. Calculate study-level effect sizes 5. Pool effect sizes across studies, weighting by sample size
  37. How do you pool effect sizes? [Forest plot of the (mini) mutual exclusivity meta-analysis from slide 13: eight study-level effect sizes with first author, year, age in months, and N, plus the grand effect size estimate]
  38. How do you pool effect sizes? • The goal of a meta-analysis is to estimate the true population effect size • Treat each study as a sample effect size from a population of studies • Aggregate using quantitative methods (e.g., averaging) • Get a point estimate of the true effect size, with a measure of certainty • This gives a more precise estimate of the effect size than any single study.
  39. Population vs. sample: the studies that were actually run (i.e., the ones in the literature) are a sample from the population of all the studies that could have been run. Use the sample to estimate the population. [Histograms of effect sizes: counts of studies in the sample vs. the population, with a population Cohen’s d = .7]
  40. Methods for pooling Analogous to the logic within a single study: • In a study, you sample participants and pool them to estimate the effect in that study (unweighted mean) • In a meta-analysis, you sample studies to estimate the grand effect (weighted mean, illustrated below) Just as for models across participants, there are two models for pooling: • Fixed effect: one true population effect • Random effect: a random sample from many population effects; estimates their mean
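To make the weighted-mean idea concrete, here is a minimal sketch of fixed-effect (inverse-variance) pooling with made-up effect sizes; a random-effects model additionally estimates between-study variance, which the metafor code on the next slides handles for you:

    yi <- c(1.1, 0.8, 1.5, 0.9)       # study-level effect sizes (hypothetical)
    vi <- c(0.10, 0.05, 0.20, 0.08)   # their sampling variances (hypothetical)
    wi <- 1 / vi                      # inverse-variance weights (larger studies weigh more)
    est <- sum(wi * yi) / sum(wi)     # pooled (weighted mean) effect size
    se  <- sqrt(1 / sum(wi))          # standard error of the pooled estimate
    c(estimate = est, ci_lower = est - 1.96 * se, ci_upper = est + 1.96 * se)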
  41. Random Effect Model Random effect models recommended! Slide adapted from http://compare2what.blogspot.com/ (Lohse, 2013)
  42. Structure of the data for fitting a meta-analytic model: one row per effect size (here, N = 50 effect sizes), with a column for the effect size and a column for the variance of the effect size (see the sketch below).
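One way to build that two-column (effect size, variance) structure is metafor's escalc(), starting from the means, SDs, and Ns coded from each paper. A sketch with invented summary statistics:

    library(metafor)
    dat <- data.frame(
      study = c("A", "B", "C"),                                   # hypothetical studies
      m1i = c(0.65, 0.70, 0.60), sd1i = c(0.13, 0.20, 0.15), n1i = c(24, 30, 18),
      m2i = c(0.50, 0.52, 0.51), sd2i = c(0.12, 0.18, 0.14), n2i = c(24, 30, 18))
    # escalc() adds yi (a standardized mean difference) and vi (its sampling variance)
    dat <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
                  m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)
    dat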
  43. Fitting the meta-analytic model The model output gives the grand meta-analytic effect size, its confidence interval, and a test of whether the grand effect size is significantly different from zero. metafor package in R (Viechtbauer, 2010); see the sketch below.
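A minimal sketch of fitting the random-effects model with metafor, assuming a data frame with one row per effect size (yi) and its sampling variance (vi); the values here are invented:

    library(metafor)
    dat <- data.frame(yi = c(1.1, 0.8, 1.5, 0.9),
                      vi = c(0.10, 0.05, 0.20, 0.08))
    res <- rma(yi, vi, data = dat, method = "REML")  # random-effects model (recommended)
    summary(res)  # grand effect size, its confidence interval, and a test against zero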
  44. Forest Plot • Point = study • Size of square = weight • Ranges = individual confidence intervals (uncertainty) • Diamond = weighted mean • Dashed line = ES of 0 • If the diamond overlaps the dashed line, the overall effect size does not differ from zero (see the sketch below) [Forest plot of the mutual exclusivity mini meta-analysis from slide 13: eight study-level effect sizes with first author, year, age in months, and N, plus the grand effect size]
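metafor draws this plot directly from the fitted model object. Continuing the invented example from the previous slide:

    library(metafor)
    dat <- data.frame(yi = c(1.1, 0.8, 1.5, 0.9), vi = c(0.10, 0.05, 0.20, 0.08))  # hypothetical
    res <- rma(yi, vi, data = dat)
    forest(res)  # squares = study estimates (sized by weight), lines = CIs, diamond = pooled effect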
  45. What a forest plot tells you 1. What is the overall effect size for phenomenon X? • Because this estimate reflects data from many more participants than a single study, it should be more accurate than the effect size from any single study. • How big is this effect relative to other effects in psychology? 2. Does the effect significantly differ from zero? • If it does not, this suggests there may be no effect (even though individual studies may show an effect). 3. How much variability is there? • Are the effects of individual studies roughly the same, or is there a lot of variability? • If there’s a lot of variability, this suggests there might be an important moderator
  46. Analyzing moderators • Does the effect size vary by different

    features of the experiment? • Two kinds of moderators: Categorical and Continuous (Left fig. from Gurevitch et al, 2018) Mutual exclusivity MA
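In metafor, moderators go into the same model via the mods argument. A sketch with an invented continuous moderator (age) and an invented categorical one (method):

    library(metafor)
    dat <- data.frame(yi = c(1.1, 0.8, 1.5, 0.9, 1.3),
                      vi = c(0.10, 0.05, 0.20, 0.08, 0.12),
                      age = c(18, 24, 30, 36, 48),                      # months, hypothetical
                      method = c("looking", "pointing", "looking",
                                 "pointing", "looking"))                # hypothetical
    rma(yi, vi, mods = ~ age, data = dat)              # meta-regression on a continuous moderator
    rma(yi, vi, mods = ~ factor(method), data = dat)   # categorical moderator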
  47. Assessing publication bias How many “missing” studies would have to

    exist in order for the overall effect size to be zero? Fail-Safe-N (Orwin, 1983)
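metafor's fsn() computes fail-safe N; as far as I recall it supports the Orwin (1983) variant via a type argument, where target is the smallest effect size of interest (the value below is an assumption, as are the data):

    library(metafor)
    dat <- data.frame(yi = c(1.1, 0.8, 1.5, 0.9), vi = c(0.10, 0.05, 0.20, 0.08))  # hypothetical
    fsn(yi, vi, data = dat)                                # Rosenthal-style fail-safe N (default)
    fsn(yi, vi, data = dat, type = "Orwin", target = 0.2)  # Orwin variant; target value is assumed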
  48. Fail-Safe-N [Bar plot of Fail-Safe-N (roughly 0 to 9,000) by phenomenon: pointing and vocabulary, gaze following, online word recognition, concept-label advantage, sound symbolism, mutual exclusivity, word segmentation, statistical sound learning, vowel discrimination (non-native), vowel discrimination (native), phonotactic learning, and IDS preference; mean values of 3,634 and 3,228 are marked] (Lewis et al., 2016)
  49. Detecting bias through meta-analysis Some variability in effect size is expected due to sample size; this needs to be distinguished from bias. [Simulated effect size estimates for Studies 1–3 relative to chance, comparing N = 100 with N = 12]
  50. Funnel Plots • Scatter plot • Red points are each

    an effect size • X–axis = magnitude of effect size • Y–axis = measure of how precise the study is (number of participants, SE) • Black vertical dashed line is an effect size of zero • Red dashed line is meta-analytic effect size • Triangle corresponds to a 95% confidence interval around the mean (ignore black circle points for now) Fig from Gurevitch, 2018 N = large N = small
  51. Funnel Plots Studies that are more precise (i.e. larger sample

    sizes) should have less variance around the true population effect size. Fig from Gurevitch, 2018 N = large N = small
  52. Funnel Plots and Publication Bias If all results are published,

    then studies will deviate from mean in either direction (i.e. be symmetrical) If a field of research systematically ignores a certain direction, then this plot can be asymmetrical. If researchers are not publishing studies that have non-significant ES, we should expect a gap in the lower left hand corner Fig from Gurevitch, 2018 N = large N = small
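metafor can draw the funnel plot and run a regression test for asymmetry from the same fitted model. A sketch with invented data:

    library(metafor)
    dat <- data.frame(yi = c(1.1, 0.8, 1.5, 0.9, 0.4),
                      vi = c(0.10, 0.05, 0.20, 0.08, 0.30))   # hypothetical
    res <- rma(yi, vi, data = dat)
    funnel(res)    # effect size vs. precision; asymmetry hints at publication bias
    regtest(res)   # Egger-type regression test for funnel plot asymmetry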
  53. Romantic Priming (Sundie et al., 2011; Study 2) How much

    do you want to purchase an expensive-looking wallet? Evolutionary psychologists have argued that male risk-taking and conspicuous consumption are costly sexual signals intended to attract potential mates (Shanks et al. 2015)
  54. Meta-analysis of the “Romantic Priming Effect” Where are all those

    studies? Very asymmetrical Suggests publication bias! What should be done next? N = 48 effect sizes
  55. Large-scale, pre-registered replications Shanks, et al. 2015: 14 replications MA

    funnel plot (for comparison) Suggests there is no effect!
  56. Assessing analytical bias “P-curve”: the distribution of p-values of a test statistic across a literature. [Panels showing the proportion of p-values between .01 and .05 for: Baseline (null is true), Observed with evidential value, and Observed when p-hacked] (Simonsohn, Nelson, & Simmons, 2014; Simonsohn et al., 2014; Simonsohn, Simmons, & Nelson, 2015)
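A minimal sketch of the p-curve idea only, not the full Simonsohn et al. inference procedure: collect the significant p-values from a literature and look at how they are distributed between .01 and .05 (the p-values below are invented):

    p_values <- c(0.003, 0.010, 0.021, 0.038, 0.049, 0.120, 0.300)  # hypothetical published p-values
    sig <- p_values[p_values < 0.05]                  # the p-curve only uses significant results
    table(cut(sig, breaks = seq(0, 0.05, by = 0.01)))
    # A right-skewed curve (many very small p's) suggests evidential value;
    # a pile-up just below .05 is a warning sign of p-hacking.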
  57. P-curves [Observed vs. baseline (null) p-curves for each MetaLab phenomenon: concept-label advantage, online word recognition, gaze following, pointing and vocabulary, statistical sound learning, word segmentation, mutual exclusivity, sound symbolism, IDS preference, phonotactic learning, vowel discrimination (native), and vowel discrimination (non-native)] (Lewis et al., 2016)
  58. Steps for conducting a meta-analysis 1. Identify phenomenon of interest

    2. Literature search 3. Code data reported in papers 4. Calculate study-level effect sizes 5. Pool effect sizes across studies, weighting by sample size
  59. Helpful R packages for doing meta-analysis • metafor (Viechtbauer, 2010) – the main workhorse for doing meta-analyses in R (modeling + plotting) • compute.es (Del Re, 2012), esc (Lüdecke, 2018) – for computing a variety of effect sizes and converting between them • pwr (Champely, 2020) – for estimating study power (see the sketch below) • PRISMAstatement (Wasey, 2019) – for making PRISMA plots
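As one example of the pwr package in action (tying back to reason 4 on slide 8), a meta-analytic effect size can be plugged in to plan the sample for a new study; the design assumptions here (two-sample t-test, 80% power) are mine, not from the slides:

    library(pwr)
    # d = 1.27 is the aggregate mutual exclusivity estimate from slide 15
    pwr.t.test(d = 1.27, power = 0.80, sig.level = 0.05,
               type = "two.sample")   # returns the required n per group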
  60. Wrap-up • Meta-analysis is a powerful statistical tool for synthesizing

    existing evidence • Assess the evidential value of a literature, the strength of an effect, and moderating influences • Can be used both within a paper and across papers • Great way to start a new project • Reproducibility is important - there are lots of great tools in R for doing MAs • If you’re thinking of doing an MA, I’d be happy to chat with you about it!