SPSS Workshop

Dr. Pohlig
December 11, 2014

Covers some SPSS basics & analyses and some more advanced methods.
Given Nov 11, 2014

Transcript

  1. SPSS Statistical Software Workshop
    RYAN POHLIG
    BIOSTATISTICIAN, BIOSTATISTICS CORE FACILITY, COLLEGE OF HEALTH
    SCIENCES

  2. Workshop
    This workshop will have a brief introduction and then three hands-on sections working with SPSS.
    1. Intro to SPSS
    ◦ Looking at what SPSS can do
    ◦ Discuss some of the functionality and walk through some examples for data manipulation and
    graphing
    ◦ This section will use 3 data sets
    2. Basic Analysis with assumption checking and output interpretation
    ◦ t-tests
    ◦ Regression
    ◦ One-way ANOVA
    3. Advanced analysis
    • As voted on by you

  3. Co-Sponsored by the StatLab
    • StatLab
    • Department of Applied Economics and
    Statistics
    • College of Agriculture & Natural Resources
    • Director: Tom W. Ilvento
    • Site: http://canr.udel.edu/apec/affiliated-programs-centers/statlab/
    • Contact: [email protected]
    • StatLab provides easily accessible, high
    quality statistical consulting services to
    university graduate students, faculty, staff,
    administration and outside companies and
    organizations.
    • Provides
    • Research Design
    • Statistical Analysis
    • Statistical Computing

  4. What is SPSS?
    • Statistical Package for the Social Sciences
    • IBM owned, and had its name briefly changed to PASW (Predictive Analytics Software)
    • Runs on Macs, PCs, & Unix
    • GUI driven but has programming
    • Convenient for applied researchers to blend the two: start with point-and-click, then
    modify the pasted syntax directly
    • Not as flexible as R or SAS for programming
    • Can intuitively import data in a variety of formats, and can export data and results to multiple
    formats
    • http://en.wikipedia.org/wiki/Comparison_of_statistical_packages

  5. Resources
    Support: http://www-01.ibm.com/support/docview.wss?uid=swg21592093
    Documentation (Manuals): http://www-01.ibm.com/support/docview.wss?uid=swg27038407
    UCLA site for Statistics, IDRE (Institute for Digital Research and Education):
    http://www.ats.ucla.edu/stat/spss/
    ◦ The IDRE is the best site I’ve seen for stats software help & guides
    Add-Ons
    ◦ SPSS has a number of add-ons that you can purchase
    ◦ AMOS – SEM software
    ◦ Big Data tools
    ◦ Statistics Server – lets you work using data stored on your server
    ◦ Modeler – data mining tool
    ◦ Data Collection – create surveys that can be deployed on the web or mobile devices
    ◦ Sample Power – power analysis
    ◦ Text Analytics – used for text and open-ended items/variables

  6. Windows
    • SPSS is based around 3 windows (files)
    • Dataset
    • Data View – spreadsheet containing actual data
    • Variable View – contains the information about the variables
    • Syntax – lets you edit the commands directly using syntax (SPSS’ ‘code’)
    • Output – Displays the output from procedures run, and errors when encountered
    • If you are unsure, use the help feature! It is surprisingly helpful

  7. GUI
    • Open SPSS
    • Can’t use some functions without having actual data
    • Let’s create some fake data, one continuous (Score) and one dichotomous variable (Sex)
    • Menus
    • Data & Transform
    • Used for Manipulating Data
    • Analyze
    • Used to get statistics & results
    • Graphs
    • Used for getting some visual output
    • Toolbar

  8. Data View
    • Can directly manipulate data just like in Excel
    • Does not have functions like Excel does
    • Variables are columns
    • Rows are cases
    • Can cut, paste, reorder both variables and cases
    • SPSS has no row or column limit
    • For those using 32-bit versions, the theoretical limit on rows and columns SPSS can handle is
    • (2^32) − 1, which is more than 2 billion
    • Leave missing data blank, it is just easier that way!

  9. Variable View
    Name
    • First character has to be a letter
    • Cannot contain a space (use underscore _)
    • I would advise NOT ending a variable with
    an underscore as some variables created
    through procedures in SPSS do this also and
    may overwrite what you had
    • Cannot end with a period
    • Variable names must be unique
    Type
    • String & Numeric are most frequent
    • Variety of Date & Currency formats
    • Choosing String when it is a Numeric
    variable will limit how the variable
    can be used in some analyses
    Measure
    • Use Nominal for string variables
    • Scale for everything else

  10. Variable View part 2
    Label
    • Here is where you can put in free form text
    to describe exactly what the variable is
    • Caution: this will show up in the output so
    stay away from very long descriptions or
    unhelpful ones (these do not have to be
    unique)
    Missing
    • If you did use a specific value for missing
    you can indicate that here
    Values
    • When variables have multiple levels that
    have specific meanings, you put that
    information here so that it shows up in the
    output
    • For example
    • Males coded as 0, Females coded as 1
    • A 1 stands for strongly disagree, 2 disagree, 3
    neutral, 4 agree, 5 strongly agree
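
    In syntax, the same metadata can be set directly; a minimal sketch (the variable names score, sex, and agree, and the -99 missing code, are hypothetical):

      * Label a variable, its values, and a user-missing code.
      VARIABLE LABELS score 'Test score at baseline'.
      VALUE LABELS sex 0 'Male' 1 'Female'
        /agree 1 'Strongly disagree' 2 'Disagree' 3 'Neutral' 4 'Agree' 5 'Strongly agree'.
      MISSING VALUES score (-99).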

  11. Art Vandelay
    Importing data
    • Go to Open
    • Data
    • Click on “File Type” drop down menu
    • Choose excel files
    • Select the “Homework” data file (HW_data)
    • Can choose which worksheet in excel the
    data comes from
    • Make sure to check that variable names are
    first row
    • Let’s try a CSV file
    • Choose either All Files or Text in the drop-down
    • Import wizard comes up to help you
    Exporting Data
    • Go to Save As
    • Select the format you want to save the file as
    in the drop down menu
    • Variables button lets you choose if you want
    to save the entire file or remove some
    variables
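
    The same import and export can be done in syntax; a sketch, assuming the file names and sheet name shown here:

      * Import an Excel file, reading variable names from the first row.
      GET DATA /TYPE=XLSX
        /FILE='HW_data.xlsx'
        /SHEET=NAME 'Sheet1'
        /READNAMES=ON.
      * Export the active dataset to CSV with variable names in row 1.
      SAVE TRANSLATE OUTFILE='HW_data_out.csv'
        /TYPE=CSV
        /FIELDNAMES
        /REPLACE.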

  12. Transform menu
    Compute will be the most useful
    • It is used to create new variables
    • Can be entirely novel
    • Can be a function of other variables already
    present
    • Has a lot of options, like using functions in
    Excel
    • Has a way to incorporate an if statement for
    conditional logic
    • Creating a difference score
    • Name the new variable, move over posttest,
    click ‘-’ move over pretest
    • Hit Paste, which will send the commands to the
    syntax window
    Useful Compute commands
    • Algebraic manipulations: ln, SQRT
    • $Casenum – creates a value for each case that
    indicates its row number (easy way to make an
    ID variable)
    • Datediff – lets you find the differences in dates.
    • Any – lets you search for a value or character
    in a variable
    • Create a mean test score
    • Name new variable testaverage
    • Find the Mean function and click the up arrow
    • Select pretest and posttest to insert in
    parentheses
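
    Hitting Paste for these computes produces syntax along these lines (the logscore example is an illustrative addition):

      * Difference score and mean test score.
      COMPUTE diff = posttest - pretest.
      COMPUTE testaverage = MEAN(pretest, posttest).
      * Row number as an easy ID, and a log transform.
      COMPUTE id = $CASENUM.
      COMPUTE logscore = LN(posttest).
      EXECUTE.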

  13. Recode
    Recode workplace from 1, 2 into 0 and 1
    Suggestion: Always recode into a NEW variable
    in case of error
    • Move over workplace
    • Name the new variable you want to create
    • Click on change
    • Hit Old & New Values
    • Enter 1 in old value and then 0 in new
    • Click add
    • Repeat with 2 and 1
    • Click Continue
    Recode SES into Low vs Hi (Moderate & High)
    • Move over SES
    • Name the new variable you want to create
    • Click on change
    • Hit Old & New Values
    • Click Range, value through and enter 2, then
    in New Value enter 1
    • Click add
    • Select All Other Values, enter 0 in New
    Value
    • Click add
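
    The pasted syntax for both recodes looks roughly like this (assuming SES is coded so that values of 2 and above are Moderate/High):

      * Workplace: 1, 2 recoded into 0, 1 in a NEW variable.
      RECODE workplace (1=0) (2=1) INTO workplace_rec.
      * SES: Moderate & High become 1, all other values 0.
      RECODE ses (2 THRU HIGHEST=1) (ELSE=0) INTO ses_hi.
      EXECUTE.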

  14. Data Menu
    Sort cases
    • Will sort ascending or descending by
    numerical or alphabetical value
    • Split file
    • Compare Groups or Organize Output by
    Groups
    • This command lets you run any procedure for
    each group separately
    • If used for analysis, no between-group comparisons
    will be printed
    Select Cases
    • This is used to filter out data when doing
    analyses
    • Can specify an if statement
    • Use an already created filter variable
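
    In syntax, these menu actions map onto a few one-line commands; a sketch:

      * The file must be sorted by the split variable first.
      SORT CASES BY workplace (A).
      * LAYERED = Compare Groups; SEPARATE = Organize Output by Groups.
      SPLIT FILE LAYERED BY workplace.
      SPLIT FILE OFF.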

  15. Analyze - Descriptives
    Frequencies – lists the frequency of each value in a
    table for a variable
    • Move over variables of interest
    • In statistics button select information wanted
    • Can also get some graphs here
    Descriptives
    • Same as Frequency without the frequency table
    and graphing options
    Cross tabs – is used to build contingency tables
    and will provide chi-square tests
    Not going to be covered but OLAP Cubes are just like
    Pivot tables in Excel
    ◦ Under Analyze, Reports, OLAP Cubes
    Explore – variety of useful information for
    variables, and can do so for groups separately
    • Statistics button
    • M-estimators (remove impact of outliers)
    • Outliers – just the 5 highest and lowest values;
    these are NOT necessarily outliers
    • Plots button (can get histograms)
    • Tests of Normality & Homogeneity of variance
    are buried here
    • Check normality plots with tests
    • If you have a group variable, you can test whether the variance
    differs between groups by clicking next to
    Untransformed in the Spread vs Level Plots area
    • Brown-Forsythe test is the one labeled as “Based on
    Median”
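
    Pasting from Explore yields something like this sketch (group stands in for any factor variable; SPREADLEVEL(1) is the “Untransformed” option):

      EXAMINE VARIABLES=score BY group
        /PLOT=BOXPLOT NPPLOT SPREADLEVEL(1)
        /STATISTICS=DESCRIPTIVES
        /NOTOTAL.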

  16. Split File
    Check Frequency of SES
    Check Frequency of SES by workplace
    • Can use a split file to compare groups
    • Data, Split File, click on Compare Groups and then
    move over workplace
    • Look at lower right corner of SPSS dataset window
    • Check frequency again
    • Use split file to organize output by groups
    • Data, Split File, click on Organize output by groups,
    and then move over workplace
    • Check frequency again
    Alternatively could use Cross Tabs
    • Analyze, Descriptives, Cross Tabs
    • Move over SES into Column and Workplace
    into Row
    Options
    ◦ Statistics – can get chi-square and other
    contingency inferential tests
    ◦ Cells – will let you specify what you want
    displayed in the output (counts or percentages)
    ◦ Format – specifies the ordering of the table in the
    output
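
    A sketch of the equivalent syntax for both approaches:

      * Frequencies of SES within each workplace group.
      SORT CASES BY workplace.
      SPLIT FILE LAYERED BY workplace.
      FREQUENCIES VARIABLES=ses.
      SPLIT FILE OFF.
      * Or a contingency table with a chi-square test.
      CROSSTABS /TABLES=workplace BY ses
        /STATISTICS=CHISQ
        /CELLS=COUNT ROW.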

  17. Select if
    Turns out id #43 was ill on the posttest date and ran out of the room before completing the test.
    We need to see if the illness impacted the results and if so remove him/her from results.
    • Check mean diff with and without that case
    • Need to filter out id = 43
    • Go to Data, Select Cases, check “if condition satisfied” and then click on If button
    • We want to select everyone BUT id 43, move over id and then hit not equal to and enter 43
    • Look at lower right corner of SPSS dataset window again
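
    The pasted filter syntax looks like this:

      USE ALL.
      COMPUTE filter_$ = (id ~= 43).
      FILTER BY filter_$.
      EXECUTE.
      * FILTER OFF restores all cases; SELECT IF would delete them permanently.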

  18. Merging
    The HW_data file actually included a third time point called “followup,” but this is contained in a different
    Excel file called “HW_data_addendum”
    We want to add the follow-up scores
    • Old School method, can copy and paste the values into the data viewer
    • Can merge data sets
    • First save the HW_data as an SPSS file (.sav extension)
    • Open up the addendum in SPSS, and save that as an SPSS file too
    • Make sure the “key” variable that you will be matching on is sorted ascending
    • Go to Data, Merge Files
    • Add variables, and navigate to the location of the saved addendum file
    • SPSS is ‘smart’ enough to know what is unique
    • Click Match cases on Key variable (and that it is sorted)
    • Select id and move it into the key variables box
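
    The merge pastes as a MATCH FILES command; a sketch, with the addendum file path left as an assumption:

      * Both files must be sorted by the key; FILE=* is the active dataset.
      SORT CASES BY id.
      MATCH FILES /FILE=*
        /FILE='HW_data_addendum.sav'
        /BY id.
      EXECUTE.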

  19. Restructuring
    The data in the HW_data file now has 3 test
    scores, each as its own variable
    • This is the “wide” format of data
    • What if we wanted to switch to the “long”
    format for the data?
    • Where each measurement occasion is a unique
    case?
    • Use the “restructure” wizard, in the data menu
    • What do we want to do? We want to take
    variables and turn them into cases (first choice)
    • Hit next
    • Since we are only making 1 transposed variable, we
    choose the top option
    • Hit next
    ◦ Leave the identifier as id
    ◦ In the variables to be transposed, we want to
    name the variable that will be made out of the
    three test scores, call it score
    ◦ Move over the three test scores
    ◦ All the other variables will be fixed,
    ◦ Move them into fixed box
    ◦ We do want an index variable, which will
    indicate the measurement occasion based on the
    order we chose
    ◦ Index =1 will be pretestscore
    ◦ Index=2 will be posttestscore
    ◦ Index=3 will be followup
    ◦ You can choose to rename index if you want
    ◦ Hit finish
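
    The wizard pastes a VARSTOCASES command; roughly:

      * Stack the three scores into one variable, indexed 1-3 by occasion.
      VARSTOCASES
        /MAKE score FROM pretestscore posttestscore followup
        /INDEX=time(3)
        /NULL=KEEP.
      * Variables not named in /MAKE (including id) are kept as fixed.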

  20. Graphing
    There are about 100 ways you can get graphs in SPSS, you have already seen some of them and some will be covered later
    • What does the distribution of all the scores look like?
    • Graphs, Legacy Dialogs, Histogram
    • See if score is related to time using a scatter plot
    • Graphs, Legacy Dialogs, Scatter/Dot, Simple Scatter, and hit Define
    • Move over the Score variable to the Y axis, and Index (or time) to the X axis and hit Ok
    • Double click on the graph and add a trendline
    • Looks like a trend over time, but what is wrong with doing it like this?
    • What if you wanted to do some multi-level modeling and wanted to see if there was a trend over time of the test
    scores?
    • Do the same scatterplot as before, but now Set Markers by id
    • Double click on the graph and now we can add 2 different trend lines, an individual one and an aggregated one
    • Click on Add Fit Line at Subgroups
    • Click on Add Fit Line at Total (click on the line and make the weight 3 to see it better)
    • Export that to an Excel file by right-clicking and choosing Export, or go to File, Export
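
    The legacy dialogs paste GRAPH commands; a sketch using the restructured long file:

      GRAPH /HISTOGRAM=score.
      * Scatter of score by time; BY id sets markers for subgroup fit lines.
      GRAPH /SCATTERPLOT(BIVAR)=time WITH score BY id.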

  21. Quick Break
    Basic Analyses next

  22. Baseball Payrolls
    • Open baseball payrolls data set
    • This data is dated at this point
    • It contains the real values from the ’07 & ’08
    seasons
    • ID = Team
    • League (NL = 0, AL = 1)
    • Division (East = 0, Central = 1, West = 2)
    • Payroll in millions
    • Win totals
    • Playoff appearances (0 missed playoffs, 1 is
    made playoffs)
    • Number of Playoff series won
    Quick aside
    Statistical Assumptions are done on the
    residuals
    ◦ When you have “simple” designs this can be
    tested by looking at the dependent variable
    directly.
    In more complex designs the model must be
    run, with residuals saved
    ◦ Assumptions can then be checked on saved
    residuals

  23. Independent Samples t-tests
    • With the addition of the DH are AL payrolls
    higher than NL?
    • We can test this for 2 years
    What are the t-test Assumptions?
    1. Independence (Design Consideration)
    2. Normality (within groups)
    3. Homogeneity of Variance
    • Can Use explore to check
    • Analyze, Descriptives, Explore
    • Dependents are 2007 & 2008 payrolls, Factor is League
    • Click Plots button
    • Select Normality plots, and untransformed in the Spread vs
    level
    • Uncheck Stem-and-Leaf
    • All assumptions satisfied, ok to run procedure
    • Analyze, Compare Means, Independent
    Sample t-test
    • Move over the outcomes 2007 & 2008 payrolls
    into Test Variables
    • Move league into grouping variable, define
    groups as 0 and 1
    • What is the result?
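
    A sketch of the pasted syntax (payroll2007 and payroll2008 stand in for the payroll variable names):

      * Assumption checks by league.
      EXAMINE VARIABLES=payroll2007 payroll2008 BY league
        /PLOT=NPPLOT SPREADLEVEL(1)
        /NOTOTAL.
      * Independent samples t-test, NL = 0 vs AL = 1.
      T-TEST GROUPS=league(0 1)
        /VARIABLES=payroll2007 payroll2008
        /CRITERIA=CI(.95).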

  24. Paired Sample or Repeated Measures t-test
    Do baseball payrolls significantly increase over time?
    • We can test if there was a significant change
    between 2007 and 2008
    What are the Assumptions?
    1. Independence (Design Consideration)
    2. Normality of difference scores
    Need to create difference score by using compute
    command
    ◦ Paydiff = 2008 payroll – 2007 payroll
    ◦ Use Explore on paydiff with no factors for normality test
    What is result?
    ◦ Ignore for a second and then we will fix it
    Analyze, Compare Means, Paired Samples t-test
    ◦ Move over 2007 & 2008 payroll variables
    ◦ What is result?
    What about that assumption violation
    ◦ Looks negatively skewed
    ◦ How could this be fixed?
    ◦ Taking the SQRT of the diff
    ◦ Then run a 1-sample t-test on the difference score compared to 0
    ◦ Compute paydiff_sqrt = SQRT(paydiff)
    Analyze, Compare Means, One Sample t-test
    ◦ Move over paydiff_sqrt
    ◦ What is result?
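
    The full sequence as a syntax sketch (same payroll-name assumption; note that SQRT returns system-missing for negative differences):

      COMPUTE paydiff = payroll2008 - payroll2007.
      EXECUTE.
      * Paired samples t-test on the two years.
      T-TEST PAIRS=payroll2008 WITH payroll2007 (PAIRED).
      * Transform, then a one-sample t-test against 0.
      COMPUTE paydiff_sqrt = SQRT(paydiff).
      EXECUTE.
      T-TEST /TESTVAL=0 /VARIABLES=paydiff_sqrt.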

  25. One-way between subjects ANOVA
    Do baseball payrolls differ significantly
    between Divisions in 2008?
    ◦ Assumptions
    • Normality of residuals (all groups)
    • No outliers (all groups)
    • Homogeneity of variance (all groups)
    • Independence (design consideration)
    • Can Use explore to check
    • Analyze, Descriptives, Explore
    • Dependents are 2008 payrolls, Factor is now division
    • Click Plots button
    • Select Normality plots, and untransformed in the Spread
    vs level
    • Uncheck Stem-and-Leaf
    Assumptions are essentially satisfied
    Time to run model
    ◦ Analyze, General Linear Model, Univariate
    ◦ Move over 2008 payroll as the DV
    ◦ Division is the Fixed Factor
    ◦ Click on Options
    ◦ Check Estimates of effect size
    ◦ Check homogeneity tests for Levene’s Test
    ◦ Move Division into the “Display means for” area
    ◦ Check compare main effects
    ◦ Model is not significant.
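
    The GLM dialog pastes roughly this (payroll2008 again a stand-in):

      UNIANOVA payroll2008 BY division
        /PRINT=ETASQ HOMOGENEITY
        /EMMEANS=TABLES(division) COMPARE ADJ(LSD)
        /DESIGN=division.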

  26. Two-way between subjects ANOVA
    Do baseball payrolls differ significantly
    between Divisions & Leagues in 2008?
    ◦ Assumptions
    • Normality of residuals (all groups)
    • No outliers (all groups)
    • Homogeneity of variance (all groups)
    • Independence (design consideration)
    • All groups must be tested, but we don’t have
    a variable that separates all 6
    division*league groups
    • Can Use explore to check for normality &
    outliers and HOV
    • Analyze, Descriptives, Explore
    • Dependents are 2008 payrolls, Factors are League and
    Division
    • Click Plots button
    • Select Normality plots, and untransformed in the Spread
    vs level
    • Uncheck Stem-and-Leaf
    • Now click Paste
    • Quirk of SPSS
    • You need to add “BY” between the two factor variables
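
    With the extra BY added, the pasted Explore command crosses the two factors:

      EXAMINE VARIABLES=payroll2008 BY league BY division
        /PLOT=BOXPLOT NPPLOT SPREADLEVEL(1)
        /NOTOTAL.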

  27. Two-way between subjects ANOVA (continued)
    Assumptions are essentially satisfied
    Time to run model
    ◦ Analyze, General Linear Model, Univariate
    ◦ Move over 2008 payroll as the DV
    ◦ Division & League are the Fixed Factors
    ◦ Click on Options
    ◦ Check Estimates of effect size
    ◦ Check Homogeneity tests
    ◦ Move Division, League, and Division*League into the
    “Display means for” area
    ◦ Check compare main effects
    ◦ Model is not significant
    If the model were significant you could look at the
    Estimated Marginal Means tables for pairwise
    comparisons
    ◦ Division has the estimates reported and a table of
    pairwise comparisons
    ◦ League has the estimates and a table of pairwise
    comparisons
    ◦ The interaction just has the estimates…
    ◦ Quirk of SPSS
    ◦ You can get the same pairwise comparisons
    relatively easily using syntax. Rerun, but this time
    hit Paste, then find the EMMEANS command
    with the interaction and add
    ◦ COMPARE(division) ADJ(LSD)
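
    With the EMMEANS edit described above, the pasted model looks like this sketch:

      * The interaction EMMEANS line carries the hand-edited COMPARE(division).
      UNIANOVA payroll2008 BY division league
        /PRINT=ETASQ HOMOGENEITY
        /EMMEANS=TABLES(division) COMPARE ADJ(LSD)
        /EMMEANS=TABLES(league) COMPARE ADJ(LSD)
        /EMMEANS=TABLES(division*league) COMPARE(division) ADJ(LSD)
        /DESIGN=division league division*league.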

  28. Correlation-Regression
    What is correlation among payrolls and win totals for 2007 and 2008?
    • Will talk about assumptions in next procedure for Regression
    • Want a correlation matrix made up of 4 variables
    • Analyze, Correlate, Bivariate; move over the variables of interest
    • Below you can check the box for Spearman’s Rank correlation if you violate some of the assumptions
    • What do we see?
    Regression
    • What if we want to see the impact of change in payroll on 2008 win total, but we want to adjust
    for the potential confounder of 2007 payroll?
    • Need to use multiple regression, predicting 08 win total by paydiff and including 07 payroll in the
    model
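
    A sketch of both correlation commands (wins2007 and wins2008 are stand-ins):

      CORRELATIONS /VARIABLES=payroll2007 payroll2008 wins2007 wins2008
        /PRINT=TWOTAIL NOSIG.
      * Spearman's rank correlation if assumptions are violated.
      NONPAR CORR /VARIABLES=payroll2007 payroll2008 wins2007 wins2008
        /PRINT=SPEARMAN TWOTAIL NOSIG.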

  29. Regression
    Assumptions
    1. Independence (design consideration)
    2. Normality of residuals (within groups)
    3. No outliers
    4. Linearity
    5. Homoscedasticity- variance equal across all
    values of predictors
    6. No multicollinearity- IVs should not be
    measuring same exact thing
    Multicollinearity
    ◦ Variance Inflation Factor
    ◦ If VIF is greater than or equal to 10, you may have
    violated this assumption
    ◦ Condition Index
    ◦ If a CI is greater than 10 with 2 variance proportions greater than
    .5, you might have violated this assumption
    ◦ Tolerances
    ◦ If less than .1, could indicate multicollinearity

  30. Regression
    Residual Analysis
    ◦ Can look for these assumptions using Residual Analysis
    ◦ Scatter plot between predicted values (x axis) and residuals
    (y axis)
    ◦ Linearity
    ◦ There are tests for linearity (Box-Cox)
    ◦ Can examine residual plot, if there is a clear pattern, then
    linearity probably was violated
    ◦ Homoscedasticity
    ◦ There are formal tests for this (Breusch-Pagan)
    ◦ If the residual plot looks like a fan shape, could have unequal
    variances
    [Figure: example residual plots, predicted values on the x axis and residuals on the y axis – “Linear & Homoscedastic,” “Not Linear,” and “Heteroscedastic” panels]

  31. Assumption Checking
    Need to run the model first
    • Analyze, Regression, Linear, move over the
    DV, 08 win total, and two predictors, paydiff
    and 07 payroll
    • Click on statistics, and ask for the collinearity
    diagnostics
    • Click on plots button, and move the Zpred under
    the X axis and Zresid under the Y
    • Click on save button and select unstandardized
    predicted and unstandardized residuals
    • Use explore to look at normality of residuals,
    which are a newly created variable at end of
    dataset
    • Looks ok
    • Check collinearity diagnostics looking at VIF,
    CI, and Tolerance
    • Next check the residual plot OR create your
    own using Graphs, Legacy Dialogs, Scatter/Dot
    • Assumptions have been satisfied, now go back
    and look at model results
    • What did we find?
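
    The whole run pastes as one REGRESSION command, followed by Explore on the saved residuals (RES_1 is the default name SPSS gives the first saved unstandardized residual):

      REGRESSION
        /STATISTICS=COEFF OUTS R ANOVA COLLIN TOL
        /DEPENDENT wins2008
        /METHOD=ENTER paydiff payroll2007
        /SCATTERPLOT=(*ZRESID ,*ZPRED)
        /SAVE PRED RESID.
      * Normality check on the saved residuals.
      EXAMINE VARIABLES=RES_1 /PLOT=NPPLOT.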

  32. Quick Break
    Advanced Methods

  33. Topics
    Logistic Regression (ROC Curve)
    Automated Regression (Model selection)
    ANOVA, within-subjects and mixed design
    Factor Analysis, extractions & rotations
    Reliability
    Missing Values
    Moderation in Regression

  34. FA (factor.sav)
    Factor Analysis- want to explain the
    relationships observed among the variables as a
    function of some underlying latent
    constructs/traits.
    The Language
    • Communality- amount of variance in an item
    that is accounted for by all factors
    • Factor loading- quantification of the strength
    of the relationship between the factor and an
    individual item
    3 Things to think about in a FA
    • Number of Factors
    • Extraction- the mathematical way the number
    of factors are found
    • PCA- maximizes the amount of variance that each
    factor explains across items, successively getting
    smaller
    • PAF- finds factors that predict only the shared variance
    of the items
    • Maximum Likelihood – estimates are found that
    maximize the probability (likelihood) of sampling the
    observed correlation matrix from the population
    • Rotation- Make results more interpretable by
    trying to get each item to load onto only 1
    factor
    • Orthogonal- factors are independent of each other
    • Varimax is most popular, maximizes the variance
    accounted for
    • Oblique - factors can be correlated with each other
    • Direct oblimin- simplifies factors by minimizing
    cross products of loadings

  35. FA
    Factor Analysis typically has 3 steps (after testing
    assumptions)
    1. Determine the number of factors to extract
    using a PCA with no rotation
    • Visually- Scree plot
    • “Objectively” – parallel analysis
    • Hard Rules
    • Eigenvalues > 1
    • Amount of variance accounted for is > 50%
    • Analyze, dimension reduction, factor
    • Move over the variables, click extraction and check on the
    scree plot (uncheck unrotated factor solution)
    • How many factors?
    2. Determine if factors are correlated
    ◦ Repeat steps but after clicking on extraction button, choose
    the method you want to use
    ◦ Click on “fixed number of factors” specify the number of
    factors to extract
    ◦ Next click on rotation button and choose an oblique
    rotation
    ◦ Check the Factor Correlation Matrix
    ◦ Rule of thumb would be if more than one correlation is
    above .3
    3. Interpret solution
    ◦ If oblique rotation is needed you interpret the
    Pattern Matrix to get your factor loadings
    ◦ If orthogonal rotation is needed you interpret
    the rotated Factor matrix to get your factor
    loadings
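
    The three steps as syntax sketches (the item names and the choice of 3 factors are assumptions):

      * Step 1: PCA with no rotation, just for the scree plot.
      FACTOR /VARIABLES=item1 TO item12
        /PLOT=EIGEN
        /EXTRACTION=PC
        /ROTATION=NOROTATE.
      * Steps 2-3: fixed number of factors with an oblique rotation.
      * Check the Factor Correlation Matrix; interpret the Pattern Matrix.
      FACTOR /VARIABLES=item1 TO item12
        /CRITERIA=FACTORS(3)
        /EXTRACTION=PAF
        /ROTATION=OBLIMIN.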

  36. Reliability (factor.sav)
    Internal consistency
    • Coefficient alpha
    • Analyze, Scale, reliability, select alpha from drop
    down, move over items
    • Click statistics, choose all in the “descriptives for” box
    and check the inter-item correlations
    Multiple Raters
    • Cohen’s kappa (Agreement for 2 raters)
    • Analyze, Descriptives, Crosstabs; move rater
    1 to Rows and rater 2 to Columns
    • Click on statistics, choose kappa
    • ICC (intraclass correlation coefficient)
    • Analyze, Scale, reliability, select alpha from drop
    down, move over items
    • Click on ICC, choose appropriate one
    ICC
    • One-Way: when raters did NOT rate all items
    • Two-way random: If raters rated all items, but they
    represent only a sample of possible raters
    • Two-way mixed: if raters rated all items, and
    represent all the raters that will be used.
    • This is most common for research
    • If the variability due to the personal responding of raters
    is not “error,” Consistency should be chosen in
    the drop-down on the right side.
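
    Sketches for alpha with an ICC, and for kappa (item and rater variable names are assumptions):

      * Coefficient alpha plus a two-way mixed, consistency ICC.
      RELIABILITY /VARIABLES=item1 TO item12
        /MODEL=ALPHA
        /STATISTICS=DESCRIPTIVE CORR
        /ICC=MODEL(MIXED) TYPE(CONSISTENCY) CIN=95.
      * Cohen's kappa for two raters.
      CROSSTABS /TABLES=rater1 BY rater2
        /STATISTICS=KAPPA.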

  37. Missing Values (baseball.sav)
    • This analysis allows you to test your data for type of
    missingness using Little’s MCAR test
    • It also allows you to impute data when missing
    • Never do regression or mean imputation
    • Listwise & pairwise deletion remove cases
    • EM (Expectation Maximization) is the suggested one
    to use
    Let us create some missing data, copy the column of
    wins2008 and paste into a new column
    • Rename Variable
    • Next delete the win total for
    • Cubs, Marlins, & Padres
    • Analyze, Missing Value Analysis
    • Move over the continuous variables that will help
    predict what those missing values will be
    • Do not include wins2008, as that would make the
    prediction perfect
    • Put division and league in the categorical variables box
    • Put team name in case labels
    • Click the EM checkbox, and then click on EM
    button
    • Check Save completed data
    • Enter a dataset name
    • Open up the new data window that appeared and see
    that values have been imputed
    • Copy the column named missing wins
    • Paste it back into the baseball dataset
    • Compare the imputed estimates to the real
    observations
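
    A sketch of the MVA run (variable names and the output file are assumptions; OUTFILE saves the EM-completed data):

      * Little's MCAR test is printed with the EM statistics.
      MVA VARIABLES=wins2008_miss payroll2007 payroll2008 wins2007 league division
        /CATEGORICAL=league division
        /ID=team
        /EM(OUTFILE='wins_imputed.sav').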
