Survey Research in the Digital Age

Matthew Salganik
February 21, 2019

Abstract: In the past several years, we have witnessed the birth and rapid spread of social media, smartphones, and numerous other digital marvels. In addition to changing how we live, these tools enable us to collect and process data about human behavior on a scale never before imaginable. In this talk, I’ll describe how survey research fits into this new data landscape. Further, I’ll use specific examples to illustrate how survey researchers can harness the tools of the digital age to collect data in new ways. Throughout the talk, I will emphasize ways that big data sources and surveys can serve as complements rather than substitutes.

These slides are from a webinar organized by the American Association for Public Opinion Research (AAPOR). More information: https://www.aapor.org/Education-Resources/Online-Education/Webinar-Details.aspx?webinar=WEB0219

Transcript

  1. Survey research in the digital age
    Matthew J. Salganik, Department of Sociology, Princeton University (msalganik)
    AAPOR Webinar, February 21, 2019
  2. We will always need to ask
    limitations of big data (fubu vs. nufu-nubu)
    internal states vs. external states
  3. We will always need to ask
    limitations of big data (fubu vs. nufu-nubu)
    internal states vs. external states
    inaccessibility of big data
  4. We will always need to ask
    limitations of big data (fubu vs. nufu-nubu)
    internal states vs. external states
    inaccessibility of big data
    But how we are going to ask is going to change
  5.           Sampling                       Interviews
     1st era   Area probability               Face-to-face
     2nd era   Random digit dial probability  Telephone
     3rd era   Non-probability                Computer-administered
  6.           Sampling                       Interviews             Data environment
     1st era   Area probability               Face-to-face           Stand-alone
     2nd era   Random digit dial probability  Telephone              Stand-alone
     3rd era   Non-probability                Computer-administered  Linked
  10. Good web-based systems use the fat-head and the long-tail
    [Figure: information contributed (zero to lots) plotted against contributors sorted by rank (contributes most to contributes least)]
  11. Surveys don’t use the fat-head or the long-tail
    [Figure: information contributed (zero to lots) plotted against contributors sorted by rank (contributes most to contributes least)]
  12. Which do you think is a better idea for creating a greener, greater New York City?
    Seeded the wiki survey with 25 ideas:
    Require all big buildings to make certain energy efficiency upgrades
    Increase targeted tree plantings in neighborhoods with high asthma rates
    Establish a New York City Energy Planning Board
  13. What are we trying to estimate?
    Data:
      Vote  Session  Prompt
      1     1        item 4, item 1
      2     1        item 3, item 1
      3     1        item 4, item 3
      4     2        item 3, item 4
      5     2        item 4, item 2
      ...   ...      ...
    Opinion matrix:
      $$\begin{pmatrix} \theta_{1,1} & \theta_{1,2} & \cdots & \theta_{1,K} \\ \theta_{2,1} & \theta_{2,2} & \cdots & \theta_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ \theta_{J,1} & \theta_{J,2} & \cdots & \theta_{J,K} \end{pmatrix}$$
    $\theta_{j,k}$: how much respondent $j$ likes item $k$
  14. Which do you think is a better idea for creating a greener, greater New York City?
    Seeded the wiki survey with 25 ideas:
    Require all big buildings to make certain energy efficiency upgrades
    Increase targeted tree plantings in neighborhoods with high asthma rates
    Establish a New York City Energy Planning Board
  15. Recruited participants through Twitter, Facebook, blogs, etc.
    This is not a random sample, but random samples are possible
  16. 31,893 responses, 464 ideas uploaded
    [Figure: "Which do you think is a better idea for creating a greener, greater New York City?" Two panels: responses by rank of session, and contributed ideas by rank of session]
  17. Which do you think is a better idea for creating a greener, greater New York City?
    [Figure: estimated score $\hat{s}_i$ (axis from 60 to 90) for the following ideas]
    Provide better transit service outside of Manhattan
    Create a network of protected bike paths throughout the entire city
    Support and protect community gardens and create mechanisms to create new gardens and open space
    Implement congestion pricing in lower Manhattan
    Require all big buildings to make certain energy efficiency upgrades
    Create more year-round Greenmarkets in under-served communities.
    Continue enhancing bike lane network, to finally connect separated bike lane systems to each other across all five boroughs.
    Plug ships into electricity grid so they don't idle in port - reducing emissions equivalent to 12000 cars per ship.
    Invest in multiple modes of transportation and provide both improved infrastructure and improved safety
    Keep NYC's drinking water clean by banning fracking in NYC's watershed.
  18. Alternative framings: “Keep NYC’s drinking water clean by banning fracking in NYC’s watershed”
    Novel information: “Plug ships into electricity grid so they don’t idle in port - reducing emissions equivalent to 12000 cars per ship.”
  19. [Figure: estimated score $\hat{s}_i$ (axis from 10 to 80) by rank. Two panels: seed ideas (ranks 1 to 25) and user-contributed ideas (ranks 1 to 244)]
  20. [Figure: estimated score $\hat{s}_i$ (axis from 10 to 80) by rank. Two panels: seed ideas (ranks 1 to 25) and user-contributed ideas (ranks 1 to 244)]
    variance + volume → extreme cases
  21.          Sampling                       Interviews             Data environment
     1st era   Area probability               Face-to-face           Stand-alone
     2nd era   Random digit dial probability  Telephone              Stand-alone
     3rd era   Non-probability                Computer-administered  Linked
  23. Daytime satellite images are available, but most researchers had been using night lights
    https://www.nasa.gov/multimedia/imagegallery/image_feature_2480.html
  24. Jean et al. (2016): Day pictures + Nightlights + survey data → estimates of wealth in places without surveys
  25. Start with CNN pretrained on ImageNet
    Train CNN to predict nightlights from day pictures (lots of training data)
  26. Start with CNN pretrained on ImageNet (e.g. hamsters and weasels)
    Train CNN to predict nightlights from day pictures (lots of training data)
    Take features from CNN and train ridge regression to predict cluster mean survey response
    http://dx.doi.org/10.1126/science.aah5217
  27. Two patterns:
    Performance decreases when training on one country and testing on another
    Performance varies by the quantity being estimated (assets seem easier to estimate than consumption expenditures)
    http://dx.doi.org/10.1126/science.aaf7894
  28.          Sampling                       Interviews             Data environment
     1st era   Area probability               Face-to-face           Stand-alone
     2nd era   Random digit dial probability  Telephone              Stand-alone
     3rd era   Non-probability                Computer-administered  Linked
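
Slide 13 defines the opinion matrix, where θ_{j,k} is how much respondent j likes item k, and slide 17 reports a score for each idea. The slides do not spell out how the two are connected, so the Python sketch below rests on two assumptions: that respondent j picks item a over item b with probability Φ(θ_{j,a} − θ_{j,b}) (a probit choice rule), and that an item's score is 100 times its average chance of beating another item for a randomly chosen respondent. The matrix values and the names `win_probability` and `item_scores` are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(theta_row, a, b):
    """Assumed probit rule: chance that this respondent picks item a over item b."""
    return normal_cdf(theta_row[a] - theta_row[b])

def item_scores(theta):
    """Score each item as 100 x its mean win probability over respondents and rival items."""
    n_items = len(theta[0])
    scores = []
    for a in range(n_items):
        probs = [
            win_probability(row, a, b)
            for row in theta
            for b in range(n_items)
            if b != a
        ]
        scores.append(100.0 * sum(probs) / len(probs))
    return scores

# Hypothetical opinion matrix: J = 3 respondents (rows), K = 4 items (columns).
theta = [
    [0.5, -0.2, 0.0, 1.0],
    [0.1, 0.3, -0.5, 0.8],
    [-0.4, 0.0, 0.2, 0.6],
]

scores = item_scores(theta)
for k, s in enumerate(scores, start=1):
    print(f"item {k}: score {s:.1f}")
```

Under these assumptions, the scores average to 50 across items, because every pairwise comparison contributes win probabilities that sum to one. In practice the opinion matrix is not observed and would have to be estimated from the votes, which this sketch does not attempt.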
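
Slide 20 sums up why the best user-contributed ideas can outscore the best seed ideas: variance + volume → extreme cases. A small simulation can illustrate that arithmetic. The pool sizes 25 and 244 match the rank axes of the figure on slides 19 and 20; the normal distribution, the means, and the standard deviations are assumptions chosen only for illustration, not estimates from the wiki survey.

```python
import random

def average_best(n_ideas, mean, sd, n_trials, rng):
    """Average, over repeated trials, of the best score in a pool of n_ideas draws."""
    total = 0.0
    for _ in range(n_trials):
        total += max(rng.gauss(mean, sd) for _ in range(n_ideas))
    return total / n_trials

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical settings: every pool has the same average quality.
# Seed pool: small and tightly clustered.
seed_best = average_best(n_ideas=25, mean=50.0, sd=5.0, n_trials=2000, rng=rng)
# Volume only: a larger pool with the same spread as the seed pool.
volume_only_best = average_best(n_ideas=244, mean=50.0, sd=5.0, n_trials=2000, rng=rng)
# Volume and variance: a larger pool that is also more spread out.
user_best = average_best(n_ideas=244, mean=50.0, sd=10.0, n_trials=2000, rng=rng)

print(f"average best seed idea:              {seed_best:.1f}")
print(f"average best idea, volume only:      {volume_only_best:.1f}")
print(f"average best idea, volume + variance: {user_best:.1f}")
```

Even though all three pools have the same mean, the best idea in the larger pool tends to score higher, and higher still when the pool is also more variable.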
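
Slide 26 ends the pipeline with a ridge regression from CNN features to the cluster mean survey response. A minimal sketch of that last step in plain Python follows, using the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. The three-dimensional feature vectors, the responses, and the penalty values are made-up stand-ins (real CNN features would have many more dimensions), the intercept is omitted for brevity, and `solve`, `ridge_fit`, and `predict` are hypothetical helper names rather than anything from Jean et al. (2016).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_fit(features, y, lam):
    """Closed-form ridge weights: (X^T X + lam * I)^{-1} X^T y."""
    d = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += lam  # the ridge penalty is added to the diagonal
    xty = [sum(row[i] * yi for row, yi in zip(features, y)) for i in range(d)]
    return solve(xtx, xty)

def predict(weights, row):
    """Linear prediction for one feature vector."""
    return sum(w * v for w, v in zip(weights, row))

# Hypothetical data: one row of image features per survey cluster.
features = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [1.0, 1.0, 0.0],
    [0.5, 0.5, 1.0],
    [0.2, 0.8, 0.3],
]
true_weights = [2.0, -1.0, 0.5]
cluster_means = [predict(true_weights, row) for row in features]  # noise-free responses

w_small = ridge_fit(features, cluster_means, lam=1e-8)  # almost no shrinkage
w_large = ridge_fit(features, cluster_means, lam=10.0)  # heavy shrinkage
print("weights with tiny penalty: ", [round(w, 3) for w in w_small])
print("weights with large penalty:", [round(w, 3) for w in w_large])
```

With a tiny penalty the fit recovers the weights that generated the toy responses, while a large penalty shrinks the weights toward zero. That shrinkage is the usual reason to prefer ridge when there are many features relative to the number of survey clusters.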