
The Promise and Peril of Open Data: Lessons from St. Louis for Public Health Practice

Invited lecture at Washington University's Brown School.

Christopher Prener

March 03, 2020

Transcript

  1. THE PROMISE AND PERIL OF OPEN DATA: LESSONS FROM ST. LOUIS FOR PUBLIC HEALTH PRACTICE. CHRISTOPHER PRENER, PH.D., SAINT LOUIS UNIVERSITY. BROWN SCHOOL PUBLIC HEALTH SPEAKER SERIES, MARCH 3, 2020. Photo: Robert Cohen / St. Louis Post-Dispatch
  2. AGENDA 1. Preface 2. Six Lessons About Open Data 3. Open Data as Culture Change. The six lessons: 1. PDF Files Aren’t Open Data 2. There Are No Free Lunches 3. We Are the Product 4. Open Data Are Political 5. Open Data Are (Often) Messy 6. Open Data Are Essential
  3. PREFACE I spend time building tools to make data accessible - an individual solution to a structural problem. We need to view “open data” as a structural solution, not an individual one.
  4. DISCIPLINARY MOTIVATION Southern cities (broadly defined) have been glossed over in the sociology of cities and urban social problems. We need a broader and deeper engagement with non-paradigmatic cities, but we need data to do that.
  5. SAINT LOUIS UNIVERSITY’S HISTORY WITH SLAVERY EXTENDS BACK TO 1823 WHEN THE JESUITS BROUGHT THEIR MISSION TO MISSOURI. THEY CAME WITH SIX ENSLAVED PEOPLE FROM WHITE MARSH, MARYLAND. IN 2016, WE KNEW LITTLE ABOUT THOSE SIX PEOPLE BEYOND THEIR FIRST NAMES: THOMAS, MARY (POLLY OR MOLLY), MOSES, NANCY, ISAAC, AND SUCCY. Fred Pestello, Ph.D., St. Louis American (2019)
  6. CAUTIONARY NOTE Open data will not “solve” St. Louis’s structural problems, but they do provide us with the means to make our interventions more effective.
  7. WHAT ARE OPEN DATA? Open data are machine-readable data that are licensed for re-use, freely available, and stored in open, non-proprietary formats. They should be well documented, too.
  13. WHILE THE DATA WILL BE FREE TO THE PUBLIC, [L.A.] COUNTY WILL SPEND $319,000 IN STARTUP COSTS, AND ANNUAL EXPENSES ARE EXPECTED TO COST AN ADDITIONAL $287,000. FOR COMPARISON, CONSIDER THIS: CALIFORNIA LAWMAKERS IN JUNE INTRODUCED A BILL TO ESTABLISH A STATEWIDE OPEN DATA POLICY THAT WOULD AFFECT MORE THAN 200 STATE AGENCIES. AN ANALYSIS OF THE BILL’S FISCAL IMPACT SHOWED THE POLICY WOULD COST THE STATE $4 MILLION TO $5 MILLION ANNUALLY. Tod Newcombe, Governing (2015)
  14. RELATIONS OF PRODUCTION When we say that we love _______ (data product or open source software ecosystem) because they are free, we gloss over the often invisible labor and significant costs that make our access free.
  15. PRIVACY AND PUBLIC DATA STL Resident (@stlresident): “I just love living in the grove - being literally a block from @urbanchestnut is amazing! #prost” 2:24 PM ・ Feb 4, 2020 ・ Twitter for iPhone. Photo: Jonathan Gayman / Sauce Magazine
  16. PRIVACY AND PUBLIC DATA STL Resident (@stlresident): “UGH! We got robbed for the second time this year last night! I keep thinking about that tweet from @swhikt the other day - this city will absolutely break your heart” 8:06 AM ・ Feb 24, 2020 ・ Twitter for iPhone
  17. PRIVACY AND PUBLIC DATA STL Resident (@stlresident): Query the master address list for all addresses within a two-block radius of Urban Chestnut’s Grove location, then query SLMPD’s data for robberies, larcenies, and burglaries that occurred at the same address twice in 2020. Which address(es) appear on both lists?
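
The two-step query on slide 17 can be sketched in a few lines, which is the point of the slide: linking two public data sets takes very little effort. This is a minimal illustration on entirely made-up records, not the author's code. The address labels, the block-grid coordinates, the `LANDMARK` location, and the helper `within_blocks` are all assumptions for the example, and "twice" is read here as "at least twice."

```python
from collections import Counter

# Entirely made-up records (assumption): real address and incident files
# have different fields. Coordinates are in city blocks on a flat grid.
ADDRESSES = [
    ("100 Example Ave", 0.0, 1.0),
    ("200 Example Ave", 1.5, 0.5),
    ("900 Sample St", 6.0, 4.0),
]
LANDMARK = (0.0, 0.0)  # hypothetical location of the business named in a tweet
PROPERTY_OFFENSES = {"robbery", "larceny", "burglary"}
INCIDENTS = [  # (address, offense, year)
    ("200 Example Ave", "burglary", 2020),
    ("200 Example Ave", "robbery", 2020),
    ("100 Example Ave", "larceny", 2020),
    ("900 Sample St", "burglary", 2020),
    ("900 Sample St", "burglary", 2020),
]


def within_blocks(point, center, radius):
    """True if point lies within radius blocks of center (straight-line)."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return (dx * dx + dy * dy) ** 0.5 <= radius


# Step 1: every address within a two-block radius of the landmark.
nearby = {
    label for label, x, y in ADDRESSES
    if within_blocks((x, y), LANDMARK, 2.0)
}

# Step 2: addresses with at least two qualifying incidents in 2020.
counts = Counter(
    address for address, offense, year in INCIDENTS
    if year == 2020 and offense in PROPERTY_OFFENSES
)
repeat = {address for address, n in counts.items() if n >= 2}

# Step 3: addresses that appear on both lists.
matches = sorted(nearby & repeat)
print(matches)
```

With only a neighborhood hint and a count of incidents, the intersection narrows to a single address in this toy data, which is why the slides that follow ask what researchers owe the people behind the records.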
  18. RESEARCHER RESPONSIBILITY? Should we be careful posting data on tax delinquency or nuisance reports with our vacant building estimates? Photo: Laurie Skrivan / St. Louis Post-Dispatch
  19. RESEARCHER RESPONSIBILITY? City crime data include approximate addresses for rapes. What are the consequences of making those public? Photo: David Carson / St. Louis Post-Dispatch
  20. SCIENCE IS POLITICAL, TOO… Would you choose to submit to a journal that required posting your anonymized analysis data set and the code you used to produce your findings? Why isn’t this required?
  21. SO YOU’D LIKE TO REPLICATE THAT? St. Louis Metropolitan Police Department: requires manual download; tabular data.
  22. SO YOU’D LIKE TO REPLICATE THAT? St. Louis Metropolitan Police Department: requires manual download; tabular data. Each month since 2008 is a separate file. [Timeline figure, ‘08 to ‘18; legend: = 1 month]
  23. SO YOU’D LIKE TO REPLICATE THAT? St. Louis Metropolitan Police Department: requires manual download; tabular data. Each is downloaded with a .html file extension. [Timeline figure, ‘08 to ‘18; legend: = 1 month]
  24. SO YOU’D LIKE TO REPLICATE THAT? St. Louis Metropolitan Police Department: requires manual download; tabular data. Different months have different numbers of columns. [Timeline figure, ‘08 to ‘18; legend: 20 cols, 18 cols, 26 cols]
  25. SO YOU’D LIKE TO REPLICATE THAT? [Timeline figure, ‘08 to ‘18] Validate & Standardize, Collapse, Geocode (Because Some Homicides Are Missing x,y!)
  26. SO YOU’D LIKE TO REPLICATE THAT? [Timeline figure, ‘08 to ‘18] Validate & Standardize, Collapse, Geocode (Because Some Homicides Are Missing x,y!) This sounds exhausting. Is it really worth it?
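
The validate, standardize, collapse, and geocode steps on the preceding slides can be illustrated with a small sketch. This is not the author's pipeline: the column names in `ALIASES`, the sample rows, and the helper functions are hypothetical, and it assumes each monthly file is comma-separated text regardless of its `.html` extension. Geocoding itself is not performed; the sketch only flags the records that would need it.

```python
import csv
import io

# Hypothetical header aliases (assumption): the real monthly files use
# their own column names, which are not reproduced here.
ALIASES = {
    "Complaint": "complaint",
    "complaint": "complaint",
    "CodedMonth": "coded_month",
    "coded_month": "coded_month",
    "XCoord": "x",
    "x_coord": "x",
    "YCoord": "y",
    "y_coord": "y",
}
STANDARD = ["complaint", "coded_month", "x", "y"]


def standardize(raw_text):
    """Read one month's file and map its columns onto STANDARD."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        clean = {name: "" for name in STANDARD}
        for header, value in record.items():
            if header is None:  # extra unnamed fields on a ragged row
                continue
            target = ALIASES.get(header.strip())
            if target is not None:  # drop columns with no standard name
                clean[target] = (value or "").strip()
        rows.append(clean)
    return rows


def collapse(months):
    """Stack the standardized months into one table."""
    combined = []
    for raw_text in months:
        combined.extend(standardize(raw_text))
    return combined


def needs_geocoding(rows):
    """Return rows whose x or y coordinate is missing or zero."""
    def missing(value):
        try:
            return float(value) == 0.0
        except ValueError:
            return True
    return [row for row in rows if missing(row["x"]) or missing(row["y"])]


# Two made-up months with different headers and column counts.
JAN = (
    "Complaint,CodedMonth,XCoord,YCoord,Extra\n"
    "20-001,2020-01,894000.1,1020000.5,a\n"
)
FEB = (
    "complaint,coded_month,x_coord,y_coord\n"
    "20-002,2020-02,0,0\n"
    "20-003,2020-02,,\n"
)

crimes = collapse([JAN, FEB])
to_geocode = needs_geocoding(crimes)
print(len(crimes), "records;", len(to_geocode), "need geocoding")
```

Mapping every month onto one fixed list of columns before stacking is what lets files with 18, 20, or 26 columns be combined at all; anything without a standard name is dropped rather than guessed at.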
  27. SCIENCE IS MESSY, TOO… If you had to post your raw data, data cleaning code, analysis data, and analysis code right now, how would they look? How would you feel about doing this?
  28. THESE SAME DATA ARE CRITICAL FOR US, TOO… How would inaccurate Census data impact your work? If you use another data source, what would happen to your research if it went away?
  29. OPEN DATA ARE ABOUT EQUITY Data and knowledge are power. Open data are therefore a statement that all of us have the right to learn from public records and publicly supported research.
  30. OPEN DATA ARE ABOUT EQUITY Even for research that occurs without public funding, we should consider what we owe to the communities that participated (broadly defined) in the production of our findings.
  31. CULTURAL CHANGE IS HARD Embracing open data requires intentionality in our policy and research processes - we must serve multiple publics in sustainable ways. How we make data accessible, cite-able, and reusable all matter.
  32. CULTURAL CHANGE IS HARD Embracing open data requires vulnerability in our policy and research processes - putting our data and code out into the world means people will look, and might find problems or errors.
  33. CULTURAL CHANGE IS HARD Embracing open data requires vulnerability in our relationships - advocating for “open” can be seen defensively as an indictment of “closed.” How we acknowledge and reward openness matters, too.
  34. CULTURAL CHANGE IS HARD Embracing open data requires inclusivity - open data are more effective when more people have the means to learn from them. We need dashboards, but we also need to build capacity in communities for data work.
  35. CULTURAL CHANGE IS HARD Embracing open data requires humility in our relationships - “use data as a flashlight, not a hammer.”* How we engage data producers and consumers is essential. * David Brooks (2019)
  36. LEARN MORE. THANKS FOR COMING! Slides available via SpeakerDeck: https://bit.ly/32MrxuJ Replicate figures: https://github.com/chris-prener/STL_CRIME_Trends and https://github.com/slu-openGIS/STL_CRIME_Murders Contact: [email protected], https://chris-prener.github.io, @chrisprener