Predictive Analytics, Cities, and Public Health

Tom Schenk Jr
September 24, 2018

Presented at ASM Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines on September 24, 2018

Transcript

  1. PREDICTIVE ANALYTICS, CITIES, AND PUBLIC HEALTH TOM SCHENK JR. KPMG

    @TOMSCHENKJR
  2. Source: techplan.cityofchicago.org IN CHICAGO, WE BELIEVE THAT THE POWER OF

    TECHNOLOGY IS DRIVEN BY THE PEOPLE WHO USE AND BENEFIT FROM IT.
  3. © 2012 _chrisUK CC-BY-ND 2.0

  4. © 2012 _chrisUK CC-BY-ND 2.0

  5. Adapted from © 2012 Steve Vance CC BY-NC-SA 2.0

  6. Adapted from © 2012 Steve Vance CC BY-NC-SA 2.0

  7. Data on potholes are reported by residents and city staff

    through the 311 system and then published on the City’s #opendata portal, which is updated daily. data.cityofchicago.org
  8. Chicago has released more #opendata, including important items such as

    red light and speed camera violations, problem landlords, and public chauffeurs. data.cityofchicago.org/view/caas-knxs
  9. #OPENDATA PROVIDES A MEANS TO CREATE AN ECOSYSTEM AROUND DATA,

    WHICH INCLUDES MULTIPLE STAKEHOLDERS AND INITIATIVES THAT EXTEND BEYOND TRANSPARENCY.
  10. None
  11. Civic Tech Community: Chicago has a large, vibrant, productive civic

    community. It is led by Chicago residents interested in technology and society who, along with non-profits, help Chicagoans. © Tom Schenk Jr, 2016. CC-BY
  12. Using #opendata, this service developed by the civic community alerts

    individuals to street sweeping activity by providing email, text, or calendar alerts. sweeparound.us
  13. Chicago Flu Shots was developed to easily find flu-shot locations

    across Chicago. The code, created by a volunteer, is #opensource, so the site was adopted by Boston, Philadelphia, and San Francisco. chicagoflushots.org
  14. The City of Chicago has a number of high-quality research

    universities and groups willing to engage in projects with the city. We can leverage the #opendata portal and the data itself to create cooperative relationships. Academia
  15. #Predictions: Using advanced research techniques to forecast and predict events

    in the city. #Optimization: Optimizing the allocation of resources across the city for a more efficient city. #Evaluation: Evaluating programs, including the effectiveness of advanced analytics.
  16. © 2015 PBS Newshour

  17. Image adapted from Michael Mooney’s Little Chicago (CC-BY 2.0).

  18. Chicago used the #opendata portal, the city’s premier method of sharing data, to share data with external

    researchers, saving time on data-sharing agreements when creating #predictions. Using #opendata
  19. Significant Predictors

    The model predicts the likelihood of a food establishment having a critical violation, the type of violation most likely to lead to foodborne illness. Over a dozen #opendata sources were used to help define the model. Ultimately, ten different variables proved useful in creating #predictions of critical violations, including:
    • Establishments with previous critical or serious violations
    • Three-day average high temperature
    • Nearby garbage and sanitation complaints
    • Nearby burglaries
    • Whether the establishment has a tobacco or alcohol license
    • Length of time since last inspection
    • Length of time the establishment has been operating
    • Inspector assigned
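As a rough illustration of how a likelihood score can drive inspection priority, the sketch below applies a logistic function to a handful of the predictors listed on this slide and sorts establishments by the result. The logistic form, the coefficient values, the feature names, and the sample establishments are all assumptions made for illustration; the slide does not state the model's functional form or its fitted values.

```python
import math

# Hypothetical coefficients, invented for illustration only.
# They are NOT the fitted values from the city's model.
INTERCEPT = -3.0
COEFFICIENTS = {
    "past_critical_violation": 1.2,      # 1 if a previous critical/serious violation, else 0
    "avg_high_temp_3day": 0.02,          # degrees Fahrenheit
    "nearby_sanitation_complaints": 0.1,
    "days_since_last_inspection": 0.003,
}


def violation_probability(features):
    """Map a feature dict to a probability with the logistic function."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def prioritize(establishments):
    """Return establishments sorted from highest to lowest predicted risk."""
    return sorted(
        establishments,
        key=lambda e: violation_probability(e["features"]),
        reverse=True,
    )


# Two made-up establishments.
establishments = [
    {"name": "B", "features": {"past_critical_violation": 0, "avg_high_temp_3day": 60,
                               "nearby_sanitation_complaints": 0, "days_since_last_inspection": 100}},
    {"name": "A", "features": {"past_critical_violation": 1, "avg_high_temp_3day": 80,
                               "nearby_sanitation_complaints": 3, "days_since_last_inspection": 400}},
]
ranked = prioritize(establishments)
print([e["name"] for e in ranked])
```

Sorting by score is the only step that matters for scheduling: inspectors visit the same establishments either way, just in a different order.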
  20. The #predictions revealed an opportunity to deliver results faster.

    Within the first half of the work, 69% of critical violations would have been found by inspectors using a data-driven approach. During the same period, only 55% of violations were found using the status quo method. [Chart: share of critical violations found, data-driven versus status quo]
  21. The food inspection model is able to deliver results faster.

    Comparing the data-driven approach with the status quo, violations were found an average of 7.4 days sooner in the 60-day pilot. That means the #predictions would have led inspectors to find more violations sooner. IMPROVEMENT: 7 days
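The two pilot measures on slides 20 and 21 (share of critical violations found in the first half of the work, and average days gained) can be sketched as follows. The risk scores and outcomes are invented, the function names are hypothetical, and the sketch assumes one inspection per day, which is a simplification rather than the pilot's actual schedule.

```python
def share_found_in_first_half(outcomes):
    """Fraction of all critical violations found in the first half of inspections.

    outcomes: list of booleans in inspection order (True = critical violation).
    """
    total = sum(outcomes)
    if total == 0:
        return 0.0
    half = len(outcomes) // 2
    return sum(outcomes[:half]) / total


def mean_day_found(outcomes):
    """Average day on which violations were found, assuming one inspection per day.

    Assumes at least one violation is present.
    """
    days = [day for day, hit in enumerate(outcomes, start=1) if hit]
    return sum(days) / len(days)


# Invented example: eight establishments in the order they were actually inspected.
risk_scores = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3, 0.6, 0.4]
status_quo = [True, False, True, False, False, False, True, True]

# Data-driven order: the same establishments, highest predicted risk first.
data_driven = [
    hit for _, hit in sorted(zip(risk_scores, status_quo),
                             key=lambda pair: pair[0], reverse=True)
]

days_sooner = mean_day_found(status_quo) - mean_day_found(data_driven)
print(share_found_in_first_half(status_quo),
      share_found_in_first_half(data_driven),
      days_sooner)
```

Both orderings find the same violations by the end; the comparison only measures how early they are found.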
  22. OPTIMIZING FOOD INSPECTIONS Discovering critical violations sooner rather than later

    reduces the risk of patrons becoming ill, which helps reduce medical expenses, lost time at work, and even a limited number of fatalities.
  23. The data science team has built a website which lets

    CDPH prioritize inspections based on projected risk.
  24. http://chicago.github.io/food-inspections-evaluation/

  25. The analytical model will be released as an open source

    project on GitHub, allowing other cities to study or even adopt the model in their respective cities. No other city has released their analytic models before this release. #OPENSOURCE
  26. WEST NILE VIRUS CURTAILING VECTOR-BORNE DISEASES

  27. WEST NILE VIRUS

    • Between 5 and 884 human cases reported annually in Illinois since 2002
    • 2,371 confirmed human infections since 2002
    • Most people who become infected with West Nile virus never develop any symptoms
    • About 1 in 5 people who are infected will develop flu-like symptoms
    • Less than 1% of people who are infected will develop a serious neurologic illness
  28. PREVENTION

    The Chicago Department of Public Health (CDPH) uses a multi-pronged approach to fight the spread of WNV:
    • Larvicide in stormwater drains
    • DNA tests of mosquitoes (pictured)
    • Spraying when WNV is present
  29. DNA MONITORING

    • At any given time there are 60+ traps in Chicago collecting (mostly) Culex pipiens and Culex restuans mosquitoes
    • The traps are collected twice a week
    • Batches of up to 50 mosquitoes are DNA tested
    • The data is published on https://data.cityofchicago.org/
    • The results and model predictions are displayed in WindyGrid
  30. SEASONAL MODEL

    • We use a generalized linear mixed-effects model
    • Incorporates season and regional bias
    • Predicts likelihood of WNV one week in advance
  31. WE WERE ABLE TO IDENTIFY WNV ONE WEEK IN ADVANCE

    IN OUT-OF-SAMPLE DATA 78% OF THE TIME, AND OUR PREDICTION WAS CORRECT 65% OF THE TIME
  32. CLEAR WATER PROTECTING CHICAGO BEACHES

  33. None
  34. “PRIOR-DAY” BEACH MODELS

    [Diagram: yesterday’s results at Beach 1 and Beach 2, together with hydrometeorological predictors, feed today’s predictions for Beach 1 and Beach 2]
  35. PREDICTIONS IN 2016 Prior-day forecasting methods are very noisy. They

    are accurate overall but often fail to predict elevated E. coli levels. In Chicago, the true positive rate (sensitivity) is around 5 percent. [Chart: specificity, sensitivity, precision, and accuracy, 0% to 100%]
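A model can be accurate overall and still miss most elevated readings when elevated days are rare. The sketch below computes the four measures named on this slide from a confusion matrix. The counts are invented so that sensitivity comes out at 5 percent; they are not Chicago's actual 2016 results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the four measures from confusion-matrix counts.

    tp: elevated days predicted as elevated
    fp: normal days predicted as elevated
    fn: elevated days predicted as normal
    tn: normal days predicted as normal
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of elevated days that were caught
        "specificity": tn / (tn + fp),   # share of normal days correctly cleared
        "precision": tp / (tp + fp),     # share of advisories that were right
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Invented counts: 1,000 beach-days, 100 of them with elevated levels.
metrics = classification_metrics(tp=5, fp=20, fn=95, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts the model is right on most beach-days simply because most days are normal, which is why accuracy alone says little about how well elevated levels are predicted.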
  36. BEACH CORRELATIONS Our research, and a limited number of other studies, noted that Chicago

    beaches tend to be correlated with each other on a given day.
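  37. Prior-Day Model: $\hat{y}_t = f(y_{t-1}, x_1, x_2, \cdots, x_n)$

    where $\hat{y}_t$ is today’s prediction, $y_{t-1}$ is yesterday’s culture-based results, and $x_1, \cdots, x_n$ are hydrometeorological predictors. Hybrid Model: $\hat{y}_{i \in k} = f(y_{j \in k})$, where $\hat{y}_{i \in k}$ is today’s prediction for a beach $i$ in the same “cluster” $k$ of beaches and $y_{j \in k}$ is the “lead” qPCR result from beach $j$ on the same day.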
  38. Hybrid Model and Prior-Day Model

    [Diagram: today’s qPCR testing at beach 1 in group 1 drives same-day predictions at beaches 2, 3, and 4 in group 1]
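The data flow on this slide can be sketched as below. This is a deliberately simplified stand-in that copies the lead beach's same-day qPCR result to the other beaches in its group. The deck does not specify the fitted model, so the pass-through rule, the function name, the sample reading, and the advisory threshold are all assumptions made for illustration.

```python
# Hypothetical cluster layout mirroring the slide: beach 1 is the lead
# (qPCR-tested) beach for group 1; beaches 2, 3, and 4 receive predictions.
CLUSTERS = {
    "group 1": {"lead": "beach 1", "members": ["beach 2", "beach 3", "beach 4"]},
}


def hybrid_predictions(lead_results, clusters, threshold):
    """Predict same-day levels at member beaches from each cluster's lead beach.

    lead_results: dict mapping lead beach name to today's qPCR reading
    threshold: advisory cutoff; the value used below is illustrative only
    """
    predictions = {}
    for cluster in clusters.values():
        lead_value = lead_results.get(cluster["lead"])
        if lead_value is None:
            continue  # no lead result today, so no prediction for this cluster
        for beach in cluster["members"]:
            predictions[beach] = {
                "predicted_level": lead_value,
                "advisory": lead_value >= threshold,
            }
    return predictions


# Invented reading and threshold.
today = hybrid_predictions({"beach 1": 1250.0}, CLUSTERS, threshold=1000.0)
print(today)
```

A fitted model would replace the pass-through line with a function of the lead result, but the inputs and outputs would keep this shape.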
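  39. CLUSTERING BEACHES We used a simple k-means clustering algorithm to

    group beaches. Beaches were grouped into 5 clusters given the availability of qPCR equipment. One lead beach per cluster, 5 beaches in total, is used to predict results at the remaining beaches. k-means minimizes the within-cluster sum of squares: $J(S, \mu) = \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$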
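For readers unfamiliar with k-means, here is a minimal sketch of the standard algorithm (Lloyd's iteration) in plain Python. It assumes each beach is represented by a numeric feature vector; the slide does not say which features were used, and the points and starting centroids below are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(centroid(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Invented two-feature vectors for four hypothetical beaches, two clusters.
points = [(1.0, 1.1), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[2]])
print(labels)
```

The number of clusters is fixed in advance through the number of starting centroids, which matches the constraint on the slide that the cluster count follows from the available qPCR equipment.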
  40. The map shows that the five clusters were usually, but not

    strictly, geographically correlated. Some beaches were excluded because they have unique features, namely breakwaters.
  41. SUMMER 2017 PILOT Predictions for summer 2017 yielded similar

    accuracy but a 175% increase in sensitivity, from 4 percent to 11 percent, while precision grew from 17 percent to 27 percent. “False positives” remained consistent. [Chart: specificity, sensitivity, precision, and accuracy for Hybrid (2017) and Prior-Day (2016)]
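The 175% figure is a relative change, not a difference in percentage points: sensitivity rose by 7 points, and 7 is 175% of the starting value of 4. A short check, using only the numbers on this slide (the function name is hypothetical):

```python
def relative_change(before, after):
    """Relative change from before to after, as a fraction of before."""
    return (after - before) / before


sensitivity_gain = relative_change(4, 11)    # values in percent, from the slide
precision_gain = relative_change(17, 27)

print(f"sensitivity: {sensitivity_gain:.0%}")
print(f"precision: {precision_gain:.0%}")
```

Applied to precision, the same calculation gives a relative gain of roughly 59%.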
  42. We tested the impact of adding typical hydrometeorological variables

    in the hybrid format. This model had the same functional form but added predictors to better estimate Enterococci levels. Variables tested: precipitation, lake levels, sunlight, wind, tidal levels, and human density.
  43. MULTIVARIATE HYBRID MODEL Hydrometeorological variables did not add significant improvements.

    Overall AUC for the hybrid-only and multivariate hybrid models was 0.753 and 0.744, respectively. [Chart: specificity, sensitivity, precision, and accuracy for Hybrid Multivariate and Hybrid-only]
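AUC can be read as the probability that a randomly chosen elevated sample receives a higher score than a randomly chosen normal sample, with ties counted as half. The sketch below computes it directly from that definition. The scores and labels are invented and unrelated to the 0.753 and 0.744 values reported on this slide.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: model outputs, higher meaning more likely positive
    labels: 1 for positive (elevated), 0 for negative
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented scores: positives are 0.9 and 0.3, negatives are 0.8 and 0.1.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(example_auc)
```

Because AUC depends only on how samples are ranked, two models that order the samples almost identically will have nearly equal AUC even if their raw scores differ.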
  44. The project was released with an academic-quality technical paper

    instructing others on the variables and statistical methodology used in the project. In addition to source code, the paper will help researchers adopt this approach. Technical Documentation
  45. The technical paper was written as a highly reproducible “knitr”

    document, allowing other researchers to understand how summary numbers were calculated. Each statement in the project can be traced to an original source. Reproducible Research
  46. Open science is part of a workflow that extends from

    data collection to posting results and generating the final publications. Sources: Code: 10.5281/zenodo.1420460; Paper: 10.5281/zenodo.1434260; Pre-print: 10.1101/250480. [Workflow diagram: Test Collector, Online App, Lab Supervisors, Open Data: Test Results, Open Data: Hydrometeorological, GitHub: Source Code, Open Data: Predictions, GitHub: Reproducible Paper]
  47. CITIZEN SCIENCE PROJECT The project was primarily completed by citizen

    data scientists who volunteered their time at the weekly Chi Hack Night meetings.
  48. None
  49. Total hours dedicated to this project through volunteers, Chi Hack

    Night, and students. 1,000 HOURS
  50. © 2015 PBS Newshour

  51. [Insert GIF of someone vomiting]

  52. CROSS-REF FOOD POISONING

  53. CROSS-REF FOOD POISONING Work Gym

  54. CROSS-REF FOOD POISONING Work Gym Restaurant

  55. CROSS-REF FOOD POISONING Work Gym Restaurant Airport School

  56. CROSS-REF FOOD POISONING Work Gym Restaurant Airport School “food poisoning

    symptoms” “diarrhea”
  57. FINDER had higher precision for finding restaurants with critical violations.

    Over 52% of FINDER-recommended inspections resulted in violations, compared with almost 23% of inspections prompted by complaints reported to the City of Chicago. Source: 10.1038/s41746-018-0045-1 [Chart: FINDER inspections versus complaints, 0% to 60%]
  58. THANK YOU

    Contact Info: Tom Schenk Jr., Director of Analytics, KPMG, @tomschenkjr, tomschenkjr@gmail.com. Websites: data.cityofchicago.org, github.com/Chicago. Clear Water: Code: 10.5281/zenodo.1420460; Paper: 10.5281/zenodo.1434260; Pre-print: 10.1101/250480. Cross-referencing Food Poisoning: 10.1038/s41746-018-0045-1