Predictive Analytics, Cities, and Public Health

Tom Schenk Jr
September 24, 2018

Presented at ASM Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines on September 24, 2018

Transcript

  1. IN CHICAGO, WE BELIEVE THAT THE POWER OF TECHNOLOGY IS DRIVEN BY THE PEOPLE WHO USE AND BENEFIT FROM IT. Source: techplan.cityofchicago.org
  2. Data on potholes are reported by residents and city staff through the 311 system and then published on the City’s #opendata portal, which is updated daily: data.cityofchicago.org
  3. Chicago has released more #opendata, including important items such as red-light and speed camera violations, problem landlords, and public chauffeurs: data.cityofchicago.org/view/caas-knxs
  4. #OPENDATA PROVIDES A MEANS TO CREATE AN ECOSYSTEM AROUND DATA, WHICH INCLUDES MULTIPLE STAKEHOLDERS AND INITIATIVES THAT EXTEND BEYOND TRANSPARENCY.
  5. Civic Tech Community: Chicago has a large, vibrant, and productive civic community, led by Chicago residents interested in technology and society who, along with nonprofits, help Chicagoans. © Tom Schenk Jr, 2016. CC-BY
  6. Built by the civic community using #opendata, this service alerts individuals to street sweeping activity by email, text, or calendar notification: sweeparound.us
  7. Chicago Flu Shots was developed to make it easy to find flu-shot locations across Chicago. The code, written by a volunteer, is #opensource, so the site was adopted by Boston, Philadelphia, and San Francisco: chicagoflushots.org
  8. Academia: The City of Chicago has a number of high-quality research universities and groups willing to engage in projects with the City. We can leverage the #opendata portal, and the data itself, to build cooperative relationships.
  9. #Predictions: using advanced research techniques to forecast events in the city. #Optimization: optimizing the allocation of resources across the city for a more efficient city. #Evaluation: evaluating programs, including the effectiveness of advanced analytics.
  10. Using #opendata: Chicago used the #opendata portal, the City’s premier method of sharing data, to share data with external researchers and create #predictions, saving the time otherwise spent on data-sharing agreements.
  11. Significant Predictors: The model predicts the likelihood that a food establishment has a critical violation, the type of violation most likely to lead to foodborne illness. Over a dozen #opendata sources were used to help define the model. Ultimately, ten variables proved to be significant predictors of critical violations:
    • Establishments with previous critical or serious violations
    • Three-day average high temperature
    • Nearby garbage and sanitation complaints
    • Nearby burglaries
    • Whether the establishment has a tobacco or alcohol license
    • Length of time since the last inspection
    • Length of time the establishment has been operating
    • Inspector assigned
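
The slide describes a model that turns predictors like these into a probability of a critical violation. As an illustration only, the sketch below assumes a logistic-regression form and ranks establishments by that probability. The feature names, the `INTERCEPT`, and every value in `WEIGHTS` are hypothetical placeholders, not the City's fitted model, and only some of the slide's predictors are included.

```python
import math
from dataclasses import dataclass, field

# Hypothetical coefficients, for illustration only. They are NOT the City of
# Chicago's fitted values, and only some of the slide's predictors appear.
INTERCEPT = -2.0
WEIGHTS = {
    "prior_critical_or_serious": 1.2,       # 1 if prior critical/serious violations, else 0
    "three_day_avg_high_temp_f": 0.02,      # three-day average high temperature (deg F)
    "nearby_sanitation_complaints": 0.05,   # count of nearby garbage/sanitation complaints
    "nearby_burglaries": 0.03,              # count of nearby burglaries
    "has_tobacco_or_alcohol_license": 0.4,  # 1 if licensed, else 0
    "days_since_last_inspection": 0.002,
}


@dataclass
class Establishment:
    name: str
    features: dict[str, float] = field(default_factory=dict)


def risk_probability(features: dict[str, float]) -> float:
    """Logistic model: map a weighted sum of features to a probability in (0, 1)."""
    z = INTERCEPT + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def prioritize(establishments: list[Establishment]) -> list[tuple[str, float]]:
    """Rank establishments from highest to lowest predicted risk."""
    scored = [(e.name, risk_probability(e.features)) for e in establishments]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    queue = prioritize([
        Establishment("Diner B", {"three_day_avg_high_temp_f": 60, "days_since_last_inspection": 30}),
        Establishment("Cafe A", {"prior_critical_or_serious": 1, "three_day_avg_high_temp_f": 85,
                                 "nearby_sanitation_complaints": 10, "days_since_last_inspection": 300}),
    ])
    for name, p in queue:
        print(f"{name}: {p:.2f}")  # Cafe A: 0.88, then Diner B: 0.32
```

A real model would estimate the weights from historical inspection outcomes and would also need an encoding for categorical predictors such as the inspector assigned. Sorting by predicted probability is what lets an inspection queue put the riskiest establishments first.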
  12. The #predictions revealed an opportunity to deliver results faster. Within the first half of the work, 69% of critical violations would have been found by inspectors using a data-driven approach. During the same period, only 55% of violations were found using the status quo method. [Chart: share of critical violations found, data-driven vs. status quo]
  13. The food inspection model is able to deliver results faster. Comparing a data-driven approach with the status quo, violations were found an average of 7.4 days sooner during the 60-day pilot. That means the #predictions would have led inspectors to find more violations sooner. IMPROVEMENT: 7 days
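
Slides 12 and 13 compare two orderings of the same inspections. Under simplifying assumptions, the two summary numbers could be computed as below: the share of critical violations found by a cutoff day (such as the midpoint of the pilot) and the mean number of days sooner each violation is found. The function names and the toy day values are hypothetical; the pilot's actual evaluation code is not shown in the deck.

```python
from statistics import mean


def share_found_by(days_found: list[int], cutoff_day: int) -> float:
    """Fraction of violations discovered on or before `cutoff_day`."""
    return sum(day <= cutoff_day for day in days_found) / len(days_found)


def mean_days_sooner(status_quo_days: list[int], data_driven_days: list[int]) -> float:
    """Average of (status-quo day - data-driven day) over the same violations.

    The lists must be aligned: position i refers to the same critical violation.
    """
    if len(status_quo_days) != len(data_driven_days):
        raise ValueError("both lists must describe the same violations")
    return mean(sq - dd for sq, dd in zip(status_quo_days, data_driven_days))


if __name__ == "__main__":
    # Hypothetical day (1-60) on which each of five critical violations was found.
    status_quo = [5, 20, 35, 50, 58]
    data_driven = [2, 10, 25, 48, 55]
    print(share_found_by(status_quo, cutoff_day=30))   # 0.4
    print(share_found_by(data_driven, cutoff_day=30))  # 0.6
    print(mean_days_sooner(status_quo, data_driven))   # 5.6
```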
  14. OPTIMIZING FOOD INSPECTIONS: Discovering critical violations sooner rather than later reduces the risk of patrons becoming ill, which helps reduce medical expenses, lost time at work, and even a limited number of fatalities.
  15. The data science team has built a website that lets CDPH prioritize inspections based on projected risk.
  16. #OPENSOURCE: The analytical model will be released as an open-source project on GitHub, allowing other cities to study or even adopt the model. No other city had released its analytic models before this release.
  17. WEST NILE VIRUS
    • Between 5 and 884 human cases reported annually in Illinois since 2002
    • 2,371 confirmed human infections since 2002
    • Most people who become infected with West Nile virus never develop any symptoms
    • About 1 in 5 people who are infected will develop flu-like symptoms
    • Fewer than 1% of people who are infected will develop a serious neurologic illness
  18. PREVENTION: The Chicago Department of Public Health (CDPH) uses a multipronged approach to fight the spread of West Nile virus (WNV):
    • Larvicide in stormwater drains
    • DNA tests of mosquitoes (pictured)
    • Spraying when WNV is present
  19. DNA MONITORING
    • At any given time, there are 60+ traps in Chicago collecting (mostly) Culex pipiens and Culex restuans mosquitoes
    • The traps are collected twice a week
    • Batches of up to 50 mosquitoes are DNA tested
    • The data are published on https://data.cityofchicago.org/
    • The results and model predictions are displayed in WindyGrid
  20. SEASONAL MODEL
    • We use a generalized linear mixed-effects model
    • It incorporates season and regional bias
    • It predicts the likelihood of WNV one week in advance
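
The slide does not give the model's exact specification. One common way to write a logistic generalized linear mixed-effects model with a seasonal term and a region-level random intercept is sketched below; the logit link, the sine/cosine encoding of season, and all symbols are assumptions for illustration, not the CDPH specification. Here $Y_{i,t+1} = 1$ if trap $i$ tests positive for WNV in the following week, $\mathbf{x}_{i,t}$ are covariates observed this week, $w_t$ is the week of the year, and $u_{r(i)}$ is a random intercept for the region containing trap $i$ (the "regional bias").

```latex
\operatorname{logit} \Pr\left(Y_{i,t+1} = 1\right)
  = \beta_0
  + \mathbf{x}_{i,t}^{\top}\boldsymbol{\beta}
  + \gamma_1 \sin\!\left(\frac{2\pi w_t}{52}\right)
  + \gamma_2 \cos\!\left(\frac{2\pi w_t}{52}\right)
  + u_{r(i)},
\qquad u_r \sim \mathcal{N}\!\left(0, \sigma_u^2\right)
```

Because WNV activity is concentrated in summer, week-of-year indicators or a smooth spline could replace the harmonic terms; the random intercept lets each region keep its own baseline risk while sharing the fixed effects.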
  21. WE WERE ABLE TO IDENTIFY WNV ONE WEEK IN ADVANCE IN OUT-OF-SAMPLE DATA 78% OF THE TIME, AND OUR PREDICTION WAS CORRECT 65% OF THE TIME
  22. “PRIOR-DAY” BEACH MODELS [Diagram: yesterday's Beach 1 and Beach 2 results, plus hydrometeorological predictors, produce today's Beach 1 and Beach 2 predictions.]
  23. PREDICTIONS IN 2016: Prior-day forecasting methods are very noisy. They are accurate overall, since most beach days do not have elevated levels, but they often fail to predict elevated E. coli levels. In Chicago, the true positive rate (sensitivity) is around 5 percent. [Chart: specificity, sensitivity, precision, and accuracy of prior-day predictions, 0% to 100%]
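
The slide reports sensitivity alongside specificity, precision, and accuracy. A minimal sketch of how those four measures follow from a confusion matrix, using made-up counts (10 elevated days out of 100, only one of them caught) to show how a model can be accurate overall yet have low sensitivity. The `ConfusionMatrix` class and `confusion_matrix` function are illustrative names, not from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # elevated day, predicted elevated
    fn: int  # elevated day, predicted normal (a missed exceedance)
    fp: int  # normal day, predicted elevated (a "false positive")
    tn: int  # normal day, predicted normal

    @property
    def sensitivity(self) -> float:  # true positive rate (recall)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:  # true negative rate
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:  # share of "elevated" predictions that were right
        return self.tp / (self.tp + self.fp)

    @property
    def accuracy(self) -> float:  # share of all predictions that were right
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


def confusion_matrix(actual: list[bool], predicted: list[bool]) -> ConfusionMatrix:
    """Tally day-level predictions against observed outcomes (True = elevated)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    return ConfusionMatrix(
        tp=sum(a and p for a, p in pairs),
        fn=sum(a and not p for a, p in pairs),
        fp=sum(not a and p for a, p in pairs),
        tn=sum(not a and not p for a, p in pairs),
    )


if __name__ == "__main__":
    # Hypothetical season: 10 elevated days out of 100; one is caught, two false alarms.
    actual = [True] * 10 + [False] * 90
    predicted = [True] + [False] * 9 + [True] * 2 + [False] * 88
    cm = confusion_matrix(actual, predicted)
    print(f"accuracy={cm.accuracy:.2f} sensitivity={cm.sensitivity:.2f} "
          f"specificity={cm.specificity:.2f} precision={cm.precision:.2f}")
    # accuracy=0.89 sensitivity=0.10 specificity=0.98 precision=0.33
```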
  24. BEACH CORRELATIONS: Our research, and a few other studies, noted that E. coli levels at Chicago beaches tend to be correlated with each other on a given day.
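  25. Hybrid Model vs. Prior-Day Model
    Prior-day model: $\hat{y}_{b,t} = f(y_{b,t-1}, x_{1}, x_{2}, \dots, x_{n})$, where $\hat{y}_{b,t}$ is today's prediction at beach $b$, $y_{b,t-1}$ is yesterday's culture-based result, and $x_{1}, \dots, x_{n}$ are hydrometeorological predictors.
    Hybrid model: $\hat{y}_{b,t} = f(q_{\ell,t})$ for $b, \ell \in C_k$, where today's prediction at beach $b$ is based on the same-day “lead” qPCR result $q_{\ell,t}$ from beach $\ell$ in the same “cluster” of beaches $C_k$.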
  26. Hybrid Model vs. Prior-Day Model [Diagram: the prior-day model relies on yesterday's results, while the hybrid model uses today's qPCR testing at beach 1 in group 1 to make predictions at beaches 2, 3, and 4 in group 1.]
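
One reading of the hybrid design is that each cluster's lead beach is tested by same-day qPCR and its result stands in for the other beaches in that cluster. The sketch below assumes exactly that simplest form: copy the lead result to each member beach and flag it against a threshold. The `Cluster` type, the beach names, and the `threshold` value are hypothetical, and the project's model may transform the lead result rather than copy it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    lead: str                 # beach tested by same-day qPCR (hypothetical)
    members: tuple[str, ...]  # other beaches in the cluster, predicted from the lead


def hybrid_predictions(
    clusters: list[Cluster], lead_qpcr: dict[str, float], threshold: float
) -> dict[str, tuple[float, bool]]:
    """For each non-lead beach, reuse its cluster lead's same-day qPCR value
    and flag whether that value exceeds `threshold`."""
    predictions = {}
    for cluster in clusters:
        value = lead_qpcr[cluster.lead]
        for beach in cluster.members:
            predictions[beach] = (value, value > threshold)
    return predictions


if __name__ == "__main__":
    clusters = [
        Cluster("Beach 1", ("Beach 2", "Beach 3", "Beach 4")),
        Cluster("Beach 5", ("Beach 6",)),
    ]
    # Hypothetical same-day qPCR results and advisory threshold (illustrative units).
    print(hybrid_predictions(clusters, {"Beach 1": 1500.0, "Beach 5": 200.0}, threshold=1000.0))
```

The lead beaches themselves are left out of the output because they have a same-day measurement rather than a prediction.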
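  27. CLUSTERING BEACHES: We used a simple k-means clustering algorithm to group beaches. Beaches were grouped into 5 clusters given the availability of qPCR equipment, and the 5 beaches tested with qPCR are used to predict results at the remaining beaches. k-means chooses the clusters $S = \{S_1, \dots, S_k\}$ that minimize the within-cluster sum of squares:
    $\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$, where $\mu_i$ is the mean of the points in $S_i$.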
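
k-means is commonly fit with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points; neither step increases the within-cluster sum of squared distances. A minimal pure-Python sketch follows. What each beach's feature vector contains (for example, standardized summaries of its historical E. coli readings) is an assumption, since the slide does not say, and the demo points are made up.

```python
import random

Point = tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm; returns (labels, centroids). Finds a local optimum only."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # leave an empty cluster's centroid where it was
                centroids[i] = centroid(members)
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors, one per beach.
    beaches = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    labels, centers = kmeans(beaches, k=2)
    print(labels, centers)
```

Because the result depends on the random starting centroids, a common practice is to run several seeds and keep the solution with the lowest within-cluster sum of squares.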
  28. The map shows that the five clusters were usually, but not strictly, grouped geographically. Some beaches were excluded because they have unique features, namely breakwaters.
  29. SUMMER 2017 PILOT: Predictions for summer 2017 yielded similar accuracy but a 175% increase in sensitivity, from 4% to 11%, while precision grew from 17% to 27%. “False positives” remained consistent. [Chart: specificity, sensitivity, precision, and accuracy for the hybrid model (2017) vs. the prior-day model (2016)]
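
The 175% figure is a relative increase, not a change in percentage points (sensitivity rose by 7 percentage points, precision by 10). Worked out from the slide's own numbers:

```latex
\text{relative change in sensitivity} = \frac{11\% - 4\%}{4\%} = \frac{7}{4} = 1.75 = 175\%
\qquad
\text{relative change in precision} = \frac{27\% - 17\%}{17\%} = \frac{10}{17} \approx 59\%
```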
  30. We tested the impact of adding typical hydrometeorological variables to the hybrid model. This model had the same functional form but added predictors to better estimate Enterococci levels: precipitation, lake levels, sunlight, wind, tidal levels, and human density.
  31. MULTIVARIATE HYBRID MODEL: Hydrometeorological variables did not add significant improvements. The overall AUC of the hybrid-only and multivariate hybrid models was 0.753 and 0.744, respectively. [Chart: specificity, sensitivity, precision, and accuracy for the hybrid-only vs. multivariate hybrid models]
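
AUC (area under the ROC curve) summarizes how well a model's scores rank elevated days above non-elevated days across all thresholds: 0.5 is chance and 1.0 is a perfect ranking. It equals the probability that a randomly chosen positive outscores a randomly chosen negative (counting ties as one half), which gives the simple pairwise (Mann-Whitney) computation sketched below. The slide does not say how its AUC values were computed, so treat this as one standard method rather than the project's code; the toy scores are made up.

```python
def auc(scores: list[float], labels: list[bool]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy scores: 3 of the 4 positive/negative pairs are ranked correctly.
    print(auc([0.9, 0.8, 0.3, 0.2], [True, False, True, False]))  # 0.75
```

The double loop costs O(n*m) comparisons; for large datasets, a rank-based version computed after sorting the scores gives the same value in O(n log n).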
  32. Technical Documentation: The project was released with an academic-quality technical paper describing the variables and statistical methodology used in the project. Along with the source code, the paper will help researchers adopt this approach.
  33. Reproducible Research: The technical paper was written as a highly reproducible “knitr” document, allowing other researchers to understand how summary numbers were calculated. Each statement in the project can be traced to an original source.
  34. Open science is part of a workflow that extends from data collection to posting results and generating the final publications. [Diagram of workflow components: lab supervisors; test collector online app; Open Data: test results, predictions, hydrometeorological data; GitHub: source code, reproducible paper.] Sources: Code: 10.5281/zenodo.1420460; Paper: 10.5281/zenodo.1434260; Pre-print: 10.1101/250480
  35. CITIZEN SCIENCE PROJECT: The project was primarily completed by citizen data scientists who volunteered their time at the weekly Chi Hack Night meetings.
  36. FINDER had higher precision for finding restaurants with critical violations. Over 52% of FINDER-recommended inspections resulted in violations, compared with almost 23% of inspections based on complaints reported to the City of Chicago. [Chart: share of inspections with violations, FINDER inspections vs. complaints] Source: 10.1038/s41746-018-0045-1
  37. THANK YOU
    Contact info: Tom Schenk Jr., Director of Analytics, KPMG, @tomschenkjr
    Websites: data.cityofchicago.org, github.com/Chicago
    Clear Water: Code: 10.5281/zenodo.1420460; Paper: 10.5281/zenodo.1434260; Pre-print: 10.1101/250480
    Cross-referencing Food Poisoning: 10.1038/s41746-018-0045-1