
Food Inspections: Optimizing Inspections with Analytics

Tom Schenk Jr

May 19, 2015

Transcript

  1. FOOD INSPECTIONS: OPTIMIZING INSPECTIONS WITH ANALYTICS
     Image adapted from Edsel Little's The Hunt (CC BY-SA 2.0)
  2. THE DEPARTMENT OF HEALTH SHALL INSPECT ALL FOOD ESTABLISHMENTS AT LEAST ONCE EVERY SIX MONTHS AND AS OFTEN AS NECESSARY TO DETERMINE THAT THE REQUIREMENTS OF THIS MUNICIPAL CODE ARE BEING COMPLIED WITH. 7-42-010
  3. Optimizing Inspections
     The objective of this research project is to order inspections so that critical violations at retail food establishments are found faster.
  4. #ENGAGEMENT
     The City of Chicago teamed up with the Civic Consulting Alliance and Allstate Insurance Company's data science team to help develop the predictive model. Data from the open data portal was used to develop the model. While other data were considered, almost all of the useful data was publicly available.
  5. #OPENDATA
     data.cityofchicago.org/view/2bnm-jnvb
     Chicago leveraged the open data portal to share data with external researchers, using the city's premier method of sharing data and saving time on data-sharing agreements.
  6. Significant Predictors
     The model predicts the likelihood of a food establishment having a critical violation, the kind of violation most likely to lead to foodborne illness. Data on the portal was used to help define the model. Ultimately, eleven different variables proved to be useful predictors of critical violations:
     • Previous serious and critical violations at the restaurant
     • Three-day average high temperature
     • Location of the restaurant
     • Nearby garbage and sanitation complaints
     • Nearby burglaries
     • Whether the establishment has a tobacco license or an incidental alcohol consumption license
     • Length of time since the last inspection
     • Length of time the restaurant has been open
     • Inspectors
  7. [Bar chart: percent of critical violations found, data-driven vs. status quo]
     The research revealed an opportunity to deliver results faster. Within the first half of the work, 69% of critical violations would have been found by inspectors using a data-driven approach. During the same period, only 55% of violations were found using the status quo method.
  8. 7 DAYS IMPROVEMENT
     Comparing a data-driven approach against the current methods, violations were found an average of 7.4 days sooner during the 60-day pilot. That means more violations would be found sooner by CDPH's inspectors. The food inspection model is able to deliver results faster.
  9. #OPENSOURCE
     http://github.com/Chicago/food-inspections-evaluation
     The analytical model has been released as an open source project on GitHub, allowing other cities to study or even adopt the model. No other city had released its analytical models before this release.
  10. The project uses an evaluation period to benchmark the performance of the analytical model. The same evaluation period can be used to benchmark potential improvements to the model. The analytical code in the repo drives the city's model, allowing for collaborative improvements.
  11. MODEL OBJECTIVES
      • Can we use historical data to predict which inspections are most likely to have a critical violation?
      • Implications:
        – The model type is a binary response model
        – The observations are historical food inspections
        – A positive outcome is the presence of any violation numbered 1 to 14 (see the sketch below)
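To make the outcome definition concrete, here is a minimal sketch in R; the table and column names are hypothetical placeholders, not the repository's actual schema.

```r
# Sketch: deriving the binary response from a long table of
# violation records. Names and values here are hypothetical.
violations <- data.frame(
  inspection_id    = c(1, 1, 2, 3),
  violation_number = c(3, 18, 33, 7)
)

# Positive outcome: the inspection cited any violation numbered 1 to 14.
outcome <- aggregate(
  critical ~ inspection_id,
  data = transform(violations,
                   critical = violation_number >= 1 & violation_number <= 14),
  FUN  = any
)
outcome
#   inspection_id critical
# 1             1     TRUE
# 2             2    FALSE
# 3             3     TRUE
```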
  12. DATA SOURCES
      • Food Inspection History + Business Licenses, joined by License Number
      • Food Inspection History + Weather, joined by Date
      • Food Inspection History + Crime, Garbage Cart Requests, and Sanitation Complaints, joined by Date & Location
      A toy sketch of these joins follows below.
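A toy sketch of how these sources line up on their keys; all object and column names here are hypothetical stand-ins, not the repository's schema.

```r
# Hypothetical miniature versions of the data sources above.
inspections <- data.frame(license_number = c("L1", "L2"),
                          date = as.Date(c("2014-09-01", "2014-09-02")))
licenses    <- data.frame(license_number = c("L1", "L2"),
                          facility_type  = c("Restaurant", "Grocery"))
weather     <- data.frame(date = as.Date(c("2014-09-01", "2014-09-02")),
                          high_temp_3day = c(78.1, 80.4))

dat <- merge(inspections, licenses, by = "license_number")
dat <- merge(dat, weather, by = "date")
# Crime, garbage cart requests, and sanitation complaints join on
# Date & Location, typically by counting nearby events per establishment.
```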
  13. MODEL
      Inputs: Inspection Data (Food Inspection History) plus Historical Data (Business Licenses, Inspectors, Weather, Sanitation, Crime, Garbage Carts)
      Method: Lasso and Elastic-Net Regularized Generalized Linear Models (R's glmnet package)
      Output: a fitted Model Object and its Prediction
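A minimal sketch of this model family with glmnet, using synthetic data in place of the city's feature matrix.

```r
# Sketch: lasso/elastic-net regularized logistic regression with
# R's glmnet, as named on the slide. The data here are synthetic.
library(glmnet)

set.seed(1)
x <- matrix(rnorm(500 * 11), ncol = 11)            # eleven predictors
y <- rbinom(500, 1, plogis(x[, 1] - 0.5 * x[, 2]))

# alpha = 1 is the lasso; 0 < alpha < 1 gives the elastic net.
# cv.glmnet chooses the penalty strength lambda by cross-validation.
fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)
coef(fit, s = "lambda.min")                        # the fitted "model object"
```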
  14. PREDICTION AND APPLICATION
      Inputs: Current Business Licenses plus Current Data (Business Licenses, Inspectors, Weather, Sanitation, Crime, Garbage Carts) and the Historical Model (Model Object)
      Output: Current Businesses with a Current Prediction Score, served through a Shiny Application
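Scoring could then look like the following sketch, reusing the hypothetical `fit` object from the previous block; `x_current` stands in for a feature matrix built from current data.

```r
# Sketch: scoring current businesses with the saved model object.
x_current <- matrix(rnorm(20 * 11), ncol = 11)
score <- predict(fit, newx = x_current, s = "lambda.min",
                 type = "response")               # P(critical violation)
head(sort(score[, 1], decreasing = TRUE))         # rank-ordered worklist
```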
  16. MERGING INSPECTORS
      • We found that inspectors were part of the prediction, but inspector identities are not published publicly.
      • We used matching to group them by simple clustering of the model coefficients (see the sketch below)!
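One way such a grouping could look, as a hedged sketch with synthetic coefficient values; the repository's actual matching code may differ.

```r
# Sketch: grouping inspectors by clustering their estimated model
# coefficients. The coefficients below are synthetic stand-ins.
set.seed(2)
inspector_coef <- rnorm(40)                        # one coefficient per inspector
cluster <- kmeans(inspector_coef, centers = 3)$cluster
table(cluster)                                     # sizes of the inspector groups
```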
  17. The Application
      Our final outcome was a simple list that contained:
      • Business details
      • Zip codes
      • Predictions
      That's it, no fancy maps!
      Technical notes (a minimal sketch follows below):
      • Updates nightly
      • MVC framework
      • Uses RStudio's Shiny
      • Built on jQuery
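A minimal, illustrative Shiny sketch of such a ranked list; this is not the city's actual application code, and the data are invented.

```r
# Sketch: serving a rank-ordered prediction list with Shiny.
library(shiny)

predictions <- data.frame(business = c("Cafe A", "Diner B", "Deli C"),
                          zip      = c("60601", "60622", "60614"),
                          score    = c(0.81, 0.34, 0.57))

ui <- fluidPage(titlePanel("Inspection priority"), tableOutput("ranked"))
server <- function(input, output) {
  output$ranked <- renderTable(predictions[order(-predictions$score), ])
}
shinyApp(ui, server)
```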
  18. TEST / TRAIN FRAMEWORK
      • The first model was built using data prior to 2014, and tested in early 2014
      • The second model was built in response to the first
        – Completed in the summer of 2014
        – Tested in November based on actual inspection results from September and October
      Training period: Sept 2011 – Feb 2014 (a model is built on historically available data)
      Not used: Mar 2014 – Aug 2014
      Test period: Sept 2014 – Oct 2014 (the model is tested on future data)
      A sketch of the date-based split follows below.
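A sketch of this split in R; the `inspections` table and its `date` column are hypothetical stand-ins for the historical inspection records.

```r
# Sketch: a strictly chronological train/test split.
inspections <- data.frame(
  date = seq(as.Date("2011-09-01"), as.Date("2014-10-31"), by = "day")
)
train <- subset(inspections, date <= as.Date("2014-02-28"))  # Sept 2011 - Feb 2014
test  <- subset(inspections, date >= as.Date("2014-09-01"))  # Sept 2014 - Oct 2014
# The months in between are deliberately not used, so the model is
# always evaluated on data strictly in its future.
```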
  20. OPTIMIZING FOOD INSPECTIONS
      Impact: Discovering critical violations sooner rather than later reduces the risk of patrons becoming ill, which helps reduce medical expenses and lost time at work, and can even prevent a limited number of fatalities.
  21. REPOSITORY
      Code: Analytical code used to generate predictions and the code used to conduct the evaluation.
      Data: Copies of the data used to train and evaluate the model. Most of it is already on the portal; the repo also provides weather data that is not.
      Documentation: Reproducible reports used during diagnostics and evaluation. The gh-pages branch contains a public document describing the results.
  22. Code Structure
      • Simple, flexible naming convention to keep steps organized
      • Naming for data files matches script names

      Code Prefix | Purpose                                       | Example Script File Names        | Corresponding Example Data Files
      00          | Initialization                                | 00_Startup.R                     | (none)
      10+         | Data acquisition and basic manipulation       | 13_food_inspection_download.R    | 13_food_inspections.Rds
      20+         | Feature extraction                            | 21_calculate_violation_matrix.R  | 21_food_inspection_violation_matrix.Rds
      30+         | Models                                        | 30_glmnet_model.R                | 30_model.Rds, 30_model_coef.Rds
      40+         | Prediction (20 and 30 steps, for chosen model)| 40_initialize_app_data.R, 41_calculate_violation_matrix.R | 40_bus_license_CURRENT.Rds, 40_food_inspections_CURRENT.Rds, 41_food_inspection_violation_matrix.Rds
  23. IMPLICIT ASSUMPTIONS
      • The past is an indicator of the future
        – The underlying fundamentals of the problem don't change
      • Heat maps are generated using a Kernel Density Estimate with the following settings (sketched below):
        – Normality assumption
        – 90-day window
        – h = 0.2
      • Many values are capped to exclude outliers / missing data
        – The KDE function has various caps for each data type
        – Age At Inspection is capped at 4 years
        – The number of violations doesn't matter, only the presence of a violation
        – 3-day rolling average for weather
      Open science allows the reader to uncover any assumption!
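A minimal sketch of such a heat-map layer using MASS::kde2d, which uses a bivariate normal kernel, matching the slide's normality assumption. The event points are synthetic; in the model, only events from the trailing 90-day window would enter the estimate.

```r
# Sketch: 2-D kernel density estimate with normal kernel, h = 0.2.
library(MASS)

set.seed(3)
events <- data.frame(x = rnorm(200, -87.63, 0.05),   # longitude-like
                     y = rnorm(200,  41.88, 0.05))   # latitude-like
dens <- kde2d(events$x, events$y, h = 0.2, n = 50)
image(dens)                                          # quick look at the surface
```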
  24. Technical Documentation
      The project was released with an academic-quality technical paper instructing others on the variables and statistical methodology used in the project. In addition to the source code, the paper will help researchers adopt this approach.
  25. Reproducible Research
      The technical paper was written as a highly reproducible "knitr" document, allowing other researchers to understand how summary numbers were calculated. Each statement in the project can be traced to an original source.
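For readers unfamiliar with the pattern, here is a hypothetical fragment, shown as R comments so the block stays valid R; the chunk name, variables, and values are invented, not taken from the paper.

```r
# Sketch of the knitr pattern: a chunk computes a statistic, and the
# prose references it inline, so every number traces back to code.
#
# In the .Rnw source:
# <<days-gained>>=
# days_gained <- mean(status_quo_days - data_driven_days)
# @
#
# Inline in the prose:
# "Violations were found \Sexpr{round(days_gained, 1)} days sooner."
# (In an .Rmd source, the inline form would be `r round(days_gained, 1)`.)
```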
  26. #OPENDATA #OPENSCIENCE #OPENSOURCE #ENGAGEMENT
      What might research look like for a municipal government when combined with #opendata and #engagement with researchers? It would resemble the ongoing #openscience movement.