Slide 1

Slide 1 text

FOOD INSPECTIONS
OPTIMIZING INSPECTIONS WITH ANALYTICS
Image adapted from Edsel Little’s The Hunt (CC BY-SA 2.0)

Slide 2

Slide 2 text

THE DEPARTMENT OF HEALTH SHALL INSPECT ALL FOOD ESTABLISHMENTS AT LEAST ONCE EVERY SIX MONTHS AND AS OFTEN AS NECESSARY TO DETERMINE THAT THE REQUIREMENTS OF THIS MUNICIPAL CODE ARE BEING COMPLIED WITH. 7-42-010

Slide 3

Slide 3 text

Optimizing Inspections
The objective of this research project is to order inspections so that critical violations at retail food establishments are found sooner.

Slide 4

Slide 4 text

23% 15% 11% 7% 14%
Image adapted from Michael Mooney’s Little Chicago (CC-BY 2.0).

Slide 5

Slide 5 text

#ENGAGEMENT
The City of Chicago teamed up with the Civic Consulting Alliance and Allstate Insurance Company’s data science team to develop the predictive model. The model was developed with data from the open data portal. While other data were considered, almost all of the useful data were publicly available.

Slide 6

Slide 6 text

#OPENDATA
data.cityofchicago.org/view/2bnm-jnvb
Chicago used the open data portal, the city’s premier method of sharing data, to share data with external researchers, which saved time on data-sharing agreements.

Slide 7

Slide 7 text

Significant Predictors:
•  Restaurants with previous serious and critical violations
•  Three-day average high temperature
•  Location of restaurant
•  Nearby garbage and sanitation complaints
•  Nearby burglaries
•  Whether the establishment has a tobacco license or an incidental alcohol consumption license
•  Length of time since last inspection
•  Length of time the restaurant has been open
•  Inspectors
The model predicts the likelihood of a food establishment having a critical violation, the type of violation most likely to lead to foodborne illness. Data on the portal was used to help define the model. Ultimately, eleven different variables proved to be useful predictors of critical violations.

Slide 8

Slide 8 text

Critical violations
[Chart: share of critical violations found, data-driven vs. status quo; axis from 0% to 70%]
The research revealed an opportunity to deliver results faster. Within the first half of the work, 69% of critical violations would have been found by inspectors using a data-driven approach. During the same period, only 55% of violations were found using the status quo method.
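The comparison above can be sketched as a small calculation: take the same set of inspections, order them two ways, and ask what share of all critical violations falls in the first half of each ordering. The data, scores, and function name below are hypothetical, and Python is used purely for illustration (the project itself is written in R).

```python
def share_found_in_first_half(ordered_outcomes):
    """Share of all critical violations that fall in the first half of an ordering.

    ordered_outcomes: list of 0/1 flags, in the order inspections are performed.
    """
    total = sum(ordered_outcomes)
    if total == 0:
        return 0.0
    half = len(ordered_outcomes) // 2
    return sum(ordered_outcomes[:half]) / total


# Hypothetical inspections: (predicted score, critical violation found),
# listed in the order they were actually performed (the status quo order).
inspections = [
    (0.10, 0), (0.80, 1), (0.30, 0), (0.60, 1),
    (0.20, 0), (0.90, 1), (0.40, 0), (0.70, 1),
]

status_quo = [found for _, found in inspections]
# Data-driven order: highest predicted score first.
data_driven = [
    found for _, found in sorted(inspections, key=lambda row: row[0], reverse=True)
]

status_quo_share = share_found_in_first_half(status_quo)
data_driven_share = share_found_in_first_half(data_driven)
```

With real data the gap would be far smaller than in this toy example; the slide reports 69% versus 55%.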

Slide 9

Slide 9 text

7 days IMPROVEMENT
The food inspection model is able to deliver results faster.
Compared with current methods, the data-driven approach found violations an average of 7.4 days earlier in the 60-day pilot. That means more violations would be found sooner by CDPH’s inspectors.
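The slide does not say exactly how the 7.4-day figure was computed. One plausible reading, sketched here under that assumption, is the difference between the average day on which a critical violation is found under each ordering. The schedules and the function name are hypothetical.

```python
from statistics import mean


def mean_day_of_discovery(schedule):
    """Average inspection day over the inspections that found a critical violation.

    schedule: list of (day_inspected, critical_found) pairs.
    """
    return mean(day for day, found in schedule if found)


# Hypothetical schedules for the same establishments under two orderings.
status_quo = [(5, False), (10, True), (15, False), (30, True), (40, False), (50, True)]
data_driven = [(5, True), (10, True), (15, True), (30, False), (40, False), (50, False)]

# Positive value: the data-driven order finds violations earlier on average.
days_gained = mean_day_of_discovery(status_quo) - mean_day_of_discovery(data_driven)
```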

Slide 10

Slide 10 text

#OPENSOURCE
http://github.com/Chicago/food-inspections-evaluation
The analytical model has been released as an open source project on GitHub, allowing other cities to study or even adopt the model. No other city had released its analytic models before this release.

Slide 11

Slide 11 text

The project uses an evaluation period to benchmark the performance of the analytical model. This evaluation period can also be used to benchmark potential improvements of the model. The analytical code in the repo drives the city model, allowing for collaborative improvements.

Slide 12

Slide 12 text

MODEL OBJECTIVES
•  Can we use historical data to predict which inspections are most likely to have a critical violation?
•  Implications:
–  The model type is a binary response model
–  The observations are historical food inspections
–  A positive outcome is the presence of any violation numbered 1 to 14
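The binary response described above can be sketched in a few lines: an inspection is a positive outcome if any of its violation numbers falls between 1 and 14. The input layout (a list of violation numbers per inspection) and the names are assumptions for illustration, written in Python rather than the project's R.

```python
# Violations numbered 1 to 14 count as critical.
CRITICAL_VIOLATION_NUMBERS = range(1, 15)


def has_critical_violation(violation_numbers):
    """Return 1 if any violation number is critical, otherwise 0."""
    return int(any(n in CRITICAL_VIOLATION_NUMBERS for n in violation_numbers))


# Hypothetical inspections keyed by an inspection ID.
inspections = {"A": [3, 32, 41], "B": [18, 33], "C": []}
labels = {key: has_critical_violation(nums) for key, nums in inspections.items()}
```

Note that only the presence of a critical violation matters here, not how many there are.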

Slide 13

Slide 13 text

DATA SOURCES
Food Inspection History, linked to:
•  Business Licenses (by License Number)
•  Weather (by Date)
•  Crime, Garbage Cart Requests, and Sanitation Complaints (by Date & Location)
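As a rough illustration of how these sources connect, the sketch below attaches license attributes to each inspection by license number and weather by date. All field names and values are hypothetical, and the date-and-location links (crime, garbage carts, sanitation) are left out because they need a spatial step.

```python
# Hypothetical records; field names are illustrative only.
inspections = [
    {"inspection_id": 1, "license_number": 100, "date": "2014-09-02"},
    {"inspection_id": 2, "license_number": 200, "date": "2014-09-03"},
]
licenses = {100: {"business_name": "Cafe A"}, 200: {"business_name": "Diner B"}}
weather = {"2014-09-02": {"high_temp_f": 88}, "2014-09-03": {"high_temp_f": 75}}


def join_sources(inspections, licenses, weather):
    """Attach license fields (by license number) and weather fields (by date)."""
    rows = []
    for inspection in inspections:
        row = dict(inspection)
        row.update(licenses.get(inspection["license_number"], {}))
        row.update(weather.get(inspection["date"], {}))
        rows.append(row)
    return rows


merged = join_sources(inspections, licenses, weather)
```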

Slide 14

Slide 14 text

MODEL
Inspection Data: Food Inspection History
Historical Data:
•  Business Licenses
•  Inspectors
•  Weather
•  Sanitation
•  Crime
•  Garbage Carts
Model Prediction: Lasso and Elastic-Net Regularized Generalized Linear Models (R’s glmnet package)
Model Object
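For reference, the objective that glmnet minimizes for a binary response is the penalized negative binomial log-likelihood below, as given in the glmnet documentation. Here y_i is the 0/1 outcome for inspection i, x_i its predictor vector, lambda the overall penalty strength, and alpha the mix between lasso (alpha = 1) and ridge (alpha = 0). The slides do not state which alpha or lambda the project used.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}
\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
+ \lambda\Big[\, \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
      + \alpha\,\lVert\beta\rVert_1 \Big]
```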

Slide 15

Slide 15 text

PREDICTION AND APPLICATION
Current Licenses: Current Business Licenses
Current Data:
•  Business Licenses
•  Inspectors
•  Weather
•  Sanitation
•  Crime
•  Garbage Carts
Historical Model: Model Object
Current Businesses + Current Prediction Score
Shiny Application
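The scoring step can be sketched as applying stored coefficients to each current business and sorting by the resulting score. The coefficient values, feature names, and businesses below are made up for illustration; the real model object comes from glmnet in R.

```python
import math


def predict_score(coefs, intercept, features):
    """Logistic score: probability-like value between 0 and 1."""
    z = intercept + sum(coefs[name] * features.get(name, 0.0) for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model object: an intercept plus one coefficient per feature.
intercept = -2.0
coefs = {"past_critical": 1.0, "years_since_last_inspection": 0.5}

# Hypothetical current businesses with their current feature values.
businesses = [
    {"name": "Cafe A", "features": {"past_critical": 1, "years_since_last_inspection": 1.0}},
    {"name": "Bakery B", "features": {"past_critical": 0, "years_since_last_inspection": 0.5}},
    {"name": "Diner C", "features": {"past_critical": 1, "years_since_last_inspection": 2.0}},
]

for business in businesses:
    business["score"] = predict_score(coefs, intercept, business["features"])

# Highest predicted risk first: this is the order the list would be shown in.
ranked = sorted(businesses, key=lambda b: b["score"], reverse=True)
ranked_names = [b["name"] for b in ranked]
```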

Slide 17

Slide 17 text

MERGING INSPECTORS
•  We found that inspectors were part of the prediction, but inspector data are not published publicly.
•  We used matching to group them by simple clustering of the model coefficients!
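The slide does not say which clustering method was used. As one simple possibility, assumed here only for illustration, inspectors can be ranked by their model coefficient and split into equally sized groups, so that a group label can stand in for an individual inspector. The inspector names and coefficient values are hypothetical.

```python
def group_by_coefficient(coef_by_inspector, n_groups):
    """Rank inspectors by coefficient and split the ranking into n_groups groups.

    Returns a dict mapping inspector -> group index (0 = lowest coefficients).
    """
    ranked = sorted(coef_by_inspector, key=coef_by_inspector.get)
    return {
        inspector: rank * n_groups // len(ranked)
        for rank, inspector in enumerate(ranked)
    }


# Hypothetical per-inspector coefficients from a fitted model.
coefs = {
    "insp_1": -0.9, "insp_2": 0.1, "insp_3": 0.8,
    "insp_4": -0.7, "insp_5": 0.0, "insp_6": 1.1,
}
groups = group_by_coefficient(coefs, n_groups=3)
```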

Slide 18

Slide 18 text

The Application
Our final outcome was a simple list that contained
•  Business details
•  Zip codes
•  Predictions
That’s it, no fancy maps!
Technical notes:
•  Updates nightly
•  MVC framework
•  Uses RStudio’s Shiny
•  Built on jQuery

Slide 19

Slide 19 text

YOU CAN HELP!

Slide 20

Slide 20 text

TEST / TRAIN FRAMEWORK
•  The first model was built using data prior to 2014, and tested in early 2014
•  The second model was built in response to the first
–  Completed in the summer of 2014
–  Tested in November based on actual inspection results from September and October
•  Training Period (Sept 2011 – Feb 2014): a model is built on historically available data
•  Not Used
•  Test Period (Sept 2014 – Oct 2014): the model is tested on future data
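A time-based split like this one can be sketched by assigning each inspection to a period by its date. The slide gives only months, so the exact first and last days below are assumptions, as are the names.

```python
from datetime import date

# Month boundaries from the slide; the exact days are assumed.
TRAIN_START, TRAIN_END = date(2011, 9, 1), date(2014, 2, 28)
TEST_START, TEST_END = date(2014, 9, 1), date(2014, 10, 31)


def assign_period(inspection_date):
    """Return 'train', 'test', or 'not used' for a given inspection date."""
    if TRAIN_START <= inspection_date <= TRAIN_END:
        return "train"
    if TEST_START <= inspection_date <= TEST_END:
        return "test"
    return "not used"


periods = [
    assign_period(date(2013, 1, 15)),
    assign_period(date(2014, 6, 1)),
    assign_period(date(2014, 9, 15)),
]
```

Splitting by date rather than at random keeps the test honest: the model never sees inspections from the period it is later evaluated on.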

Slide 22

Slide 22 text

OPTIMIZING FOOD INSPECTIONS
Impact
Discovering critical violations sooner rather than later reduces the risk of patrons becoming ill, which helps reduce medical expenses, lost time at work, and even a limited number of fatalities.

Slide 23

Slide 23 text

REPOSITORY
•  Code: Analytical code used to generate predictions and the code that was used to conduct the evaluation.
•  Data: Copies of the data used to train and evaluate the model. Most of it is already on the portal, but the repository also provides weather data that is not on the portal.
•  Documentation: Reproducible reports used during the diagnostics and evaluation. The branch gh-pages contains a public document describing results.

Slide 24

Slide 24 text

Code Structure
•  Simple naming convention to keep steps organized and flexible
•  Naming for data files matches script names

Code Prefix | Purpose | Example Script File Names | Corresponding Example Data Files
00 | Initialization | 00_Startup.R |
10+ | Data acquisition and basic manipulation | 13_food_inspection_download.R | 13_food_inspections.Rds
20+ | Feature extraction | 21_calculate_violation_matrix.R | 21_food_inspection_violation_matrix.Rds
30+ | Models | 30_glmnet_model.R | 30_model.Rds, 30_model_coef.Rds
40+ | Prediction (20 and 30 steps, for chosen model) | 40_initialize_app_data.R, 41_calculate_violation_matrix.R | 40_bus_license_CURRENT.Rds, 40_food_inspections_CURRENT.Rds, 41_food_inspection_violation_matrix.Rds
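To make the convention concrete, the sketch below maps a file name to its pipeline stage from the two-digit prefix. The function is hypothetical and not part of the repository; it also assumes, for simplicity, that any prefix of 40 or above belongs to the prediction stage.

```python
import re

# (lowest prefix in the band, stage name), in ascending order.
STAGES = [
    (0, "Initialization"),
    (10, "Data acquisition and basic manipulation"),
    (20, "Feature extraction"),
    (30, "Models"),
    (40, "Prediction"),
]


def stage_for(filename):
    """Return the stage name for a file such as '21_calculate_violation_matrix.R'."""
    match = re.match(r"(\d{2})_", filename)
    if match is None:
        return None  # file does not follow the naming convention
    prefix = int(match.group(1))
    stage = None
    for lowest_prefix, name in STAGES:
        if prefix >= lowest_prefix:
            stage = name
    return stage


examples = {
    name: stage_for(name)
    for name in ["00_Startup.R", "13_food_inspections.Rds", "30_glmnet_model.R"]
}
```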

Slide 25

Slide 25 text

IMPLICIT ASSUMPTIONS
•  The past is an indicator of the future
–  The underlying fundamentals of the problem don’t change
•  Heat maps are generated by using a Kernel Density Estimate with
–  Normality assumption
–  90 Day window
–  h = .2
•  Many values are capped to exclude outliers / missing data
–  KDE function has various caps for each data type
–  Age At Inspection is capped at 4 years
–  The number of violations doesn’t matter, only the presence of a violation
–  3 day rolling average for weather
Open science allows the reader to uncover any assumption!
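Several of these assumptions are easy to show in code. The sketch below reads "normality assumption" as a Gaussian kernel, keeps only events from the 90 days before the inspection, uses h = 0.2 as the bandwidth (the slide does not give its units), caps a value at an upper limit, and takes a 3-day rolling average. All names and data are hypothetical, and this is not the project's R implementation.

```python
import math
from datetime import date, timedelta


def events_in_window(events, as_of, days=90):
    """Keep (x, y) of events dated within `days` before `as_of`."""
    start = as_of - timedelta(days=days)
    return [(x, y) for x, y, when in events if start <= when < as_of]


def gaussian_kde_at(point, events, bandwidth=0.2):
    """2-D Gaussian kernel density estimate at `point` from (x, y) events."""
    if not events:
        return 0.0
    px, py = point
    total = 0.0
    for ex, ey in events:
        squared_distance = (px - ex) ** 2 + (py - ey) ** 2
        total += math.exp(-squared_distance / (2 * bandwidth ** 2))
    # Normalise so each kernel integrates to 1 and the kernels are averaged.
    return total / (2 * math.pi * bandwidth ** 2 * len(events))


def cap(value, upper):
    """Cap a value at an upper limit, e.g. age at inspection at 4 years."""
    return min(value, upper)


def rolling_mean(values, window=3):
    """Trailing rolling average; the first window-1 positions are dropped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical events: (x, y, date). Only the first is inside the 90-day window.
events = [(0.0, 0.0, date(2014, 8, 15)), (0.1, 0.0, date(2014, 1, 1))]
recent = events_in_window(events, as_of=date(2014, 9, 1))
density = gaussian_kde_at((0.0, 0.0), recent)
capped_age = cap(6.5, 4)
avg_high_temps = rolling_mean([60, 70, 80, 90])
```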

Slide 26

Slide 26 text

DOCUMENTATION

Slide 27

Slide 27 text

http://chicago.github.io/food-inspections-evaluation/

Slide 28

Slide 28 text

Technical Documentation
The project was released with an academic-quality technical paper instructing others on the variables and statistical methodology used in the project. In addition to source code, the paper will help researchers adopt this approach.

Slide 29

Slide 29 text

Reproducible Research
The technical paper was written as a highly reproducible “knitr” document, allowing other researchers to understand how summary numbers were calculated. Each statement in the project can be traced to an original source.

Slide 30

Slide 30 text

#OPENDATA #OPENSCIENCE #OPENSOURCE #ENGAGEMENT
How might research, when combined with #opendata and #engagement with researchers, look for a municipal government? It would resemble the ongoing #openscience movement.