Food Inspections: Optimizing Inspections with Analytics

FOOD INSPECTIONS: OPTIMIZING INSPECTIONS WITH ANALYTICS
Image adapted from Edsel Little’s The Hunt (CC BY-SA 2.0)

THE DEPARTMENT OF HEALTH SHALL INSPECT ALL FOOD ESTABLISHMENTS AT LEAST ONCE EVERY SIX MONTHS AND AS OFTEN AS NECESSARY TO DETERMINE THAT THE REQUIREMENTS OF THIS MUNICIPAL CODE ARE BEING COMPLIED WITH.
7-42-010

Optimizing Inspections
The objective of this research project is to order inspections so that critical violations at retail food establishments are found sooner.

[Figure: 23%, 15%, 11%, 7%, 14%]
Image adapted from Michael Mooney’s Little Chicago (CC-BY 2.0).

#ENGAGEMENT
The City of Chicago teamed up with the Civic Consulting Alliance and Allstate Insurance Company’s data science team to help develop the predictive model. Data from the open data portal was used to develop the model. While other data were considered, almost all of the useful data was publicly available.

#OPENDATA
data.cityofchicago.org/view/2bnm-jnvb
Chicago used the open data portal, the city’s premier method of sharing data, to share data with external researchers and save time on data-sharing agreements.

The model predicts the likelihood of a food establishment having a critical violation, the kind of violation most likely to lead to foodborne illness. Data on the portal was used to help define the model. Ultimately, eleven different variables proved to be useful predictors of critical violations.
Significant Predictors:
•  Restaurants with previous serious and critical violations
•  Three-day average high temperature
•  Location of restaurant
•  Nearby garbage and sanitation complaints
•  Nearby burglaries
•  Whether the establishment has a tobacco license or an incidental alcohol consumption license
•  Length of time since last inspection
•  Length of time the restaurant has been open
•  Inspectors

Critical violations
[Chart: Data-driven vs. Status quo, axis 0% to 70%]
The research revealed an opportunity to deliver results faster. Within the first half of the work, 69% of critical violations would have been found by inspectors using a data-driven approach. During the same period, only 55% of violations were found using the status quo method.
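The comparison above can be illustrated with a minimal sketch. It assumes each inspection has a model score and a known outcome, and it measures the share of critical violations found in the first half of the work under two orderings. The function name and the toy data are hypothetical and are not taken from the project's code.

```python
def share_found_in_first_half(outcomes):
    """Share of all critical violations that fall in the first half of an ordering.

    outcomes: list of 0/1 flags, in the order the inspections are performed.
    """
    total = sum(outcomes)
    if total == 0:
        return 0.0
    half = len(outcomes) // 2
    return sum(outcomes[:half]) / total


# Hypothetical (score, critical_violation_found) pairs in status quo order.
status_quo = [(0.2, 0), (0.7, 1), (0.1, 0), (0.9, 1), (0.6, 1), (0.3, 0)]

# Data-driven order: same inspections, highest predicted score first.
data_driven = sorted(status_quo, key=lambda pair: pair[0], reverse=True)

status_quo_share = share_found_in_first_half([found for _, found in status_quo])
data_driven_share = share_found_in_first_half([found for _, found in data_driven])
```

In this toy data the status quo order finds one of three violations in the first half, while the score-based order finds all three.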

The food inspection model is able to deliver results faster.
7 days IMPROVEMENT
Compared with the current methods, the data-driven approach found violations an average of 7.4 days sooner during the 60-day pilot. That means more violations would be found sooner by CDPH’s inspectors.
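One way to read the "days sooner" figure is sketched below. It assumes the same inspections are performed on the same set of days, only reordered by model score, and it averages the gain over inspections that found a critical violation. This is an illustrative calculation under those assumptions, not the project's evaluation code, and the field names are hypothetical.

```python
from statistics import mean


def average_days_sooner(inspections):
    """Average number of days earlier that critical violations would be found.

    inspections: list of dicts with 'day' (actual day of the inspection),
    'score' (model prediction) and 'critical' (True if a violation was found).
    """
    available_days = sorted(rec["day"] for rec in inspections)
    by_score = sorted(inspections, key=lambda rec: rec["score"], reverse=True)
    # The highest-scoring inspection takes the earliest available day, and so on.
    gains = [
        rec["day"] - new_day
        for rec, new_day in zip(by_score, available_days)
        if rec["critical"]
    ]
    return mean(gains) if gains else 0.0


sample = [
    {"day": 1, "score": 0.10, "critical": False},
    {"day": 2, "score": 0.80, "critical": True},
    {"day": 3, "score": 0.30, "critical": False},
    {"day": 4, "score": 0.90, "critical": True},
]
days_sooner = average_days_sooner(sample)
```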

#OPENSOURCE
http://github.com/Chicago/food-inspections-evaluation
The analytical model has been released as an open source project on GitHub, allowing other cities to study or even adopt the model. No other city has released their analytic models before this release.

The project uses an evaluation period to benchmark the performance of the analytical model. This evaluation period can also be used to benchmark potential improvements of the model. The analytical code in the repo drives the city model, allowing for collaborative improvements.

MODEL OBJECTIVES
•  Can we use historical data to predict which inspections are most likely to have a critical violation?
•  Implications:
–  The model type is a binary response model
–  The observations are historical food inspections
–  A positive outcome is the presence of any violation numbered 1 to 14
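The binary response described above can be sketched as a small labeling function. It assumes the violation numbers for an inspection have already been parsed into integers; the function and constant names are hypothetical.

```python
# Violations numbered 1 to 14 count as critical, per the slide.
CRITICAL_VIOLATION_NUMBERS = range(1, 15)


def has_critical_violation(violation_numbers):
    """Return 1 if any violation is numbered 1 to 14, otherwise 0."""
    return int(any(n in CRITICAL_VIOLATION_NUMBERS for n in violation_numbers))


labels = [has_critical_violation(v) for v in ([32, 3], [18, 32], [])]
```

Only the presence of a critical violation matters for the label, not how many were recorded.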

DATA SOURCES
Food Inspection History, joined to:
•  Business Licenses (by License Number)
•  Weather (by Date)
•  Crime, Garbage Cart Requests, Sanitation Complaints (by Date & Location)

MODEL
Inspection Data
•  Food Inspection History
Historical Data
•  Business Licenses
•  Inspectors
•  Weather
•  Sanitation
•  Crime
•  Garbage Carts
Model Prediction
•  Lasso and Elastic-Net Regularized Generalized Linear Models
•  R’s glmnet package
Model Object
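To make the model type concrete, here is a minimal pure-Python sketch of an elastic-net regularized logistic regression. It is not the project's code: the project uses R's glmnet package, which fits the model by coordinate descent, while this sketch swaps in plain proximal gradient descent for brevity. The penalty values, step size, and toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty; shrinks small values to exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, step=0.5, iterations=2000):
    """Minimize mean log-loss + lam * (alpha * L1 + (1 - alpha) / 2 * L2^2).

    The intercept is not penalized. alpha=1 is the lasso, alpha=0 is ridge.
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(iterations):
        probs = [sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) for row in X]
        residuals = [prob - target for prob, target in zip(probs, y)]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            # Gradient of the smooth part: log-loss plus the ridge term.
            grad = sum(r * row[j] for r, row in zip(residuals, X)) / n
            grad += lam * (1.0 - alpha) * coefs[j]
            coefs[j] = soft_threshold(coefs[j] - step * grad, step * lam * alpha)
    return intercept, coefs


# Toy data: the first feature tracks the outcome, the second is unrelated.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
intercept, coefs = fit_elastic_net_logistic(X, y)
```

The L1 part of the penalty is what lets the fit drop uninformative predictors entirely, which is one reason this model family suits a setting with many candidate variables.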

PREDICTION AND APPLICATION
Current Licenses
•  Current Business Licenses
Current Data
•  Business Licenses
•  Inspectors
•  Weather
•  Sanitation
•  Crime
•  Garbage Carts
Historical Model
•  Model Object
Current Businesses + Current Prediction Score
Shiny Application
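The prediction step can be sketched as applying a fitted model object to current business data and ranking the results. The coefficient values, feature names, and business names below are hypothetical placeholders, not values from the project's model.

```python
import math


def predict_score(intercept, coefs, features):
    """Predicted probability of a critical violation for one business."""
    z = intercept + sum(coefs.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_businesses(intercept, coefs, businesses):
    """Return business names ordered from highest to lowest predicted score."""
    scored = [(predict_score(intercept, coefs, feats), name) for name, feats in businesses.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical model object and current data.
intercept = -2.0
coefs = {"past_critical": 1.2, "days_since_inspection": 0.004}
businesses = {
    "Diner A": {"past_critical": 0, "days_since_inspection": 100},
    "Cafe B": {"past_critical": 1, "days_since_inspection": 200},
    "Deli C": {"past_critical": 0, "days_since_inspection": 300},
}
ranking = rank_businesses(intercept, coefs, businesses)
```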

MERGING INSPECTORS
•  We found that inspectors were part of the prediction, but inspector data is not published publicly.
We used matching to group them by simple clustering of the model coefficients!
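The slide does not say which clustering method was used. As a stand-in, the sketch below sorts inspectors by their model coefficient and cuts the sorted list into equal-sized bins, so that only an anonymous group label is needed downstream. The function name, identifiers, and coefficient values are hypothetical.

```python
def group_by_coefficient(coefficients, n_groups):
    """Assign each inspector to one of n_groups bins by rank of coefficient.

    coefficients: dict mapping an inspector identifier to a model coefficient.
    Returns a dict mapping each identifier to a group index (0 = lowest).
    """
    ranked = sorted(coefficients, key=coefficients.get)
    return {
        name: position * n_groups // len(ranked)
        for position, name in enumerate(ranked)
    }


# Hypothetical per-inspector coefficients.
coefficients = {"A": -0.9, "B": 0.4, "C": -0.1, "D": 1.3, "E": 0.2, "F": -0.5}
groups = group_by_coefficient(coefficients, n_groups=3)
```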

The Application
Our final outcome was a simple list that contained
•  Business details
•  Zip codes
•  Predictions
That’s it, no fancy maps!
Technical notes:
•  Updates nightly
•  MVC framework
•  Uses RStudio’s Shiny
•  Built on jQuery

YOU CAN HELP!

TEST / TRAIN FRAMEWORK
•  The first model was built using data prior to 2014, and tested in early 2014
•  The second model was built in response to the first
–  Completed in the summer of 2014
–  Tested in November based on actual inspection results from September and October
Timeline:
•  Training Period: Sept 2011 – Feb 2014 (a model is built on historically available data)
•  Not Used: the interval between the two periods
•  Test Period: Sept 2014 – Oct 2014 (the model is tested on future data)
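The time-based split above can be sketched as a function that assigns each inspection date to a period. The slide gives only months, so the exact first and last days used here are assumptions.

```python
from datetime import date

# Month boundaries from the slide; the exact days are assumed.
TRAIN_START, TRAIN_END = date(2011, 9, 1), date(2014, 2, 28)
TEST_START, TEST_END = date(2014, 9, 1), date(2014, 10, 31)


def assign_period(inspection_date):
    """Label an inspection as 'train', 'test', or 'not used' by its date."""
    if TRAIN_START <= inspection_date <= TRAIN_END:
        return "train"
    if TEST_START <= inspection_date <= TEST_END:
        return "test"
    return "not used"


periods = [assign_period(d) for d in (date(2013, 5, 10), date(2014, 6, 1), date(2014, 9, 15))]
```

Splitting by date rather than at random keeps the test honest: the model never sees inspections from after the training period.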

OPTIMIZING FOOD INSPECTIONS
Impact
Discovering critical violations sooner rather than later reduces the risk of patrons becoming ill, which helps reduce medical expenses, lost time at work, and even a limited number of fatalities.

REPOSITORY
•  Code: Analytical code used to generate predictions and the code that was used to conduct the evaluation.
•  Data: Copies of the data used to train and evaluate the model. Most of it is already on the portal, but the repository also provides weather data that is not on the portal.
•  Documentation: Reproducible reports used during the diagnostics and evaluation. The branch gh-pages contains a public document describing results.

Code Structure
•  Simple naming convention to keep steps organized and flexible
•  Naming for data files matches script names
Code prefix, purpose, example script file names, and corresponding example data files:
•  00: Initialization
–  Script: 00_Startup.R
•  10+: Data acquisition and basic manipulation
–  Script: 13_food_inspection_download.R
–  Data: 13_food_inspections.Rds
•  20+: Feature extraction
–  Script: 21_calculate_violation_matrix.R
–  Data: 21_food_inspection_violation_matrix.Rds
•  30+: Models
–  Script: 30_glmnet_model.R
–  Data: 30_model.Rds, 30_model_coef.Rds
•  40+: Prediction (20 and 30 steps, for chosen model)
–  Scripts: 40_initialize_app_data.R, 41_calculate_violation_matrix.R
–  Data: 40_bus_license_CURRENT.Rds, 40_food_inspections_CURRENT.Rds, 41_food_inspection_violation_matrix.Rds
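The naming convention above can be read mechanically: the two-digit prefix of a file name identifies the pipeline stage. The helper below is a hypothetical illustration of that lookup and is not part of the repository.

```python
# Stage names keyed by the tens value of the two-digit file prefix.
STAGES = {
    0: "Initialization",
    10: "Data acquisition and basic manipulation",
    20: "Feature extraction",
    30: "Models",
    40: "Prediction",
}


def stage_for(filename):
    """Return the pipeline stage implied by a file name such as '30_model.Rds'."""
    prefix = int(filename.split("_", 1)[0])
    return STAGES.get(prefix // 10 * 10, "unknown")


stages = [stage_for(name) for name in ("00_Startup.R", "13_food_inspections.Rds", "30_model_coef.Rds")]
```

Because data files share the prefix of the script that produced them, the same lookup works for both scripts and their outputs.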

IMPLICIT ASSUMPTIONS
•  The past is an indicator of the future
–  The underlying fundamentals of the problem don’t change
•  Heat maps are generated by using a Kernel Density Estimate with
–  Normality assumption
–  90 Day window
–  h = .2
•  Many values are capped to exclude outliers / missing data
–  KDE function has various caps for each data type
–  Age At Inspection is capped at 4 years
–  The number of violations doesn’t matter, only the presence of a violation
–  3 day rolling average for weather
Open science allows the reader to uncover any assumption!
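The three feature assumptions above (a Gaussian kernel density over a 90-day window with h = .2, capped values, and a 3-day rolling weather average) can be sketched as follows. The units of h and of the coordinates are not stated on the slide, so they are left abstract here, and all function names are hypothetical.

```python
import math
from datetime import date, timedelta


def heat_value(point, events, on_date, window_days=90, h=0.2):
    """Sum of 2-D Gaussian kernels over events in the window before on_date.

    point: (x, y); events: iterable of (x, y, event_date) tuples.
    """
    earliest = on_date - timedelta(days=window_days)
    total = 0.0
    for x, y, event_date in events:
        if earliest <= event_date < on_date:
            dist_sq = (point[0] - x) ** 2 + (point[1] - y) ** 2
            total += math.exp(-dist_sq / (2 * h * h)) / (2 * math.pi * h * h)
    return total


def cap(value, upper):
    """Limit a value to an upper bound, e.g. age at inspection capped at 4 years."""
    return min(value, upper)


def rolling_mean(values, window=3):
    """Trailing rolling average; the first window - 1 positions are dropped."""
    return [sum(values[i - window + 1 : i + 1]) / window for i in range(window - 1, len(values))]


events = [(0.0, 0.0, date(2014, 8, 1)), (0.0, 0.0, date(2013, 1, 1))]
heat = heat_value((0.0, 0.0), events, on_date=date(2014, 9, 1))
capped_age = cap(7.5, 4)
avg_high_temps = rolling_mean([60, 70, 80, 90])
```

Only the first event falls inside the 90-day window, so it alone contributes to the heat value at that point.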

DOCUMENTATION

http://chicago.github.io/food-inspections-evaluation/

Technical Documentation
The project was released with an academic-quality technical paper instructing others on the variables and statistical methodology used in the project. In addition to source code, the paper will help researchers adopt this approach.

Reproducible Research
The technical paper was written as a highly reproducible “knitr” document, allowing other researchers to understand how summary numbers were calculated. Each statement in the project can be traced to an original source.

#OPENDATA #OPENSCIENCE #OPENSOURCE #ENGAGEMENT
How might research, when combined with #opendata and #engagement with researchers, look for a municipal government? It would resemble the ongoing #openscience movement.

Tom Schenk Jr
