Data in the City of Chicago (Data Science Chicago Meetup)

Provides an overview of Chicago's data science team, its mission, and recent accomplishments. Presentation delivered to the Data Science Chicago Meetup on 2015-05-14 (http://www.meetup.com/Data-Science-Chicago/events/221882893/).

Tom Schenk Jr

May 14, 2015

Transcript

  1. DATA IN THE CITY: OPEN DATA

    AS A BRIDGE TO A DATA STRATEGY FOR THE CITY AND THE COMMUNITY
  2. Agent-based Modeling: How do cities form? Let’s run a simulation.

    Philosophy of Economics: How can a computer simulation represent reality?
  3. • Hyper-local estimates for internal rate of return for education

    • K-12/college/employment/prison outcomes system
    • Long-term program evaluation
    • Labor-market outcomes
    Education Research
  4. Policy areas included community colleges, career preparation, and STEM policy,

    including several pieces of legislation and Governor Branstad’s Executive Order 74 to create the STEM Advisory Council. Education Policy
  5. Thinking about design saturates all aspects of workflow. Redesigning reports

    and methods of communication has been a common theme in my work, which has included complete re-designs of data reports and websites. Report design
  6. Within policy and politics, data visualization is even more important

    as data must be communicated quickly and effectively to make an impact on policymakers, who are balancing several data inputs. Data Visualization packtpub.com/big-data-and-business-intelligence/circos-data-visualization-how-instant
  7. Northwestern University Applied similar methodologies to understand the impact of

    treatments on cancer patients’ quality of life, including the effects of anti-angiogenesis treatments on renal cell cancer patients.
  8. OPEN DATA HAS ORIGINALLY SERVED TO MAKE DATA TRANSPARENT, BUT

    ITS IMPLEMENTATION IS A CATALYST TO CONSIDER A GREATER DATA STRATEGY.
  9. data.cityofchicago.org Chicago’s open data portal provides almost 600 datasets that

    are updated on a daily basis, ranging from crimes to the quality of water on beaches.
  10. data.cityofchicago.org/view/caas-knxs Chicago has released more data, including important items such

    as red light and speed camera violations, problem landlords, and public chauffeurs.
  11. In 2012, Chicago issued an executive order which formalized the

    open data portal, endowed powers to the Chief Data Officer, created an advisory committee to advise on the expansion of new datasets, and required an annual open data report. Executive Order 2012-2
  12. techplan.cityofchicago.org/initiatives-by-strategy/effective-government/initiative-14/ INCREASE & IMPROVE CITY DATA The City will continue

    to increase and improve the quality of City data available internally and externally, and facilitate methods for analyzing that data to help create a smarter and more efficient city.
  13. The open data report was released in January 2013 to

    describe the upcoming initiatives and plan. The report aligns with the Chicago Tech Plan, which outlines broad initiatives. Open Data Report report.cityofchicago.org/open-data-2013 -or- chicago.github.io/open-data-annual-report-2013
  14. Chicago’s open data portal has been able to quickly expand

    by leveraging automation to regularly upload datasets from various databases. DATABASES PORTAL ETL Server
  15. DATABASE PERFORMANCE

    [Chart: run-times on a statistically valid sample of 311 reports, BEFORE vs. NOW]
  16. FOR THE FIRST TIME, CITY USERS HAD ACCESS TO REAL-TIME

    311 AND FINANCIAL REPORTS. THIS CAPABILITY WILL ALSO BE EXTENDED TO THE PORTAL’S 311 AND OTHER DATA IN LATE 2015.
  17. We released our automation framework as an open-source project that

    can be downloaded to quickly deploy automated updates to Socrata data portals. New datasets can be launched with minimal configuration, and the kit provides e-mail alerts. ETL Utility Kit github.com/Chicago/open-data-etl-utility-kit
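
The extract-transform-load pattern behind this kind of automation can be sketched in a few lines. This is an illustration only, not the ETL Utility Kit itself: the table `service_requests`, its columns, and the helper `build_upsert_payload` are hypothetical, an in-memory SQLite database stands in for a city source system, and the upload step is left out.

```python
import json
import sqlite3


def extract(conn):
    """Pull rows from the (hypothetical) source table as dictionaries."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT id, request_type, created FROM service_requests ORDER BY id"
    ).fetchall()
    return [dict(r) for r in rows]


def transform(rows):
    """Normalize text fields so the published dataset is consistent."""
    return [
        {
            "id": r["id"],
            "request_type": r["request_type"].strip().title(),
            "created": r["created"],
        }
        for r in rows
    ]


def build_upsert_payload(rows):
    """Serialize rows to JSON, the form an upload step could send to a portal."""
    return json.dumps(rows)


# Stand-in source database with two sample rows (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_requests (id INTEGER, request_type TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO service_requests VALUES (?, ?, ?)",
    [(1, "  pothole ", "2015-05-01"), (2, "GRAFFITI removal", "2015-05-02")],
)

payload = build_upsert_payload(transform(extract(conn)))
print(payload)
```

Running the three steps on a schedule, one small configuration per dataset, is what lets a portal add datasets without hand-built uploads.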
  18. NLC issued a report discussing the role of Chicago’s leadership

    in developing a leading open data portal. The first chapter reviews Chicago’s open data program and its benefits to the city, residents, and others. National League of Cities
  19. “Open data initiatives are an increasingly popular component of governance.

    At the national level, Chicago’s open data initiative has been held up as a model for cities that are seeking to start their own open data programs.” - National League of Cities, p. 22
  20. OPEN DATA PROVIDES A MEANS TO CREATE AN ECOSYSTEM AROUND

    DATA, WHICH INCLUDES MULTIPLE STAKEHOLDERS AND INITIATIVES THAT EXTEND BEYOND TRANSPARENCY.
  21. Chicago has a large, vibrant, productive, civic community. This is

    led by Chicago residents interested in technology and society. Smart Chicago Collaborative and non-profits provide assistance and city officials regularly engage in meetups and other activities. This group has produced several helpful apps. Community
  22. Using #opendata, this service developed by the civic community alerts

    individuals to street sweeping activity by providing email, text, or calendar alerts. sweeparound.us
  23. The City of Chicago partnered with developers to create LargeLots,

    a website using #opendata to help residents apply to the City of Chicago $1 lot program designed to encourage investment in struggling neighborhoods. largelots.org
  24. Chicago Flu Shots was developed to easily find flu-shot locations

    across Chicago during the fall and winter months. This provides an easy-to-use central website built upon open data by a volunteer. chicagoflushots.org
  25. This site shows the work completed by city crews and

    is also based on the data portal. It provides summary statistics of potholes filled, graffiti removal, and other work completed by city council ward. chicagoworksforyou
  26. OPEN DATA & INTERNET OF THINGS Open data has also

    spread to physical devices. @chrismetcalf used traffic congestion data from the open data portal to generate an imp to provide a red or green light to denote heavy or light traffic congestion.
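
The mapping from congestion data to a light color can be sketched as a simple threshold. The 20 mph cutoff and the function name `light_color` are assumptions for illustration; the slides do not describe the device's actual logic.

```python
def light_color(speed_mph, threshold_mph=20.0):
    """Return "red" for heavy congestion (slow traffic), otherwise "green"."""
    return "red" if speed_mph < threshold_mph else "green"


# Two made-up speed readings for a road segment.
print(light_color(12.5))
print(light_color(31.0))
```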
  27. Array of Things University of Chicago has partnered with multiple

    institutions to build a mesh network of small sensors, dubbed the Array of Things, that will frequently post data for public consumption. arrayofthings.github.io
  28. Array of Things The Array of Things will provide hyper-local,

    temporal data using a variety of sensors:
    • Sensors measuring sound and vibration
    • Low-resolution infrared cameras measuring sidewalk temperature
    • Climate and environmental data, such as air quality and temperature
  29. THE OPEN DATA PORTAL IS NOT SUFFICIENT FOR THE COMMUNITY,

    BUT SERVES AS THE TOWN SQUARE FOR A COMMUNITY, PROVIDING A COMMON TOPIC OF CONVERSATION FOR EVERYONE.
  30. DataMade Co. is a consultancy and web design firm that

    usually works with open data. The firm focuses on “telling a story with data” for public and private sector clients.
  31. constructionmonitor.com downloads building permits from the Open Data portal to

    generate sales leads for construction equipment suppliers.
  32. Chicago Tribune uses the city’s crime data to present summaries

    of crime by neighborhood. This data is often served alongside stories and helps provide a data journalism approach within the news organization. crime.chicagotribune.com github.com/Chicago/osd-street-center-line
  33. This site provides hyper-local information to its users. It combines

    data from the portal with a message board where individuals can discuss community issues, lost and found, or pose general questions to neighbors. everyblock.com github.com/Chicago/osd-street-center-line
  34. Sometimes, the terms & conditions were onerous for companies to

    use data. Likewise, people sometimes wanted to correct our data. Data posted on GitHub can be edited by others and comes with a business-friendly MIT license. Open-source data github.com/Chicago/osd-street-center-line
  35. Open-source data (and the MIT license) allowed openstreetmap.org to import

    all of the building footprints in the city, giving the shape of the city to its users.
  36. The open-source approach also lets users submit corrections for data.

    In the example above, a user found a mistake in the data when comparing it with reality.
  37. The City of Chicago has a number of high-quality research

    universities and groups willing to engage in projects with the city. We can leverage the open data portal and the data itself to create cooperative relationships. Researchers
  38. Metalicious is the open-source platform the City of Chicago and

    Chapin Hall at the University of Chicago built to power the city’s data dictionary. It can be adopted and deployed by any other organization. Metalicious github.com/Chicago/metalicious
  39. Incorporating a data-driven practice is contingent on leadership, practice, and

    technology. Most cities have the technology framework in place; it just needs to be added. REPORTS DATABASES PORTAL ANALYTICS
  40. We used the same dataflow basis from the portal to

    route it to an internal database that can provide real-time situational awareness. The platform, named WindyGrid, provides this information to city users. DATABASES PORTAL MongoDB
  41. Built using open source software, WindyGrid is a real-time

    situational awareness system that brings over a dozen data sources together into a single application. This September, it will be released as an open source project. WindyGrid
  42. City of Chicago found 31 factors that predicted when and

    where rodent complaints are most likely in the next week. We used spatial-temporal relationships to create these predictions, which started as an investigation of over 350 different factors. Spatial Correlation Temporal Correlation
  43. A list of likely locations is updated and published to

    an internal site used to route preventative baiting crews to bait likely locations.
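
One simple spatial-temporal predictor is the number of complaints in the same area during the previous week. The sketch below is a naive illustration of that idea, not the city's model: the grid cells, the complaint records, and the helper `lagged_count` are all made up.

```python
from collections import Counter

# (week number, grid cell) for each past complaint; made-up data.
complaints = [(1, "c1"), (1, "c1"), (1, "c2"), (2, "c1"), (2, "c3"), (2, "c3")]

counts = Counter(complaints)


def lagged_count(week, cell):
    """Complaints in the same cell during the previous week (a temporal lag)."""
    return counts[(week - 1, cell)]


# Rank cells for week 3 by last week's activity, busiest first.
cells = sorted({cell for _, cell in complaints})
forecast = sorted(cells, key=lambda c: lagged_count(3, c), reverse=True)
print(forecast)
```

A real model would combine many such lagged and neighboring-area counts and let the fitting procedure decide which ones carry predictive weight.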
  44. Chicago developed a package for the R statistical software designed

    to make it easier for R programmers, such as researchers, to download and interact with data from the portal. RSocrata github.com/Chicago/RSocrata
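
Under the hood, a client of this kind requests data from the portal's Socrata (SODA) endpoints. The sketch below builds such a request URL in Python; it is not RSocrata's interface, the helper `soda_url` is hypothetical, and `xxxx-xxxx` is a placeholder rather than a real dataset identifier.

```python
from urllib.parse import urlencode


def soda_url(domain, dataset_id, limit=5, where=None):
    """Build a query URL for a dataset's JSON endpoint on a Socrata portal."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    # Keep "$" unescaped so the query parameters stay readable.
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params, safe='$')}"


# "xxxx-xxxx" is a placeholder, not a real dataset identifier.
url = soda_url("data.cityofchicago.org", "xxxx-xxxx", limit=10, where="year = 2015")
print(url)
```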
  45. #ENGAGEMENT The City of Chicago teamed up with the Civic Consulting

    Alliance and Allstate Insurance Company’s data science team to help develop the predictive model. Data from the open data portal was used to develop the model. While other data were considered, almost all of the useful data was publicly available.
  46. data.cityofchicago.org/view/2bnm-jnvb Chicago leveraged the open data portal to share data

    with external researchers, leveraging the city’s premier method of sharing data and saving time on data-sharing agreements. #OPENDATA
  47. Significant Predictors:

    • Restaurants with previous critical violations
    • Three-day average high temperature
    • CDPH risk level
    • Location of restaurant
    • Nearby garbage and sanitation complaints
    • Type of facility
    • Nearby burglaries
    • Whether the establishment has a tobacco license or an incidental alcohol consumption license
    • Length of time since last inspection
    • Length of time the restaurant has been operating
    The model predicts the likelihood of a food establishment having a critical violation, a violation most likely to lead to foodborne illnesses. Over a dozen data sources were used to help define the model. Ultimately, ten different variables proved to be useful predictors of critical violations.
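
A model of this kind yields a probability for each establishment, which can then be used to order inspections. The sketch below assumes a logistic form; the coefficients, the three feature names, and the two sample establishments are invented for illustration and are not the city's fitted values.

```python
import math

# Hypothetical coefficients; NOT the fitted values from the city's model.
COEFFICIENTS = {
    "intercept": -2.0,
    "past_critical": 0.9,            # previous critical violation (0 or 1)
    "days_since_inspection": 0.002,  # days since the last inspection
    "nearby_sanitation": 0.05,       # count of nearby sanitation complaints
}


def violation_probability(features):
    """Apply the logistic function to a weighted sum of the features."""
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up establishments.
establishments = {
    "A": {"past_critical": 1, "days_since_inspection": 400, "nearby_sanitation": 6},
    "B": {"past_critical": 0, "days_since_inspection": 120, "nearby_sanitation": 1},
}

# Inspect the highest-risk establishments first.
ranked = sorted(
    establishments,
    key=lambda name: violation_probability(establishments[name]),
    reverse=True,
)
print(ranked)
```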
  48. [Chart: share of critical violations found, data-driven vs. status quo]

    The research revealed an opportunity to deliver results faster. Within the first half of work, 69% of critical violations would have been found by inspectors using a data-driven approach. During the same period, only 55% of violations were found using the status quo method. Critical violations
  49. After comparing a data-driven approach versus the current methods, the

    rate of finding violations was accelerated by an average of 7.4 days in the 60-day pilot. That means more violations would be found sooner by CDPH’s inspectors. 7 days IMPROVEMENT The food inspection model is able to deliver results faster.
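
A comparison of this kind can be sketched as follows: take the inspections in the order they actually happened, reorder them by predicted risk, then compare how many critical violations fall in the first half of the work and how many days earlier each would have been found. The six records are made up to show the calculation and do not reproduce the pilot's data.

```python
# Each record: (day inspected under the status quo, risk score, critical violation?)
# Made-up data for illustration only.
inspections = [
    (1, 0.10, False),
    (2, 0.80, True),
    (3, 0.20, False),
    (4, 0.30, False),
    (5, 0.90, True),
    (6, 0.70, True),
]


def share_found_in_first_half(ordered):
    """Fraction of all critical violations found in the first half of the work."""
    half = len(ordered) // 2
    total = sum(1 for _, _, violation in ordered if violation)
    return sum(1 for _, _, violation in ordered[:half] if violation) / total


status_quo = sorted(inspections, key=lambda r: r[0])
data_driven = sorted(inspections, key=lambda r: r[1], reverse=True)

# Reuse the same inspection days, but assign them in risk order.
days = [r[0] for r in status_quo]
new_day = {record: day for record, day in zip(data_driven, days)}
days_earlier = [r[0] - new_day[r] for r in inspections if r[2]]

print(share_found_in_first_half(status_quo))
print(share_found_in_first_half(data_driven))
print(sum(days_earlier) / len(days_earlier))
```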
  50. OPTIMIZING FOOD INSPECTIONS Impact Discovering critical violations sooner rather than

    later reduces the risk of patrons becoming ill, which helps reduce medical expenses, lost time at work, and even a limited number of fatalities.
  51. http://github.com/Chicago/food-inspections-evaluation The analytical model will be released as an open

    source project on GitHub, allowing other cities to study or even adopt the model in their respective cities. No other city has released their analytic models before this release. #OPENSOURCE
  52. The project was released using an academic-quality technical paper

    instructing others on the variables and statistical methodology used in the project. In addition to source code, the paper will help researchers adopt this approach. Technical Documentation
  53. The technical paper was written as a highly reproducible “knitr”

    document, allowing other researchers to understand how summary numbers were calculated. Each statement in the project can be traced to an original source. Reproducible Research
  54. The data science team has built a website which lets

    CDPH prioritize inspections based on projected risk. http://10.220.135.98:8182/
  55. http://github.com/Chicago The analytical model will be released as an open

    source project on GitHub, allowing other cities to study or even adopt the model in their respective cities. No other city has released their analytic models before this release. #OPENSOURCE
  56. #OPENDATA #OPENSCIENCE How might research, when combined with #opendata and

    #engagement with researchers, look for a municipal government? It would resemble the ongoing #openscience movement. #OPENSOURCE #ENGAGEMENT
  57. Research scales linearly in the city, presenting a challenge to

    complete a broad program of research across all city departments without significant delay. A more intelligent approach needs to be used to meet city needs. Research scalability [Chart: time to complete research projects vs. number of research projects, comparing business as usual with scalable research]
  58. SMART DATA PLATFORM What if we could use modern statistics

    to solve the most common type of research questions in the City? Automating the oft-repeated research project can introduce economies of scale to research. Scaling Research
  59. The development team has begun to wireframe and prototype new

    functionality in the system. An alpha version will be ready by end of Q3 2015. Open Grid (aka WindyGrid 2.0)
  60. THANK YOU Contact Info: Websites: Tom Schenk Jr. Chief Data

    Officer City of Chicago @ChicagoCDO [email protected] data.cityofchicago.org github.com/Chicago techplan.cityofchicago.org report.cityofchicago.org opengovhacknight.org arrayofthings.github.io datadictionary.cityofchicago.org digital.cityofchicago.org