1. I spent many years in urban planning grad school focused on geographic information systems, so I see the world through a lens of spatial data. At IBM Analytics, I help developers get the most out of our hosted and managed database services in any way I can. I build, talk about, and write about sample apps and open data sets.
2. Twenty-four years ago, when I started in the spatial data business, we had to create all our own data using these TV-sized heads-up digitizing tablets. It was boring drudgery, nothing like what this cheerful Marlboro smoker seems to imply. Getting away from these things is partly why I went back to grad school ;)
3. I’m so excited about the open data trend because we used to spend 80% of our time creating data. Now we can focus on preparing and using data, and on how to combine open and enterprise data to best effect.
4. And I think we’re just at the early stages of adoption and publication. I expect the next 10 years will see an explosion in the depth, breadth and timeliness of open data sets.
5. Open data can impact important global issues, but there’s also a lot that can be done right here in our backyard. And it’s not just about tech for societal benefit. We can do good and do well at the same time.
6. I’ve been looking at crime data for a while, but my interest took on a new intensity when Pokémon Go came out and we started hearing about people luring players into dark alleys and robbing them. Surely data could play a role in helping people avoid these incidents. So I decided I would build an app that let you know when you were heading towards an area that, based on crime data, looked like a bad bet.
7. But since I’m an advocate for developers, my first priority is to create something useful for others to build on, so I’m creating a massive database of geocoded crime data, starting with the top cities on the US City Open Data Census.
8. All of these cities use Socrata as their data host. Socrata doesn’t just host the data. They provide a great standardized UI that lets anyone browse, query and visualize data, plus an API with a SQL-like query language for developers.
9. I use the API to harvest crime data and aggregate it into a single database in Cloudant — a NoSQL database. This allows me to keep data in one place even though the fields differ across cities.
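To make the harvesting step concrete, here is a minimal sketch of how a query against a Socrata-hosted data set might be built and how a row could be wrapped for a schema-less store. It is not the code from the harvesting app. The domain, data set ID, field names, and the `build_soql_url` and `to_document` helpers are all hypothetical, and the sketch assumes the data set exposes a real timestamp column to filter on.

```python
from urllib.parse import urlencode


def build_soql_url(domain, dataset_id, date_field, since_iso, limit=1000):
    """Build a Socrata resource URL asking for rows newer than since_iso."""
    params = {
        "$where": f"{date_field} > '{since_iso}'",  # SoQL filter clause
        "$order": f"{date_field} ASC",
        "$limit": limit,
    }
    # safe="$" keeps the SoQL parameter names readable in the query string.
    return f"https://{domain}/resource/{dataset_id}.json?" + urlencode(params, safe="$")


def to_document(city, record, id_field):
    """Wrap one raw row in a JSON document, leaving city-specific fields untouched."""
    return {
        # Deterministic ID: the same source row always maps to the same document.
        "_id": f"{city}:{record[id_field]}",
        "city": city,
        "raw": record,  # original fields kept as-is, whatever the city calls them
    }


# Hypothetical domain and data set ID, for illustration only.
url = build_soql_url("data.example.gov", "abcd-1234", "occurred_on", "2016-08-01T00:00:00")
doc = to_document("boston", {"incident_id": "42", "lat": "42.36"}, "incident_id")
```

Keeping the raw row under its own key is one way to let documents from different cities sit side by side without forcing a shared schema up front.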
10. And that’s a big headache. There’s no standard for coding crime type; release dates are random, as are schemes for generalizing the data to protect privacy.
11. But we data wranglers are used to this kind of punishment, so we soldier on. Here’s some Boston crime data. We have lat/lons, some crime codes, and a date. Unfortunately that date is a text field, but I hope someone in the audience is from the city and can get this fixed!
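Until a text date field gets fixed at the source, one workaround is to try a short list of formats when loading each row. The sketch below illustrates that idea only. The formats in `CANDIDATE_FORMATS` are guesses, not the actual Boston format, and `parse_text_date` is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical candidate formats; check each city's actual export before relying on them.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%m/%d/%Y %I:%M:%S %p",
    "%m/%d/%Y",
)


def parse_text_date(value):
    """Return a datetime for a text date, or None if no candidate format matches."""
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    return None
```

Returning `None` rather than raising lets a harvester log and skip odd rows instead of stopping the whole run.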
12. But anyway, these fields are generally available across all the cities. As I said before, the crime codes all differ, as do the release dates.
13. So with this research complete, I worked out a crime data harvesting app to query the Socrata API every morning and grab the latest data from all cities, saving it to Cloudant and standardizing the coding of crimes into 3 categories: non-violent, street crime, and domestic.
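The standardizing step can be pictured as a per-city lookup table. This is a sketch under stated assumptions, not the app's real mapping: the city keys and crime codes in `CITY_CODE_MAP` are invented for illustration, and returning `None` for unmapped codes is my own choice.

```python
# The three target categories named in the talk.
CATEGORIES = ("non-violent", "street crime", "domestic")

# Hypothetical per-city lookup tables; real codes must come from each city's data dictionary.
CITY_CODE_MAP = {
    "city_a": {
        "LARCENY": "non-violent",
        "ROBBERY": "street crime",
        "DOMESTIC ASSAULT": "domestic",
    },
    "city_b": {
        "06": "non-violent",
        "03": "street crime",
        "DV": "domestic",
    },
}


def standardize(city, raw_code):
    """Map a city-specific crime code to a shared category, or None if unmapped."""
    table = CITY_CODE_MAP.get(city, {})
    # Normalize whitespace and case so " robbery " and "ROBBERY" match the same key.
    return table.get(str(raw_code).strip().upper())
```

Codes that come back as `None` can be collected and reviewed by hand, which is usually where the lookup tables grow from.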
14. This gives me what I need for the Pokémon Go safety app, where I’m mainly concerned with street crime. But you could go in a completely different direction.
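For the safety app, the core check might look something like the sketch below: count recent street crimes within some radius of the player's position and alert past a threshold. The 250 m radius, the threshold of 3, the incident field names, and the `should_alert` helper are all assumptions for illustration. The distance uses the standard haversine formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def should_alert(lat, lon, incidents, radius_m=250, min_incidents=3):
    """Return True when enough street-crime incidents fall within radius_m of the position."""
    nearby = sum(
        1
        for inc in incidents
        if inc["category"] == "street crime"
        and haversine_m(lat, lon, inc["lat"], inc["lon"]) <= radius_m
    )
    return nearby >= min_incidents
```

A linear scan like this is fine for a handful of incidents. A larger data set would call for a geospatial index on the database side instead.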
15. For example, how about a restaurant siting helper app? Crime stats would help answer questions like, should I open a store here? If I do, when should it close? Recent weeks’ data for 4 cities (Boston and Dallas coming soon) and 1 country (England) are up on the site.
16. I’m going to build a basic iOS background app that buzzes when you’re in a bad area. But I can’t wait to see what you’ll build. Let me know how I can help!
17. Here’s where you can find the code and data I’ve been talking about.