Speaker Deck

Guerrilla Data Liberation

by henare

Published January 19, 2012 in Programming

What has the government been spending my money on? Which corporation is responsible for the most oil spills? How much radiation is coming out of the crippled Fukushima nuclear plant? Where are the crime hot spots in my area? To paraphrase Special Agent Mulder, the data is out there...

The web is full of public data, but how do you transform the text on some web page into a beautiful map? How can you get the latest updates when the information in that CSV file changes? Web scraping can help, and ScraperWiki is an open source online tool that makes the process simpler and more collaborative. Anyone can write a scraper using the online editor, and the code and data are shared with the world. Because it's a wiki, other programmers can help maintain and improve the code.

ScraperWiki is used by projects such as the OpenAustralia Foundation’s PlanningAlerts, which gathers development applications in your local area, and OpenCorporates, which is creating an open database of every corporation in the world. It is also used by journalists in the emerging world of data-driven investigative journalism.

In this presentation I’ll show how you can start to liberate the data you’re interested in on the web. This can help you with everything from satisfying a passing curiosity to bolting a powerful API onto the data for your next big web application.

I will give you an overview of the ScraperWiki project and of what you can do with ScraperWiki and open data in general. We’ll write a simple scraper to show how easy it is, even for people with only basic programming knowledge, to get started with their own guerrilla data liberation.
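
To give a flavour of what a simple scraper involves, here is a minimal sketch in Python using only the standard library. It is not the scraper from the talk and it does not use ScraperWiki's own library: the sample HTML, the suburb names and figures, the `applications` table, and the helper names `TableParser`, `scrape`, and `save` are all made up for illustration. A real scraper would download the page from a public site instead of reading an inline string.

```python
from html.parser import HTMLParser
import sqlite3

# Hypothetical sample page (illustrative data only). A real scraper would
# fetch this HTML from a public website.
SAMPLE_HTML = """
<table>
  <tr><th>Suburb</th><th>Applications</th></tr>
  <tr><td>Newtown</td><td>12</td></tr>
  <tr><td>Glebe</td><td>7</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows contain no <td> cells, so skip them
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a <td> cell.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def scrape(html):
    """Turn the HTML table into a list of (suburb, total) tuples."""
    parser = TableParser()
    parser.feed(html)
    return [(name, int(total)) for name, total in parser.rows]


def save(rows, db):
    """Store rows keyed by suburb, so re-running updates rather than duplicates."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS applications "
        "(suburb TEXT PRIMARY KEY, total INTEGER)"
    )
    db.executemany("INSERT OR REPLACE INTO applications VALUES (?, ?)", rows)
    db.commit()


rows = scrape(SAMPLE_HTML)
db = sqlite3.connect(":memory:")  # in-memory database, for the example only
save(rows, db)
```

Keying the table on the suburb means the scraper can be run again whenever the page changes without piling up duplicate rows, which is the kind of repeated updating a scraper is meant to handle.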