There are loads of places to find data: open government data at many levels, publicly released data from companies, and research data from organizations. Ideally, these sources would be provided as web services. Often, however, they are a mishmash of Excel spreadsheets or other loosely structured files, HTML tables, or even PDF documents.
It’s easy to become discouraged by so many obstacles to simply acquiring the information for your app or site. Fortunately, there are many tools and techniques to help you gather, parse, and clean up data from a variety of sources.
This session will use Politilines as a real-world example. I will demonstrate how we found, gathered, parsed, and made sense of the public data that Politilines needed.
Presented at OSCON 2012.