Yours truly
• CTO @ Lockstep Labs
• Ruby developer since 2003 (“Splitter!”)
• Py-curious, but mostly for ML / data science
PyCon Thailand 2018 locksteplabs.com
harvesting, data extraction
• Extracting (structured) data from (unstructured) web sites
• Use cases:
  • web indexing
  • data mining
  • price comparison
  • change detection
  • data mashups
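To make "structured data from unstructured web sites" concrete, here is a minimal sketch of the price comparison use case using only the Python standard library. The markup, the class names (`product`, `name`, `price`) and the helper names (`ProductParser`, `extract_products`, `cheapest`) are all invented for illustration; a real page would need its own rules.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; markup and class names are assumptions.
HTML = """
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">24.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect one record per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            # A new product starts: open an empty record.
            self.records.append({})
        elif tag == "span" and css_class in ("name", "price") and self.records:
            self._field = css_class

    def handle_data(self, data):
        # Only text inside a name/price span is stored.
        if self._field:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_products(html):
    """Turn the markup into a list of {"name": str, "price": float} dicts."""
    parser = ProductParser()
    parser.feed(html)
    return [{"name": r["name"], "price": float(r["price"])} for r in parser.records]


def cheapest(products):
    """Pick the product with the lowest price."""
    return min(products, key=lambda p: p["price"])
```

Calling `cheapest(extract_products(HTML))` compares the extracted records by price. The same records could feed any of the other use cases, for example change detection by comparing two extraction runs.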
• High-level: Useful abstractions
• Crawling: Predefined crawlers, easy to write your own
• Scraping: Selectors & feed exporters
• Framework: Comprehensive toolset
• Commercial support in the form of Scrapinghub
• FOSS
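The "selectors & feed exporters" pairing can be sketched without any framework. The block below is a standard-library stand-in, not the framework's own API: it selects elements with an XPath-style expression via `xml.etree.ElementTree` and writes the results as a JSON Lines feed. The markup and the names `select_talks` and `export_jsonlines` are hypothetical.

```python
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; real pages usually need a forgiving HTML parser.
PAGE = """
<html><body>
  <div class="talk"><h2>Intro to Scraping</h2><a href="/talks/1">details</a></div>
  <div class="talk"><h2>Crawling at Scale</h2><a href="/talks/2">details</a></div>
</body></html>
"""


def select_talks(markup):
    """Select each talk block and yield one dict per block."""
    root = ET.fromstring(markup.strip())
    # ElementTree supports a limited XPath subset, including attribute predicates.
    for node in root.findall(".//div[@class='talk']"):
        yield {"title": node.findtext("h2"), "url": node.find("a").get("href")}


def export_jsonlines(items, stream):
    """Write one JSON object per line (a JSON Lines feed)."""
    for item in items:
        stream.write(json.dumps(item) + "\n")


# An in-memory buffer stands in for an output file.
buffer = io.StringIO()
export_jsonlines(select_talks(PAGE), buffer)
lines = buffer.getvalue().splitlines()
```

Keeping selection and export as separate steps means the same items can be written to a different format by swapping only the export function.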