harvesting, data extraction
• Extracting data from websites
• Turning unstructured data into structured data
• Use cases: web indexing, data mining, price comparison, change detection, mashups, etc.
• Anti-scraping measures such as robots.txt, CAPTCHAs, and bot-detection frameworks
• Legal grey area
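As an illustration of the robots.txt point above, here is a minimal sketch of how a polite scraper might check whether a URL is allowed before fetching it. It uses only Python's standard-library `urllib.robotparser`. The robots.txt content, the `demo-bot` user agent, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and chosen for the example; a real crawler would download the file from the target site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, kept in memory so the sketch needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/products/1",
        "https://example.com/private/data.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "demo-bot", url))
```

Note that robots.txt is advisory: it tells well-behaved crawlers what the site owner prefers, but it does not enforce anything by itself.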
• High-level: comprehensive, with many useful abstractions and tools
• Crawling: predefined crawlers, easy to write your own
• Scraping: selectors and feed exporters
• Commercial support in the form of Scrapinghub
• FOSS