used to scrape over 4 billion web pages a month. We offer: • Professional Services to handle the web scraping for you • Off-the-shelf datasets so you can get data hassle-free • A cloud-based platform that makes scraping a breeze
dirty work related to web crawling out of your way. Benefits • No platform lock-in: Open Source • Very popular (13k+ ★) • Battle tested • Highly extensible • Great documentation
get data without needing to write code. Benefits • No platform lock-in: Open Source • JavaScript dynamic content generation • Ideal for non-developers • Extensible • It’s as easy as annotating a page
web crawlers: • Scalable: Crawlers run on EC2 instances or dedicated servers • Crawlera add-on • Control your spiders: Command line, API or web UI • Machine learning integration: BigML, MonkeyLearn • No lock-in: scrapyd to run Scrapy spiders on your own infrastructure
crawlers in Python: • Scrapy support out of the box • Distribute and scale custom web crawlers across servers • Crawl Frontier Framework: large scale URL prioritization logic • Aduana to prioritize URLs based on link analysis (PageRank, HITS)
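To make the link-analysis idea concrete, here is a minimal sketch of ranking URLs with PageRank and crawling the highest-scoring ones first. It is not Aduana's or the Crawl Frontier Framework's actual API: the function names `pagerank` and `prioritize`, the damping factor of 0.85, the fixed iteration count, and the in-memory dictionary graph are all assumptions chosen for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a small in-memory link graph.

    `links` maps each URL to the list of URLs it links to (hypothetical input format).
    """
    # Collect every URL that appears as a source or as a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            # Each page passes an equal share of its rank to every outlink.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def prioritize(links: Dict[str, List[str]]) -> List[str]:
    """Return URLs ordered from highest to lowest PageRank score (ties broken by URL)."""
    scores = pagerank(links)
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(prioritize(graph))
```

A production frontier would keep the graph on disk and update scores incrementally as new pages are discovered; the sketch only shows how link structure can be turned into a crawl order.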
and the ratings of competitors: • Scrape online retailers • Structure the data in a search engine or DB • Create an interface to search for products • Sentiment analysis for product rankings
their resellers: • Tracking and watching out for stolen goods • Pricing agreement violations • Customer support responses on complaints • Product line quality checks
in a company for your outbound sales campaigns: • Locate possible leads in your target market • Identify the right contacts within each one • Augment the information you already have on them
ToS of credit card companies including: • Drugs • Weapons • Gambling
Identify stolen cards and IDs on the Dark Web: • Forums where hackers share ID numbers / PINs
newsletters, social networks and other natural language data sources. • Use NLP to create an associated sentiment indicator. • Tracking the relevant news that supports the indicator can lead to market insights into long-term trends.
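As a rough illustration of what a sentiment indicator could look like, the sketch below scores each text against a tiny word list and averages the scores. Everything in it is an assumption for illustration: the `POSITIVE` and `NEGATIVE` lexicons, the function names, and the scoring formula. A real pipeline would more likely use a trained NLP model than a hand-written lexicon.

```python
import re
from typing import Iterable

# Hypothetical toy lexicons; a real system would use a much larger
# vocabulary or a trained classifier.
POSITIVE = {"gain", "growth", "beat", "strong", "upgrade"}
NEGATIVE = {"loss", "decline", "miss", "weak", "downgrade"}


def sentiment_score(text: str) -> float:
    """Score one text in [-1.0, 1.0]; 0.0 when no lexicon word is found."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    total = positive + negative
    if total == 0:
        return 0.0
    return (positive - negative) / total


def sentiment_indicator(texts: Iterable[str]) -> float:
    """Average the per-text scores into a single indicator value."""
    scores = [sentiment_score(text) for text in texts]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    headlines = [
        "Strong growth after earnings beat",
        "Weak quarter, analysts downgrade",
        "Strong sales but weak outlook",
    ]
    print(sentiment_indicator(headlines))
```

Computing the indicator per day or per week over scraped headlines would give a time series that can be compared against the news items driving it.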
to evaluate consumer reviews and commentary: • Volume of comments across brands • Topics of discussion • Comparisons with other brands and products • Evaluate product launches and marketing tactics
in Congress. Access court judgments and opinions in order to: • Follow discussions • Try to forecast legislative outcomes • Track regulations that impact different economic sectors
sources in order to understand: • Hiring trends in different sectors or regions • Candidates for jobs, or future leaders • Employees who are shopping for a new job, so they can be retained
extracting information from difficult-to-access government websites: • Track the activities of lobbyists • Patterns in the behavior of government officials • Disruptions in the economy due to corruption allegations