
Using Web Data for Finance

Scrapinghub
August 11, 2016

We help you get web data hassle free. You can use this data to scale your business, obtain leads, and track competitors.

Transcript

  1. About Scrapinghub
     Scrapinghub specializes in data extraction. Our platform is used to scrape over 4 billion web pages a month. We offer:
     • Professional Services to handle the web scraping for you
     • Off-the-shelf datasets so you can get data hassle free
     • A cloud-based platform that makes scraping a breeze

  2. Founded in 2010, we're the largest 100% remote company based outside of the US, with 134 teammates in 48 countries.

  3. “Getting information off the Internet is like taking a drink from a fire hydrant.” – Mitchell Kapor

  4. Scrapy
     Scrapy is a web scraping framework that gets the dirty work of web crawling out of your way. Benefits:
     • No platform lock-in: open source
     • Very popular (13k+ ★)
     • Battle tested
     • Highly extensible
     • Great documentation

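To make the framework concrete, here is a minimal spider sketch against Scrapinghub's public demo site quotes.toscrape.com; the selectors and field names are illustrative, not from the deck:

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    """Minimal spider: crawl the demo site and yield structured items."""

    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # One item per quote block on the page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow pagination until the site runs out of pages
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Running `scrapy runspider quotes_spider.py -o quotes.json` crawls the demo site and dumps the items to JSON.
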
  5. Portia
     Portia is a visual scraping tool that lets you get data without needing to write code. Benefits:
     • No platform lock-in: open source
     • Handles JavaScript-generated dynamic content
     • Ideal for non-developers
     • Extensible
     • It's as easy as annotating a page

  6. Large Scale Infrastructure
     Meet Scrapy Cloud, our PaaS for web crawlers:
     • Scalable: crawlers run on EC2 instances or dedicated servers
     • Crawlera add-on for smart proxy rotation
     • Control your spiders from the command line, API, or web UI (a client sketch follows below)
     • Machine learning integration: BigML, MonkeyLearn
     • No lock-in: use scrapyd to run Scrapy spiders on your own infrastructure

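As a sketch of the API route, the python-scrapinghub client can schedule a deployed spider; the API key, project ID, and spider name below are placeholders:

```python
from scrapinghub import ScrapinghubClient

# Placeholders: substitute your own API key and numeric project ID
client = ScrapinghubClient("YOUR_API_KEY")
project = client.get_project(123456)

# Schedule a run of a spider already deployed to Scrapy Cloud
job = project.jobs.run("quotes")
print(job.key)  # e.g. "123456/1/42" (project/spider/job)

# Stream the scraped items back once the job has output
for item in job.items.iter():
    print(item)
```
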
  7. Broad Crawls
     Frontera allows us to build large scale web crawlers in Python:
     • Scrapy support out of the box (a settings sketch follows below)
     • Distribute and scale custom web crawlers across servers
     • Crawl frontier framework: large scale URL prioritization logic
     • Aduana to prioritize URLs based on link analysis (PageRank, HITS)

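For the out-of-the-box Scrapy support, the hand-off to Frontera is typically wired up in the Scrapy project's settings. This sketch follows the module paths in Frontera's documentation; treat them as assumptions to verify against your installed version:

```python
# settings.py -- hand Scrapy's scheduling over to Frontera
SCHEDULER = "frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler"

SPIDER_MIDDLEWARES = {
    "frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware": 1000,
}
DOWNLOADER_MIDDLEWARES = {
    "frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware": 1000,
}

# Frontera's own settings module (hypothetical path), where the backend --
# e.g. an Aduana link-analysis backend -- and queue behaviour are configured
FRONTERA_SETTINGS = "myproject.frontera_settings"
```
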
  8. Competitive Pricing
     Companies use web scraping to monitor competitors' pricing and ratings:
     • Scrape online retailers
     • Structure the data in a search engine or DB (a pipeline sketch follows below)
     • Create an interface to search for products
     • Sentiment analysis for product rankings

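One hedged sketch of the "structure the data in a DB" step: a Scrapy item pipeline that appends scraped prices to SQLite. The table layout and item fields are hypothetical:

```python
import sqlite3


class PricePipeline:
    """Store scraped product prices in a local SQLite database."""

    def open_spider(self, spider):
        self.conn = sqlite3.connect("prices.db")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS prices "
            "(retailer TEXT, product TEXT, price REAL, scraped_at TEXT)"
        )

    def process_item(self, item, spider):
        # Assumes items carry hypothetical retailer/product/price fields
        self.conn.execute(
            "INSERT INTO prices VALUES (?, ?, ?, datetime('now'))",
            (item["retailer"], item["product"], item["price"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

Enabling it is a one-line `ITEM_PIPELINES` entry in the project settings.
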
  9. Monitor Resellers
     We help a leading IT manufacturer monitor the activities of their resellers:
     • Tracking and watching out for stolen goods
     • Pricing agreement violations
     • Customer support responses to complaints
     • Product line quality checks

  10. Lead Generation
      Mine scraped data to identify who to target in a company for your outbound sales campaigns:
      • Locate possible leads in your target market
      • Identify the right contacts within each one
      • Augment the information you already have on them

  11. Real Estate
      Crawl property websites and use the data obtained to:
      • Estimate house prices and rental values
      • Track housing stock movements
      • Gain insight into real estate agents and homeowners

  12. Fraud Detection
      Monitor for sellers that offer products violating the ToS of credit card companies, including:
      • Drugs
      • Weapons
      • Gambling
      Identify stolen cards and IDs on the Dark Web:
      • Forums where hackers share ID numbers / PINs

  13. Company Reputation
      Sentiment analysis of a company or product through newsletters, social networks, and other natural language data sources:
      • Use NLP to create an associated sentiment indicator (a sketch follows below)
      • Tracking the relevant news supporting the indicator can lead to market insights for long-term trends

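A minimal sketch of such a sentiment indicator, using NLTK's VADER analyzer over scraped headlines; the company and headlines are made up:

```python
from nltk.sentiment import SentimentIntensityAnalyzer

# One-time lexicon download:
# import nltk; nltk.download("vader_lexicon")

headlines = [
    "ACME beats earnings expectations again",
    "Regulators open probe into ACME accounting",
]

analyzer = SentimentIntensityAnalyzer()
# Average the compound score (-1 = negative, +1 = positive) into a
# single indicator for the company
scores = [analyzer.polarity_scores(h)["compound"] for h in headlines]
indicator = sum(scores) / len(scores)
print(f"ACME sentiment indicator: {indicator:+.2f}")
```
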
  14. Consumer Behavior
      Extract data from forums and websites like Reddit to evaluate consumer reviews and commentary:
      • Volume of comments across brands (a counting sketch follows below)
      • Topics of discussion
      • Comparisons with other brands and products
      • Evaluate product launches and marketing tactics

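The comment-volume metric is little more than counting brand mentions across scraped comments; a standard-library sketch with illustrative brands and comments:

```python
import re
from collections import Counter

brands = ["acme", "globex", "initech"]
comments = [
    "Acme's new model is great, way better than Globex",
    "Globex support never answered me",
    "Thinking of switching from Acme to Initech",
]

# Count how many comments mention each brand
mentions = Counter()
for comment in comments:
    for brand in brands:
        if re.search(rf"\b{brand}\b", comment, re.IGNORECASE):
            mentions[brand] += 1

for brand, count in mentions.most_common():
    print(f"{brand}: {count} comments")
```
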
  15. Tracking Legislation
      Monitor bills and regulations that are being discussed in Congress. Access court judgments and opinions in order to:
      • Follow discussions
      • Try to forecast legislative outcomes
      • Track regulations that impact different economic sectors

  16. Hiring
      Crawl and extract data from job boards and other sources in order to:
      • Understand hiring trends in different sectors or regions
      • Find candidates for jobs, or future leaders
      • Spot employees who are shopping for a new job, so you can try to retain them

  17. Monitoring Corruption
      Journalists and analysts can create Open Data by extracting information from difficult-to-access government websites:
      • Track the activities of lobbyists
      • Find patterns in the behavior of government officials
      • Detect disruptions in the economy due to corruption allegations