Scrapy & Scrapinghub

Pablo Hoffman
November 01, 2013

Lessons learned building a company around an open source project. I gave this talk at PyCon Uruguay 2013.


  1. Scrapy & Scrapinghub: Lessons learned building a company around an open source project
    Pablo Hoffman - PyCon Uruguay 2013
  2. Scrapy → Motivation
    - Scrapers back then were messy and ad hoc: urllib + BeautifulSoup + cross fingers
    - Let’s make a framework!
      - conventions
      - write well-structured code
      - share code & patterns between sites
      - crawl politely, efficiently & reliably
  3. Scrapy → Early stages
    - direction more unclear
    - no community, no users
    - slow progress
    - messy code
    - minimal documentation
    - faith, patience & hard work
  4. Scrapy → Growing up
    - Documentation gets better over time
    - Support can be top notch from day 1
    - good support → more users engaged → community grows!
  5. Scrapy → Today
    - GitHub: #12 in Top Python projects; 3,000+ watchers, 750 forks
    - StackOverflow: 1,500 questions
    - Twitter: 1,000 followers
    - Mailing list: 1,600 members, 200 messages/month
    - IRC: 50 users
  6. Scrapinghub → Motivation
    - We love Scrapy! ↓ So how do we keep working on it? ↓ Let’s make a business around it!
    - Crawling at scale is hard & expensive ↓ Let’s bring that to everyone!
  7. Scrapinghub → Inspiration
    - Consulting: Cloudera (Hadoop, et al.), LucidWorks (Lucene)
    - SaaS: many search SaaS (ElasticSearch, Lucene), Automattic (WordPress), GetSentry (Sentry)
    - PaaS: Heroku, Amazon
  8. Scrapinghub → Conception
    - Validate the business
      - freelancing sites
      - real customers, concrete projects
      - don’t worry about scaling (at first)
      - start consulting, productize later
    - Our case: 2 years writing Scrapy crawlers at Insophia ↓ there was a real business
  9. Scrapinghub → Community
    - Python community
      - web crawling & data mining → hot topics
      - free, organic, tech-savvy referrals
      - bidirectional: give and you shall receive
    - Help, don’t sell: grateful users → potential customers
    - Always answer, even if it takes a while and the answer is “No”
  10. Scrapinghub → Management
    - Many useful OSS practices
      - remote interactions
      - different time zones
      - meritocracy
      - code reviews
    - OSS improvements: Scrapinghub & Scrapy must grow together
    - Also important: self-management, keep track of hours
  11. Scrapinghub → Team
    - Fully distributed team
      - requires discipline, communication, responsiveness
      - time-zone friendly to US, EU, Asia, EU
    - Internal tools: Google Apps, HipChat, GitHub, Trello
    - 2010: 1 person full-time, 2 people part-time
    - Today: 35 people full-time, 17 countries
  12. Scrapinghub → Hiring
    - Attract good developers
      - remote work, flexible times
      - open source (your work is yours)
      - very technical team, good developers
    - Worldwide pool of developers: more tailored skills, already know our tools
    - Our case: StackOverflow Careers + Trello + trial runs
  13. Scrapinghub → Consulting
    - Easier to bootstrap, harder to scale
    - Position yourself as experts (still working on it :)
    - Helps to understand customer needs: find patterns → devise product / open source project
    - Our case
      - Scrapy web crawlers (main source of revenue)
      - Scrapy consultancy, tuning & training
  14. Scrapinghub → Product & Services
    - Solve recurrent, common, tedious tasks
    - Our goal: infrastructure & services for running web crawlers
    - Our products
      - Scrapy Cloud (PaaS)
      - Autoscraping (SaaS)
      - Crawlera, Splash (developer APIs)
  15. Closing thoughts
    - Proud to watch our open source baby grow
    - Happy to make my living with it
    - Confident that it has and will survive any company behind it
    - Hopeful that Scrapinghub will, someday, conquer the world :)
    - Love your open source project!