February 2015 - The Snake and the Real Estate Biz

HuPy
February 26, 2015

Talk by Gergely Kalman

Transcript

  1. The snake and the real estate biz

    How I wrote a ~400 line Python bot to find me a place to live
    https://github.com/synapse211/real-estate-bot
  2. Gergely Kalman: The snake and the real estate biz – https://github.com/synapse211/real-estate-bot

    Hi
    • My name is Gergely Kálmán
    • I love Python and startups
    • Worked for an Alexa #25 company, 17th guy at toptal.com
    • Co-founded my own company, Buffered VPN
  3. We're hiring!

    [email protected] or talk to me later
  4. Legal

    The opinions expressed within are solely mine and don't accurately represent the opinions of my past, present or future employees and/or robot overlords who may or may not have been involved with my caffeine overdose while working on this presentation*.

    * Don't sue me**
    ** You'll lose, this is all legal
  5. Once upon a time

    • I have lived in Budapest for about 10 years
    • In the past 3 years I started freelancing
    • That means working from home
    • … and my girlfriend does too
  6. My Real Estate journey

    • I have rented 7 places. I have learned:
      – This sucks
      – Ads are “outdated”
      – Ads are “incorrect”
    • RE agents know more than you (information imbalance)
    • Agents have nothing to lose but get rewarded for sales. No barrier to entry

    How many of you had a similar experience?
  7. How can you win?

    • Usually you can subscribe to a newsletter
    • But everyone gets it at 6am
    • Rush to check the place out first
    • If you're not the first, you're unlikely to catch a good deal
  8. How can you win?

    • Usually you can subscribe to a newsletter
    • But everyone gets it at 6am
    • Rush to check the place out first
    • If you're not the first, you're unlikely to catch a good deal

    Idea #1: This sounds surprisingly like the stock market
  9. How can you win?

    • Usually you can subscribe to a newsletter
    • But everyone gets it at 6am
    • Rush to check the place out first
    • If you're not the first, you're unlikely to catch a good deal

    Idea #1: This sounds surprisingly like the stock market
    Idea #2: I don't think anybody has scripted this before
  10. My solution

    • Let's write a bot that scrapes the RE site(s)
    • Do this every hour, send changes in email
    • Agents usually work normal hours, so that's way before 6am the next day

    I should win, as I get the information much quicker than anyone else
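
The "scrape every hour, send changes in email" idea boils down to comparing two snapshots of the listings and mailing the difference. The following is a minimal sketch of that step and not the code from the repository: the `{url: details}` snapshot shape, the function names `diff_ads` and `build_alert`, and the message wording are all assumptions made for illustration.

```python
from email.message import EmailMessage


def diff_ads(old, new):
    """Compare two {url: details} snapshots and report what changed."""
    added = {url: new[url] for url in new if url not in old}
    removed = {url: old[url] for url in old if url not in new}
    changed = {
        url: (old[url], new[url])
        for url in new
        if url in old and old[url] != new[url]
    }
    return added, removed, changed


def build_alert(added, removed, changed, sender, recipient):
    """Build (but do not send) a plain-text email summarising the diff."""
    lines = []
    for url, ad in sorted(added.items()):
        lines.append(f"NEW      {url} {ad}")
    for url, (before, after) in sorted(changed.items()):
        lines.append(f"CHANGED  {url} {before} -> {after}")
    for url in sorted(removed):
        lines.append(f"REMOVED  {url}")

    msg = EmailMessage()
    msg["Subject"] = f"Real estate bot: {len(lines)} change(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines) or "No changes.")
    return msg
```

Sending the result would be a matter of handing the message to `smtplib.SMTP(...).send_message(msg)`, and the hourly schedule could come from cron or a similar scheduler rather than from the script itself.
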
  11. The bot

    • Stupid ~400 line Python script written in a day
    • Typical prototype code, mistakes were made
    • Send pull requests

    https://github.com/synapse211/real-estate-bot
  12. “Architecture”

    • Core
      – Merges ads, detects changes
      – Sends alerts
    • Crawler
      – Fetcher: gets index and ad pages
      – Parser: parses HTML to get urls and details
    • DB
      – Stores ads in a “database” (pickle)
    • Deduplicator (optional)
      – Component to find and eliminate duplicate ads
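
To make the Parser role concrete, here is a small sketch of pulling ad URLs out of an index page. It deliberately uses the standard library's `html.parser` as a stand-in so that it runs without extra packages; the bot itself uses BeautifulSoup. The `/ad/` URL marker, the class name `AdLinkParser` and the function `parse_index` are hypothetical and do not reflect the real site's markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AdLinkParser(HTMLParser):
    """Collect absolute URLs of links whose href contains a marker."""

    def __init__(self, base_url, marker="/ad/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker  # hypothetical pattern identifying ad pages
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            url = urljoin(self.base_url, href)
            if url not in self.urls:  # keep first occurrence only
                self.urls.append(url)


def parse_index(html, base_url):
    """Return the ad URLs found on an index page, in document order."""
    parser = AdLinkParser(base_url)
    parser.feed(html)
    return parser.urls
```

The Fetcher would then request each returned URL and pass the ad page to a second parser that extracts the details.
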
  13. Batteries included

    • requests because Python HTTP is funky
    • BeautifulSoup, HTML parser - no XPath :(
    • cPickle for DB
    • smtplib for emailing
    • SciPy for the deduplicator
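
A pickle-backed "database" can be as small as two functions. This is a sketch under assumptions, not the repository's `db.py`: the names `load_ads` and `save_ads` and the `{url: details}` layout are invented here, and it uses Python 3's `pickle` module, which replaced Python 2's separate `cPickle`.

```python
import os
import pickle
import tempfile


def load_ads(path):
    """Return the stored {url: details} dict, or an empty one on first run."""
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except FileNotFoundError:
        return {}


def save_ads(path, ads):
    """Write to a temp file first, then rename, so a crash mid-write
    cannot leave a truncated database behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as fh:
        pickle.dump(ads, fh, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)
```

Pickle files should only be loaded when they come from a trusted source, which holds here because the bot reads back only what it wrote itself.
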
  14. Lessons learnt

    • ingatlan.com kicks you out if you're hammering aggressively
    • Finding the data model is hard
    • Parsing HTML without XPath is bad
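
One common way to avoid getting kicked out is to enforce a minimum delay between requests. The sketch below shows that idea; the `Throttle` class is hypothetical, and the talk does not say what request rate the site tolerates, so any interval chosen is a guess to be tuned.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In the fetcher this would be a single `throttle.wait()` call right before each `requests.get(url)`.
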
  15. The final layout

    • core.py – main executable
    • crawler.py – fetches pages and invokes parser
    • parser.py – parses HTML
    • db.py – db functions
    • deduplicator.py – image similarity detection magic
    • settings.py – settings
  16. Success \o/
  17. Deduplicator

    • Script returns a lot of duplicates
    • Only way to tell is using the images, hashing won't work
    • Summon Google
      – https://stackoverflow.com/questions/1819124/image-comparison-algorithm
    • Normalizes and resizes the images, then applies 2D cross-correlation
    • I have no idea what I just said...
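
The core of "normalize, then correlate" can be shown on tiny grayscale grids. This is a simplified sketch, not the deduplicator's code: it scores only the zero-shift alignment, whereas a full 2D cross-correlation (for example via SciPy's `scipy.signal.correlate2d`) also slides one image across the other. Images are assumed to be plain lists of rows that have already been resized to the same dimensions, and the function names are made up for this example.

```python
import math


def normalize(image):
    """Flatten a 2D grayscale image to zero mean and unit variance."""
    pixels = [float(p) for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    if std == 0:
        # A flat image carries no structure to correlate against.
        return [0.0] * len(pixels)
    return [(p - mean) / std for p in pixels]


def similarity(image_a, image_b):
    """Zero-shift normalized cross-correlation, in [-1, 1].

    Both images must already share the same dimensions.
    """
    a, b = normalize(image_a), normalize(image_b)
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return sum(x * y for x, y in zip(a, b)) / len(a)
```

Normalizing first is what makes the score insensitive to overall brightness, so the same photo re-uploaded with a different exposure still scores close to 1.
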
  18. Deduplicator

    • Implemented the algo in a few hours
    • Doesn't scale at all
    • With my limited knowledge I minimized the compares
      – Don't compare images:
        • for ads from different streets
        • within the same ad
        • we have compared before, only new stuff
        • once we have found the ad to be a copy
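
The pruning rules above can be expressed as a generator that only yields pairs of ads still worth an image comparison. This is a sketch with invented names: the `"street"` key, the `already_compared` set and the `known_duplicates` set are assumptions about how such state might be kept, not the repository's data model.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(ads, already_compared, known_duplicates):
    """Yield (url_a, url_b) pairs that are still worth an image comparison.

    ads: {url: {"street": str, ...}}  (the "street" key is hypothetical)
    already_compared: set of frozenset({url_a, url_b}) from earlier runs
    known_duplicates: set of urls already identified as copies
    """
    by_street = defaultdict(list)
    for url, ad in ads.items():
        if url not in known_duplicates:      # copies need no further checks
            by_street[ad["street"]].append(url)

    for urls in by_street.values():          # never cross street boundaries
        # combinations() never pairs an ad with itself.
        for url_a, url_b in combinations(sorted(urls), 2):
            if frozenset((url_a, url_b)) not in already_compared:
                yield url_a, url_b
```

Grouping by street turns one large all-pairs problem into many small ones, which is where most of the saving comes from. The caller would add each processed pair to `already_compared` and each detected copy to `known_duplicates`.
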
  19. Deduplicator

    • I also started with small matrices and increased the size step by step
      – It is much faster to sift through the data this way
    • Speedup was extreme, the run time is barely noticeable now
    • I have no data on the false positive/negative rate, but it feels OK
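
Starting small and growing the matrices is a coarse-to-fine search: most non-matching pairs can be rejected on a tiny thumbnail, and only promising pairs pay for the larger comparison. The sketch below illustrates this with block-averaged thumbnails and a plain correlation score. The sizes, the 0.9 threshold and all function names are assumptions, since the talk gives no concrete values, and images are assumed to be square lists of rows whose side is a multiple of every size used.

```python
import math


def shrink(image, size):
    """Downscale a square 2D grayscale image to size x size by block averaging.

    Assumes the image side is a multiple of size.
    """
    block = len(image) // size
    return [
        [
            sum(image[r * block + i][c * block + j]
                for i in range(block) for j in range(block)) / block ** 2
            for c in range(size)
        ]
        for r in range(size)
    ]


def correlation(image_a, image_b):
    """Pearson correlation of two same-sized images (0.0 if either is flat)."""
    a = [p for row in image_a for p in row]
    b = [p for row in image_b for p in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                     * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm else 0.0


def looks_like_duplicate(image_a, image_b, sizes=(2, 4, 8), threshold=0.9):
    """Compare at increasing resolutions, bailing out at the first mismatch."""
    for size in sizes:
        if correlation(shrink(image_a, size), shrink(image_b, size)) < threshold:
            return False
    return True
```

A low threshold at the coarse levels trades a few extra fine comparisons for fewer missed duplicates, so it is worth checking against real listing photos before trusting any particular value.
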
  20. Future (?)

    • I mainly did this to avoid dealing with RE people
    • I probably won't work on it anymore
    • But feel free to use it, and send pull requests
    • You could find other uses for it
    • But for serious stuff there's scrapy
  21. Moral of the story

    • Bad code with purpose beats perfect code with no purpose
    • Even though this is not my best work, it gets the job done
  22. Moral of the story

    • Bad code with purpose beats perfect code with no purpose
    • Even though this is not my best work, it gets the job done

    Did you know? RE agencies can be implemented in 400 lines of badly written Python
  23. Final word for the beginners (before I shut up for real)

    • If you're interested, start learning to code
    • Coders will always have a job
    • Boring stuff must be automated!

    I wish you way more than luck
  24. Thanks

    Questions?
    We're hiring