Robert King: Making a scalable course search engine with Python

@ Kiwi PyCon 2014 - Sunday, 14 Sep 2014 - Track 2
http://kiwi.pycon.org/

**Audience level**

Experienced

**Description**

Building a custom search engine with Python on Google App Engine that can serve large spikes in search request traffic and lets students find course reviews across multiple universities and countries.

**Abstract**

- Introduction to the real-world problem: students need to be able to find university courses across multiple countries and universities.
- First: explore how to solve the problem. Collect course data and decide on a rough solution.
- Second: create a minimum viable product and see how people use it. Iteratively make it better.
- Second, continued: organise a big website launch event before you've created the website, then proceed to write 10K lines of code in the week before launch.
- Third: analyse the 50K most recent search terms and make a simple tree data structure to help improve search performance.
- Fourth: caching and cache invalidation.
- Fifth: maybe I'll do an online marketing campaign halfway through the talk and show graphs of the app responding in real time.
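
The "simple tree data structure" built from logged search terms is shown in the slides as a trie. The block below is a minimal sketch of that idea, not the speaker's code: the class names `TrieNode` and `SearchTrie`, the `suggest` method, and the sample terms are all assumptions made for illustration.

```python
class TrieNode:
    """One letter position in the trie (hypothetical helper class)."""

    def __init__(self):
        self.children = {}  # maps a character to the next TrieNode
        self.count = 0      # how many logged searches end exactly here


class SearchTrie:
    """Prefix tree over past search terms, used to suggest completions."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, term):
        """Record one occurrence of a search term."""
        node = self.root
        for ch in term.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def suggest(self, prefix, limit=5):
        """Return up to `limit` logged terms starting with `prefix`,
        most frequent first, ties broken alphabetically."""
        prefix = prefix.lower()
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        matches = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                matches.append((text, current.count))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [text for text, _ in matches[:limit]]
```

Feeding the trie from a log of past queries and calling `suggest("com")` would return the most frequently searched terms that begin with that prefix, so lookups cost time proportional to the prefix length plus the size of the matching subtree rather than the size of the whole log.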

- Covers data analysis with Python (csv, matplotlib, networkx, collections.Counter, log file parsing)
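
As a rough illustration of log file parsing with `collections.Counter`, the sketch below tallies search terms from request lines. The log format, the `q` query parameter, and the function name `count_search_terms` are assumptions; the talk does not show what its logs looked like. The sketch targets Python 3.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def count_search_terms(log_lines, param="q"):
    """Count normalised search terms found in request URLs.

    Assumes each line contains a request path such as
    '/search?q=compsci+101' somewhere among whitespace-separated fields.
    """
    counts = Counter()
    for line in log_lines:
        for token in line.split():
            if "?" not in token:
                continue  # not a URL with a query string
            query = parse_qs(urlparse(token).query)
            for term in query.get(param, []):
                term = term.strip().lower()
                if term:
                    counts[term] += 1
    return counts


# Made-up sample lines in the assumed format.
sample_log = [
    "GET /search?q=COMPSCI+101 200",
    "GET /search?q=law 200",
    "GET /search?q=compsci+101 200",
    "GET /about 200",
]
top_terms = count_search_terms(sample_log).most_common(2)
```

`Counter.most_common` then gives the popular terms directly, which is the kind of list a "popular search terms" view or a suggestion structure could be built from.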

- Covers "Futures": doing RPC calls in parallel.
- Unit testing and simulating all things: being able to see how adjusting search functionality affects query times and quality of results.
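
The idea behind futures is to start several slow calls at once and collect the results afterwards. The talk most likely refers to App Engine's own asynchronous APIs; the sketch below swaps in the standard library's `concurrent.futures` instead so it runs anywhere. `fetch_reviews` is a hypothetical stand-in for a real RPC and the university names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_reviews(university):
    """Hypothetical stand-in for a slow RPC such as a datastore or search call."""
    time.sleep(0.1)  # simulate network latency
    return "{}: results".format(university)


def search_all(universities):
    """Issue one call per university in parallel and return results in input order."""
    workers = max(1, len(universities))  # the executor requires at least one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() returns immediately with a Future for each call.
        futures = [pool.submit(fetch_reviews, u) for u in universities]
        # result() blocks until that particular call has finished.
        return [f.result() for f in futures]
```

With the calls overlapping, the total wait is roughly that of the slowest call rather than the sum of all of them, which is the property that matters when one search request fans out to several backends.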

- Some tasteful jokes to keep things entertaining ;)

**YouTube**

https://www.youtube.com/watch?v=568mFzqsjqk

New Zealand Python User Group

September 14, 2014

Transcript

  1. Looking into the Matrix
    Building a Course Search Engine with Python

  2. that moment when
    you realise - you don’t understand the code
    you’re trying to explain

  3. Build lots of things from
    scratch
    and get good at refactoring

  4. What’s Student Course Review?

  5. Popular search terms

  6. Trie tree data structure
    [Diagram: a trie whose nodes carry the letters C, O, M, M, S, C, P, L, A, W, N, G, I]

  7. but you don’t have to
    scale yourself


  9. Caching all the things

  10. Did you know Harry Potter
    was a code wizard?
    He could speak Parseltongue

  11. Sharded counter
    like counting the votes during an election night.

  12. Who wants to be Kermit

  13. And in Java?

  14. Conclusions
    If your architecture is language agnostic then
    you’re safer
    Python > Java

  15. www.google.com/+robertking
    kingrobertking at gmail dot com
    robert-king.com
    http://www.studentcoursereview.co.nz/feedback
