Making Addons Search Better

Dave Dash
September 29, 2009

What I did to make Mozilla Addons better at search

Transcript

  1. MAKING ADDONS SEARCH BETTER dave dash - amo

  2. ME
    • web developer for mozilla - mostly AMO
    • python hacker / web tinkerer
    • former engineer @ delicious.com (“fresh” homepage, data migrations from Delicious 1.0 to 2.0)
    • UIUC alum / former NCSA
  3. None
  4. None
  5. None
  6. EVERYONE LOVES ADBLOCK

  7. None
  8. None
  9. None
  10. firefox

  11. PEOPLE WANT ADDONS even if they just don’t know it

  12. CUSTOMIZATION = AWESOME

  13. MAKES BROWSING BETTER

  14. MAKES IT PERSONAL

  15. AMO
    • Largest mozilla site in terms of traffic and hardware
    • 24 web frontends
    • 4 mysql slaves off a single master
    • 2 memcached servers
    • Zeus proxy
  16. WHAT’S UP?

  17. SEARCH IS IMPORTANT

  18. GOOGLE CAN’T FIND EVERYTHING... YET

  19. A WEB SITE’S MOST IMPORTANT FEATURE

  20. None
  21. IMPROVE ADDONS SEARCH

  22. SEARCH GOALS
    • Do something that sucks less than what we’ve got
    • Do something that makes it easier to suck less in the future
    • Do something that’s easy to use for our operations team, web developers and most importantly, end-users
    • Reduce strain on our databases/developers/ops and ultimately end-users
  23. LOTS OF ADDONS
    • Addons in multiple locales
    • Platforms for Linux, OS X, BSD, Windows...
    • Addons for Firefox, Thunderbird, Seamonkey, Sunbird, Mobile
    • Extensions, Themes, Dictionaries and more
  24. CHALLENGES
    • I’m a n00b at Mozilla
    • CakePHP framework
    • No prepared statements
    • Images are in the DB
    • Many pages invoke 100s of DB queries
    • Addon metadata is localized
    • Addons x Versions x Files x Apps x Application Versions
  25. None
  26. JOINS MAKE ANGELS CRY

  27. 10QPS IS HARD WHEN IT’S JOINS

  28. 10QPS IS EASY WHEN WE’RE SMART

  29. UNCHALLENGES
    • We can solve a lot of problems using python =)
    • Smart, helpful webdev team
    • Mozilla community is supportive (we blog)
    • 10 queries/second at peak - easy!
  30. SPHINX

  31. None
  32. SPHINX
    • Craigslist, Pirate Bay, Mozilla Support use it
    • Beats rolling your own
    • Beats the unmaintainable tangled mess of SQL queries that power search now
    • Open Source
    • “It just works”
  33. EASY TO START

  34. AND THEN IT GETS HARD

  35. None
  36. SPHINX ISSUES
    • Index all translations of all 5,000 addons (=18,000)
    • Data needs to be joined and filtered carefully
    • The database views needed to do this are horrendously complicated
    • We stored versions as strings, not integers (3.0, 3.0.*, 3.5b, 3.5rc1, 3.5, 3.6, etc)
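
The version-string issue on slide 36 can be illustrated with a minimal Python sketch. It maps each string to an integer that sorts in release order, so a search index could filter on numeric ranges. The function name, the stage ranks, the choice of 99 for a `*` wildcard, and the assumption that every numeric component stays below 100 are all hypothetical; the slides do not say how AMO actually encoded versions.

```python
import re

# Hypothetical ranks for pre-release tags; a final release ranks highest.
_STAGE = {"a": 0, "b": 1, "rc": 2}
_FINAL = 3
# One dotted component: digits or '*', optionally followed by a tag like 'b' or 'rc1'.
_PART = re.compile(r"^(\d+|\*)(?:(a|b|rc)(\d*))?$")


def version_int(version):
    """Map a version string such as '3.5rc1' to a sortable integer.

    Assumes at most three dotted components, each below 100.
    """
    nums = []
    stage, stage_num = _FINAL, 0
    for part in version.split(".")[:3]:
        match = _PART.match(part)
        if match is None:
            raise ValueError("unsupported version: %r" % version)
        num, tag, tag_num = match.groups()
        # Treat the wildcard as the highest value in its slot.
        nums.append(99 if num == "*" else int(num))
        if tag:
            stage = _STAGE[tag]
            stage_num = int(tag_num or 0)
    # Pad missing components so '3.5' behaves like '3.5.0'.
    while len(nums) < 3:
        nums.append(0)
    major, minor, patch = nums
    packed = (major * 100 + minor) * 100 + patch
    return (packed * 10 + stage) * 100 + stage_num


if __name__ == "__main__":
    for v in ["3.0", "3.0.*", "3.5b", "3.5rc1", "3.5", "3.6"]:
        print(v, version_int(v))
```

With this encoding, the examples from the slide sort in the order they are listed there: betas and release candidates fall before the final release of the same version, and `3.0.*` sits above every concrete 3.0.x value.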
  37. ...
    • SmushedText/CamelCasing used for a lot of addon names (e.g. FireBug, StumbleUpon)
    • Business logic around what we display
    • Mostly an issue with our data, not sphinx
    • Infix searches vs stemming
    • Hard to debug when queries go wrong
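
One way to approach the SmushedText problem is to index each addon name together with its CamelCase pieces, so that a query for "bug" can still match "FireBug" without relying on infix search. The sketch below is an assumption about how that preprocessing might look, not what AMO shipped; `index_terms` is a hypothetical helper.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def index_terms(name):
    """Return the lowercased name plus its CamelCase parts, without duplicates."""
    spaced = _BOUNDARY.sub(" ", name)
    terms = [name.lower()]
    for word in spaced.lower().split():
        if word not in terms:
            terms.append(word)
    return terms


if __name__ == "__main__":
    print(index_terms("FireBug"))
    print(index_terms("StumbleUpon"))
```

The extra terms would be fed to the indexer alongside the original name, which keeps exact-name matches intact while making the component words searchable.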
  38. None
  39. SPHINX WINS
    • Complicated database view run every 5 minutes, versus complicated join queries run on demand during search (180K queries)
    • Indexing takes just over a minute - we could speed it up if we wanted
    • Easy API to drop into our existing codebase
  40. ...
    • Small data set is easy to scale
    • Lots of traffic means we just use puppet to deploy a cloned sphinx server
    • Load Balance away!
  41. ADVANCED SEARCH IS HARD

  42. FORMS SUCK
    • Advanced search forms are difficult to use
    • A collection of widgets that force you to visit multiple elements in order to fine tune your search
    • Exposes too many dimensions at once
  43. VERSION RANGE?

  44. SEARCH OPERATORS

  45. SEARCH OPERATORS
    • Human Readable
    • Google, Yahoo and Bing do it
    • Easy to hack
    • Easy to extend in the future
    • Easter egg potential
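
The slides do not show the operator syntax, so the following is only a sketch under an assumed `key:value` form (for example `platform:linux`). The operator names `type`, `platform`, and `version` are hypothetical. It shows why operators are easy to hack and extend: supporting a new one only means adding a name to the known set.

```python
def parse_query(query, known=("type", "platform", "version")):
    """Split a query into free-text terms and key:value filters.

    Tokens whose key is not in `known` are left in the free text, so
    something like 'http://example' is not mistaken for an operator.
    """
    terms, filters = [], {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and value and key.lower() in known:
            filters[key.lower()] = value
        else:
            terms.append(token)
    return " ".join(terms), filters


if __name__ == "__main__":
    print(parse_query("adblock platform:linux version:3.5"))
```

The free text would go to the full-text engine as the match expression, and the filters would become attribute filters on the same query.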
  46. WHAT’S NEXT?
    • Caching “almost static” data such as category lists indefinitely
    • Retrieving data using fewer queries (100+ queries = UR doing it wrong)
    • Getting images out of the database
    • Make life better for other developers and ops
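
Caching "almost static" data indefinitely could look roughly like the sketch below: a value is loaded once, kept with no expiry, and dropped only when something explicitly invalidates it. A plain dictionary stands in for memcached here, and `cached`, `invalidate`, and `load_categories` are hypothetical names used for illustration.

```python
_cache = {}


def cached(key, loader):
    """Return the cached value for key, calling loader() only on a miss."""
    if key not in _cache:
        _cache[key] = loader()
    return _cache[key]


def invalidate(key):
    """Drop a cached entry, for example after an admin edits the category list."""
    _cache.pop(key, None)


if __name__ == "__main__":
    def load_categories():
        # Stand-in for the database query that builds the category list.
        return ["Extensions", "Themes", "Dictionaries"]

    print(cached("categories", load_categories))
    print(cached("categories", load_categories))  # served from the cache
```

Because nothing expires on its own, every code path that changes the underlying data has to call `invalidate`, which is the trade-off for removing those queries from ordinary page loads.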
  47. HELPING USERS
    • New search might not be much better now, but now we can make it better
    • Can better engineering lead to better user experience?
    • Doing things “right” makes it easier to develop
  48. CONTRIBUTE
    • Every project is open source
    • #amo @ irc.mozilla.org
    • Mission - make the internet better - we mean this
    • Or work at Mozilla
  49. QUESTIONS?

  50. DD@MOZILLA.COM