Making Addons Search Better

Dave Dash
September 29, 2009

What I did to make Mozilla Addons better at search

Transcript

  1. 2.

    ME
    • Web developer for Mozilla - mostly AMO
    • Python hacker / web tinkerer
    • Former engineer @ delicious.com (“fresh” homepage, data migrations from Delicious 1.0 to 2.0)
    • UIUC alum / former NCSA
  2. 3.
  3. 4.
  4. 5.
  5. 7.
  6. 8.
  7. 9.
  8. 10.
  9. 15.

    AMO
    • Largest Mozilla site in terms of traffic and hardware
    • 24 web frontends
    • 4 MySQL slaves off a single master
    • 2 memcached servers
    • Zeus proxy
  10. 20.
  11. 22.

    SEARCH GOALS
    • Do something that sucks less than what we’ve got
    • Do something that makes it easier to suck less in the future
    • Do something that’s easy to use for our operations team, web developers and, most importantly, end-users
    • Reduce strain on our databases/developers/ops and ultimately end-users
  12. 23.

    LOTS OF ADDONS
    • Addons in multiple locales
    • Platforms for Linux, OS X, BSD, Windows...
    • Addons for Firefox, Thunderbird, SeaMonkey, Sunbird, Mobile
    • Extensions, Themes, Dictionaries and more
  13. 24.

    CHALLENGES
    • I’m a n00b at Mozilla
    • CakePHP framework
    • No prepared statements
    • Images are in the DB
    • Many pages invoke 100s of DB queries
    • Addon metadata is localized
    • Addons x Versions x Files x Apps x Application Versions
  14. 25.
  15. 29.

    UNCHALLENGES
    • We can solve a lot of problems using Python =)
    • Smart, helpful webdev team
    • Mozilla community is supportive (we blog)
    • 10 queries/second at peak - easy!
  16. 30.
  17. 31.
  18. 32.

    SPHINX
    • Craigslist, Pirate Bay, Mozilla Support use it
    • Beats rolling your own
    • Beats the unmaintainable tangled mess of SQL queries that power search now
    • Open Source
    • “It just works”
  19. 35.
  20. 36.

    SPHINX ISSUES
    • Index all translations of all 5,000 addons (=18,000)
    • Data needs to be joined and filtered carefully
    • The database views needed to do this are horrendously complicated
    • We stored versions as strings, not integers (3.0, 3.0.*, 3.5b, 3.5rc1, 3.5, 3.6, etc.)
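The version-string problem can be sketched in Python. This is only an illustration, not the code AMO actually ran: the function name `version_to_int`, the digit layout, and the choice to treat `*` as 99 are all assumptions. The idea is to map each string to one integer so that a search engine can filter on a numeric range.

```python
import re

# Hypothetical pattern covering strings like 3.0, 3.0.*, 3.5b, 3.5rc1.
_VERSION_RE = re.compile(
    r"^(\d+)(?:\.(\d+|\*))?(?:\.(\d+|\*))?(?:(a|b|rc)(\d*))?$"
)

# Pre-release stages sort below a final release (assumed ordering).
_STAGE = {"a": 0, "b": 1, "rc": 2, None: 3}


def _part(value):
    """Turn one dotted component into a number in the range 0..99."""
    if value is None:
        return 0
    if value == "*":
        return 99  # a wildcard sorts above every concrete number
    return min(int(value), 98)


def version_to_int(version):
    """Map a version string to a sortable integer (illustrative layout)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, stage, stage_num = match.groups()
    return (
        int(major) * 1_000_000
        + _part(minor) * 10_000
        + _part(patch) * 100
        + _STAGE[stage] * 10
        + min(int(stage_num or 0), 9)
    )


versions = ["3.6", "3.5", "3.0.*", "3.5rc1", "3.0", "3.5b"]
print(sorted(versions, key=version_to_int))
```

With this layout the six versions from the slide sort in the order they are listed there: 3.0, 3.0.*, 3.5b, 3.5rc1, 3.5, 3.6.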
  21. 37.

    ...
    • SmushedText/CamelCasing used for a lot of addon names (e.g. FireBug, StumbleUpon)
    • Business logic around what we display
    • Mostly an issue with our data, not Sphinx
    • Infix searches vs stemming
    • Hard to debug when queries go wrong
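One possible way to cope with smushed names, sketched in Python: split the name at case boundaries before indexing and index the pieces alongside the original. The helper names `split_smushed` and `expand_name` are hypothetical, and the slides do not say this is what was done.

```python
import re

# Matches an acronym run, a capitalized or lowercase word, or a digit run.
_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_smushed(name):
    """Split a CamelCase name into its component words."""
    return _WORD_RE.findall(name)


def expand_name(name):
    """Return the text to index: the original name plus its parts."""
    parts = split_smushed(name)
    if len(parts) <= 1:
        return name  # nothing to split, index the name as-is
    return " ".join([name] + parts)


for addon in ["FireBug", "StumbleUpon", "adblock"]:
    print(expand_name(addon))
```

Indexing "StumbleUpon Stumble Upon" lets a search for "stumble" match on a whole word, without relying on infix search.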
  22. 38.
  23. 39.

    SPHINX WINS
    • Complicated database view run every 5 minutes, versus complicated join queries run on demand during search (180K queries)
    • Indexing takes just over a minute - we could speed it up if we wanted
    • Easy API to drop into our existing codebase
  24. 40.

    ...
    • Small data set is easy to scale
    • Lots of traffic means we just use Puppet to deploy a cloned Sphinx server
    • Load balance away!
  25. 42.

    FORMS SUCK
    • Advanced search forms are difficult to use
    • A collection of widgets that force you to visit multiple elements in order to fine-tune your search
    • Exposes too many dimensions at once
  26. 45.

    SEARCH OPERATORS
    • Human readable
    • Google, Yahoo and Bing do it
    • Easy to hack
    • Easy to extend in the future
    • Easter egg potential
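A minimal sketch of how operator parsing might look in Python. The operator names (`type`, `platform`, `version`) and the function `parse_query` are made up for illustration; the slides do not list the operators AMO supports.

```python
# Hypothetical operator names; adding a key here extends the syntax.
KNOWN_OPERATORS = {"type", "platform", "version"}


def parse_query(query):
    """Split a query into free-text terms and key:value filters."""
    terms = []
    filters = {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and value and key.lower() in KNOWN_OPERATORS:
            filters[key.lower()] = value
        else:
            # Unknown operators and plain words stay in the text query.
            terms.append(token)
    return " ".join(terms), filters


text, filters = parse_query("firebug type:extension platform:linux")
print(text, filters)
```

Keeping the known operators in one set is the "easy to extend" part: supporting a new operator means adding one entry, and anything unrecognized falls back to plain text search.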
  27. 46.

    WHAT’S NEXT?
    • Caching indefinitely any “almost static” data like category lists, etc.
    • Retrieving data using fewer queries (100+ queries = UR doing it wrong)
    • Getting images out of the database
    • Making life better for other developers and ops
  28. 47.

    HELPING USERS
    • New search might not be much better now, but now we can make it better
    • Can better engineering lead to better user experience?
    • Doing things “right” makes it easier to develop
  29. 48.

    CONTRIBUTE
    • Every project is open source
    • #amo @ irc.mozilla.org
    • Mission - make the internet better - we mean this
    • Or work at Mozilla