
Making Addons Search Better

Dave Dash
September 29, 2009

What I did to make Mozilla Addons better at search

Transcript

  1. ME
     • web developer for Mozilla, mostly AMO
     • python hacker / web tinkerer
     • former engineer @ delicious.com (“fresh” homepage, data migrations from Delicious 1.0 to 2.0)
     • UIUC alum / former NCSA
  2. AMO (addons.mozilla.org)
     • Largest Mozilla site in terms of traffic and hardware
     • 24 web frontends
     • 4 MySQL slaves off a single master
     • 2 memcached servers
     • Zeus proxy
  3. SEARCH GOALS
     • Do something that sucks less than what we’ve got
     • Do something that makes it easier to suck less in the future
     • Do something that’s easy to use for our operations team, web developers and, most importantly, end users
     • Reduce strain on our databases, developers, ops and, ultimately, end users
  4. LOTS OF ADDONS
     • Addons in multiple locales
     • Platforms: Linux, OS X, BSD, Windows...
     • Addons for Firefox, Thunderbird, SeaMonkey, Sunbird, Mobile
     • Extensions, Themes, Dictionaries and more
  5. CHALLENGES
     • I’m a n00b at Mozilla
     • CakePHP framework
     • No prepared statements
     • Images are in the DB
     • Many pages invoke 100s of DB queries
     • Addon metadata is localized
     • Addons x Versions x Files x Apps x Application Versions
  6. UNCHALLENGES
     • We can solve a lot of problems using Python =)
     • Smart, helpful webdev team
     • Mozilla community is supportive (we blog)
     • 10 queries/second at peak: easy!
  7. SPHINX
     • Craigslist, Pirate Bay and Mozilla Support use it
     • Beats rolling your own
     • Beats the unmaintainable tangled mess of SQL queries that power search now
     • Open source
     • “It just works”
  8. SPHINX ISSUES
     • Index all translations of all 5,000 addons (18,000 translations to index)
     • Data needs to be joined and filtered carefully
     • Database views that do this are horrendously complicated
     • We stored versions as strings, not integers (3.0, 3.0.*, 3.5b, 3.5rc1, 3.5, 3.6, etc.)
  9. ...
     • SmushedText/CamelCasing used for a lot of addon names (e.g. FireBug, StumbleUpon)
     • Business logic around what we display
     • Mostly an issue with our data, not Sphinx
     • Infix searches vs. stemming
     • Hard to debug when queries go wrong
  10. SPHINX WINS
      • A complicated database view run every 5 minutes, versus complicated join queries run on demand during search (180K queries)
      • Indexing takes just over a minute; we could speed it up if we wanted
      • Easy API to drop into our existing codebase
  11. ...
      • Small data set is easy to scale
      • Lots of traffic? We just use Puppet to deploy another cloned Sphinx server
      • Load balance away!
  12. FORMS SUCK
      • Advanced search forms are difficult to use
      • A collection of widgets that force you to visit multiple elements to fine-tune your search
      • Exposes too many dimensions at once
  13. SEARCH OPERATORS
      • Human readable
      • Google, Yahoo and Bing do it
      • Easy to hack
      • Easy to extend in the future
      • Easter egg potential
  14. WHAT’S NEXT?
      • Caching any “almost static” data, like category lists, indefinitely
      • Retrieving data using fewer queries (100+ queries = UR doing it wrong)
      • Getting images out of the database
      • Making life better for other developers and ops
  15. HELPING USERS
      • The new search might not be much better yet, but now we can make it better
      • Can better engineering lead to a better user experience?
      • Doing things “right” makes it easier to develop
  16. CONTRIBUTE
      • Every project is open source
      • #amo @ irc.mozilla.org
      • Mission: make the internet better (we mean this)
      • Or work at Mozilla
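Slide 8's "joined and filtered carefully" step ends with one search document per addon per locale (5,000 addons, 18,000 translations). The talk did this in a database view; the Python sketch below only illustrates the shape of that output. The row format, the `build_documents` name and the rule that untranslated fields fall back to the addon's default locale are assumptions, not AMO's actual schema.

```python
from collections import defaultdict

def build_documents(addons, rows):
    """Flatten translation rows into one search document per (addon, locale).

    addons: {addon_id: default_locale}; addons missing here (e.g. disabled
            ones) are filtered out.
    rows:   iterable of (addon_id, locale, field, text) tuples, i.e. the
            already-joined data (hypothetical shape).
    """
    # Group translated fields: addon_id -> locale -> {field: text}
    by_addon = defaultdict(lambda: defaultdict(dict))
    for addon_id, locale, field, text in rows:
        by_addon[addon_id][locale][field] = text

    docs = []
    for addon_id, default_locale in addons.items():
        locales = by_addon.get(addon_id, {})
        # Assumed rule: a field missing in a locale falls back to the default locale.
        fallback = locales.get(default_locale, {})
        for locale, fields in sorted(locales.items()):
            docs.append({"addon_id": addon_id, "locale": locale, **fallback, **fields})
    return docs
```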
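Slide 8 also notes that versions were stored as strings (3.0, 3.0.*, 3.5b, 3.5rc1, 3.5, 3.6), which neither sort nor range-filter correctly. One way out is to pack each version into a fixed-width integer. The scheme below (two digits per component, a one-digit release stage, `*` meaning "highest possible") and the `version_int` name are a hypothetical illustration, not the encoding AMO adopted.

```python
import re

# Hypothetical scheme: major[.minor[.patch]] plus an optional pre-release tag
# (a = alpha, b = beta, rc = release candidate) and pre-release number.
_VERSION_RE = re.compile(r"^(\d+)(?:\.(\d+|\*))?(?:\.(\d+|\*))?(a|b|rc)?(\d*)$")
_STAGES = {"a": 0, "b": 1, "rc": 2, None: 3}  # a final release sorts after its pre-releases
_MAX = 99  # "*" (and every component after it) takes the largest value

def version_int(version: str) -> int:
    """Map a version string such as '3.5rc1' or '3.0.*' to a sortable integer."""
    m = _VERSION_RE.match(version)
    if not m:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, stage, pre = m.groups()

    wildcard = False
    parts = []
    for comp in (minor, patch):
        if wildcard or comp == "*":
            wildcard = True
            parts.append(_MAX)
        else:
            parts.append(int(comp) if comp else 0)

    if wildcard:
        stage_num, pre_num = _STAGES[None], _MAX
    else:
        stage_num, pre_num = _STAGES[stage], int(pre) if pre else 0

    if any(p > _MAX for p in (*parts, pre_num)):
        raise ValueError(f"component out of range in {version!r}")

    minor_n, patch_n = parts
    # Fixed-width packing: 2 digits each for minor, patch and pre-release number,
    # 1 digit for the stage, so integer order matches version order.
    return (((int(major) * 100 + minor_n) * 100 + patch_n) * 10 + stage_num) * 100 + pre_num
```

With plain integers, "compatible with 3.5" becomes a numeric range check between an addon's minimum and maximum version, which an integer attribute in a search index can filter on directly.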
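For the SmushedText names in slide 9, one option is to index both the original name and its CamelCase pieces, so "firebug", "fire bug" and "bug" all find FireBug. The splitting rules and the `split_smushed`/`index_text` helpers are hypothetical; Sphinx's infix indexing (`min_infix_len`) is the other route the slide alludes to, at the cost of a larger index.

```python
import re

# Zero-width split points: "eB" (lower -> upper) and "LHt" (end of an acronym).
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

def split_smushed(name: str) -> list[str]:
    """Split 'FireBug' into ['Fire', 'Bug']; plain words pass through unchanged."""
    words = []
    for token in name.split():
        words.extend(part for part in _CAMEL_BOUNDARY.split(token) if part)
    return words

def index_text(name: str) -> str:
    """Text to hand the indexer: the original name plus its split-up words."""
    split = split_smushed(name)
    if split == name.split():
        return name  # nothing was smushed
    return name + " " + " ".join(split)
```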
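Slide 11's "deploy a cloned Sphinx server and load balance" works because every clone holds the full, small index. In AMO's setup the balancing would more plausibly live in a proxy (slide 2 lists Zeus); the client-side round robin with failover below is only a sketch of the idea. The `RoundRobin` class, the host names and the `query_fn` callback are hypothetical; 9312 is searchd's default port.

```python
import itertools
import threading

class RoundRobin:
    """Rotate through cloned search servers, skipping ones that refuse connections."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(self._backends)
        self._lock = threading.Lock()  # guard the shared cycle in threaded servers

    def next_backend(self):
        with self._lock:
            return next(self._cycle)

    def call(self, query_fn):
        """Run query_fn(host, port) on the next backend, failing over to the others."""
        last_error = None
        for _ in range(len(self._backends)):
            host, port = self.next_backend()
            try:
                return query_fn(host, port)
            except ConnectionError as exc:
                last_error = exc  # try the next clone
        raise last_error
```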
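The search operators from slide 13 can be pulled out of the query string before it reaches the search engine, leaving plain text for full-text matching plus a dict of filters. The operator names (`type:`, `platform:` and so on) and the `parse_query` helper are made up for illustration; the slides do not list AMO's actual operators.

```python
import re
from dataclasses import dataclass, field

# Hypothetical operator names; the slides don't list AMO's real ones.
KNOWN_OPERATORS = {"type", "platform", "app", "version", "category"}
_OPERATOR_RE = re.compile(r"(\w+):(\S+)")

@dataclass
class ParsedQuery:
    text: str                                              # words for full-text matching
    filters: dict[str, str] = field(default_factory=dict)  # operator -> value

def parse_query(raw: str) -> ParsedQuery:
    """Split 'tab manager type:extension' into text and operator filters."""
    words, filters = [], {}
    for token in raw.split():
        m = _OPERATOR_RE.fullmatch(token)
        if m and m.group(1).lower() in KNOWN_OPERATORS:
            filters[m.group(1).lower()] = m.group(2)  # last occurrence wins
        else:
            words.append(token)  # unknown prefixes (e.g. "http:") stay as text
    return ParsedQuery(text=" ".join(words), filters=filters)
```

Unknown prefixes stay in the text, so a query like `http://example.com` is searched as typed, and adding an operator is a one-word change to `KNOWN_OPERATORS`.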
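For the "cache almost static data indefinitely" item in slide 14, a cache-aside helper with no expiry and explicit invalidation is one common pattern. In this sketch a dict stands in for memcached, and `load_categories` is a hypothetical placeholder for the real category query.

```python
_MISSING = object()

class ForeverCache:
    """Cache-aside with no expiry: entries live until explicitly invalidated."""

    def __init__(self):
        self._store = {}  # stand-in for a memcached client

    def get_or_compute(self, key, compute):
        value = self._store.get(key, _MISSING)
        if value is _MISSING:  # miss: run the query once and keep the result
            value = compute()
            self._store[key] = value
        return value

    def invalidate(self, key):
        self._store.pop(key, None)

load_calls = []  # records each simulated database hit

def load_categories(app, locale):
    """Hypothetical placeholder for the real category query."""
    load_calls.append((app, locale))
    return [f"{app}/{locale}/appearance", f"{app}/{locale}/privacy"]

cache = ForeverCache()

def categories(app, locale):
    # Key on app and locale because addon metadata is localized.
    key = f"categories:{app}:{locale}"
    return cache.get_or_compute(key, lambda: load_categories(app, locale))
```

Since nothing expires, whatever code edits categories has to call `invalidate` for the affected keys.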