Slide 1

Slide 1 text

MAKING ADDONS SEARCH BETTER
dave dash - amo

Slide 2

Slide 2 text

ME
• web developer for mozilla - mostly AMO
• python hacker / web tinkerer
• former engineer @ delicious.com (“fresh” homepage, data migrations from Delicious 1.0 to 2.0)
• UIUC alum / former NCSA

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

EVERYONE LOVES ADBLOCK

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

firefox

Slide 11

Slide 11 text

PEOPLE WANT ADDONS
even if they just don’t know it

Slide 12

Slide 12 text

CUSTOMIZATION = AWESOME

Slide 13

Slide 13 text

MAKES BROWSING BETTER

Slide 14

Slide 14 text

MAKES IT PERSONAL

Slide 15

Slide 15 text

AMO
• Largest mozilla site in terms of traffic and hardware
• 24 web frontends
• 4 mysql slaves off a single master
• 2 memcached servers
• Zeus proxy

Slide 16

Slide 16 text

WHAT’S UP?

Slide 17

Slide 17 text

SEARCH IS IMPORTANT

Slide 18

Slide 18 text

GOOGLE CAN’T FIND EVERYTHING... YET

Slide 19

Slide 19 text

WEB SITE’S MOST IMPORTANT FEATURE

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

IMPROVE ADDONS SEARCH

Slide 22

Slide 22 text

SEARCH GOALS
• Do something that sucks less than what we’ve got
• Do something that makes it easier to suck less in the future
• Do something that’s easy to use for our operations team, web developers and most importantly, end-users
• Reduce strain on our databases/developers/ops and ultimately end-users

Slide 23

Slide 23 text

LOTS OF ADDONS
• Addons in multiple locales
• Platforms for Linux, OS X, BSD, Windows...
• Addons for Firefox, Thunderbird, Seamonkey, Sunbird, Mobile
• Extensions, Themes, Dictionaries and more

Slide 24

Slide 24 text

CHALLENGES
• I’m a n00b at Mozilla
• CakePHP framework
• No prepared statements
• Images are in the DB
• Many pages invoke 100s of DB queries
• Addon metadata is localized
• Addons x Versions x Files x Apps x Application Versions

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

JOINS MAKE ANGELS CRY

Slide 27

Slide 27 text

10QPS IS HARD WHEN IT’S JOINS

Slide 28

Slide 28 text

10QPS IS EASY WHEN WE’RE SMART

Slide 29

Slide 29 text

UNCHALLENGES
• We can solve a lot of problems using python =)
• Smart, helpful webdev team
• Mozilla community is supportive (we blog)
• 10 queries/second at peak - easy!

Slide 30

Slide 30 text

SPHINX

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

SPHINX
• Craigslist, Pirate Bay, Mozilla Support use it
• Beats rolling your own
• Beats the unmaintainable tangled mess of SQL queries that power search now
• Open Source
• “It just works”

Slide 33

Slide 33 text

EASY TO START

Slide 34

Slide 34 text

AND THEN IT GETS HARD

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

SPHINX ISSUES
• Index all translations of all 5,000 addons (=18,000)
• Data needs to be joined and filtered carefully
• The database views needed to do this are horrendously complicated
• We stored versions as strings, not integers (3.0, 3.0.*, 3.5b, 3.5rc1, 3.5, 3.6, etc)
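A minimal sketch of how version strings like the ones above could be turned into sortable integers, so a search index can filter on numeric ranges. The encoding (two digits per numeric part, one digit for release stage) and the name `version_to_int` are assumptions for illustration, not AMO's actual scheme.

```python
import re

# Hypothetical stage ranks: pre-releases sort below the final release.
_STAGES = {"a": 1, "b": 2, "rc": 3}
_FINAL = 9

# A version part is a number with an optional stage tag, e.g. "5", "5b", "5rc1".
_PART = re.compile(r"^(\d+)(a|b|rc)?\d*$")


def version_to_int(version):
    """Encode a version string as one integer that sorts in release order."""
    numbers = []
    stage = _FINAL
    for part in version.split("."):
        if part == "*":
            # Wildcard: treat as the highest value this slot can hold.
            numbers.append(99)
            continue
        match = _PART.match(part)
        if match is None:
            raise ValueError("unsupported version part: %r" % part)
        numbers.append(int(match.group(1)))
        if match.group(2):
            stage = _STAGES[match.group(2)]
    if len(numbers) > 3 or any(n > 99 for n in numbers):
        raise ValueError("version out of range for this encoding: %r" % version)
    # Pad missing minor/patch parts with zero.
    numbers += [0] * (3 - len(numbers))
    major, minor, patch = numbers
    return ((major * 100 + minor) * 100 + patch) * 10 + stage
```

With integers in the index, "works with 3.5 or later" becomes a plain numeric range filter instead of a string comparison.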

Slide 37

Slide 37 text

...
• SmushedText/CamelCasing used for a lot of addon names (e.g. FireBug, StumbleUpon)
• Business logic around what we display
• Mostly an issue with our data, not Sphinx
• Infix searches vs stemming
• Hard to debug when queries go wrong
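One way to cope with smushed names, sketched under assumptions: index both the original name and a copy split at lowercase-to-uppercase boundaries, so a search for "fire bug" can still match "FireBug". The helper name `index_forms` is hypothetical and this is not necessarily what AMO does.

```python
import re

# Zero-width boundary between a lowercase letter and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def index_forms(name):
    """Return the name plus a space-separated variant if it is CamelCased."""
    split = _CAMEL_BOUNDARY.sub(" ", name)
    forms = [name]
    if split != name:
        forms.append(split)
    return forms
```

Names without an internal capital, such as "Adblock", come back unchanged as a single form.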

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

SPHINX WINS
• Complicated database view run every 5 minutes, versus complicated join queries run on demand during search (180K queries)
• Indexing takes just over a minute - we could speed it up if we wanted
• Easy API to drop into our existing codebase

Slide 40

Slide 40 text

...
• Small data set is easy to scale
• Lots of traffic means we just use puppet to deploy a cloned sphinx server
• Load Balance away!

Slide 41

Slide 41 text

ADVANCED SEARCH IS HARD

Slide 42

Slide 42 text

FORMS SUCK
• Advanced search forms are difficult to use
• A collection of widgets that force you to visit multiple elements in order to fine-tune your search
• They expose too many dimensions at once

Slide 43

Slide 43 text

VERSION RANGE?

Slide 44

Slide 44 text

SEARCH OPERATORS

Slide 45

Slide 45 text

SEARCH OPERATORS
• Human Readable
• Google, Yahoo and Bing do it
• Easy to hack
• Easy to extend in the future
• Easter egg potential
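A rough sketch of what parsing search operators could look like: pull `key:value` tokens out of the query string and leave the rest as free text. The operator names (`platform`, `version`, `type`) and the function `parse_query` are made up for illustration; the slides do not say which operators AMO supports.

```python
# Hypothetical operator names; extend this set to add new operators.
KNOWN_OPERATORS = {"platform", "version", "type"}


def parse_query(query):
    """Split a query into free-text terms and a dict of operator filters."""
    terms = []
    filters = {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and value and key.lower() in KNOWN_OPERATORS:
            filters[key.lower()] = value
        else:
            # Unknown or malformed operators fall through as plain text.
            terms.append(token)
    return " ".join(terms), filters
```

For example, `parse_query("adblock platform:linux version:3.5")` yields the text `"adblock"` plus filters for platform and version. Adding an operator later is a one-line change to the set.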

Slide 46

Slide 46 text

WHAT’S NEXT?
• Indefinitely caching any “almost static” data such as category lists
• Retrieving data using fewer queries (100+ queries = UR doing it wrong)
• Getting images out of the database
• Make life better for other developers and ops
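The caching idea, as a tiny in-process sketch: load "almost static" data once, keep it until someone explicitly invalidates it. A real deployment would more likely sit on memcached; the names `cached` and `invalidate` and the dict-backed store are assumptions for illustration.

```python
# Stand-in for a shared cache such as memcached; entries never expire.
_cache = {}


def cached(key, loader):
    """Return the cached value for key, calling loader() only on a miss."""
    if key not in _cache:
        _cache[key] = loader()
    return _cache[key]


def invalidate(key):
    """Drop a cached entry, e.g. after an admin edits the category list."""
    _cache.pop(key, None)
```

The loader would wrap whatever query builds the category list, so repeated page views cost zero database queries until the data changes.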

Slide 47

Slide 47 text

HELPING USERS
• New search might not be much better now, but now we can make it better
• Can better engineering lead to better user experience?
• Doing things “right” makes it easier to develop

Slide 48

Slide 48 text

CONTRIBUTE
• Every project is open source
• #amo @ irc.mozilla.org
• Mission - make the internet better - we mean this
• Or work at Mozilla

Slide 49

Slide 49 text

QUESTIONS?
