DXR: Now 75% More Tolerable

A tale of the past, present, and future of DXR, with a bent toward recruiting you to help hack on it

Erik Rose

March 24, 2014

Transcript

  1. None
  2. DXR: Now 75% more tolerable!

    The past and present of DXR, our search and analysis engine for Mozilla codebases.

    It’s a code browser, a full-text and regex searcher, and a structural query language that understands the code it indexes.

    There is so much cool stuff to do. Laura & I would both love to see contributions from other webeng folks. My job: convince you it’s so much fun that you won’t be able to help but start hacking on it.
  3. DXR: Now 75% more tolerable!

    Dehydra Cross Reference, Dave’s Cross Reference, Da Cross Reference? Dino Cross Reference? Disco Cross Reference?
  8. milestones: ashamed of it, no longer ashamed of it

    “Much better than any other code browser I use” “Damn it feels good”

    DXR’s evolution during my stewardship can be divided into 3 milestones. When I started, you couldn’t even type a word into the query field without the JS scrambling the letters. We were squarely at the ashamed-of-it milestone. Now we are within a hair’s breadth of the no-longer-ashamed-of-it milestone, causing the famously hard-to-please Taras to say things like “Much better than any other code browser I use” and “Damn it feels good.”
  9. This is DXR, and I am almost not ashamed of it.

    • 27 structural filters
      • Callers
      • Subclasses, superclasses
      • Uses & definitions of types & macros
      • Drill in using browse interface
    • mozilla-central & comm-central
      • Both updating every 6 hours
      • 50 more trees to come, quite possibly including webdev stuff
    • All of this is real-time.
      • We just did a regex search over a 2.5GB codebase in under a second.
      • Trigram indices
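
The trigram idea mentioned on this slide can be sketched in a few lines. This is a toy, in-memory illustration and not DXR's actual index: the `TrigramIndex` class, its method names, and the sample file names are all made up for the example, and the caller supplies the literal substring that any match must contain (how a real engine derives that from a regex is not covered in the talk).

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy index (hypothetical): maps each trigram to the ids of docs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def candidates(self, literal):
        """Ids of documents that contain every trigram of a literal string."""
        grams = trigrams(literal)
        if not grams:
            # Fewer than 3 characters: the index cannot narrow anything down.
            return set(self.docs)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, pattern, required_literal):
        """Run the regex only over documents that pass the trigram filter."""
        regex = re.compile(pattern)
        return sorted(doc_id for doc_id in self.candidates(required_literal)
                      if regex.search(self.docs[doc_id]))
```

The point of the design is that the expensive regex runs only over the few documents that survive the cheap set intersection, which is one plausible way a regex search over a large codebase can stay fast.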
  12. milestones: ashamed of it, no longer ashamed of it, proud of it

    Very soon we will start chasing the proud-of-it milestone. I want you to help with that. It’s too much fun not to share. Toward that end, I want to give you an overview of how DXR is structured.
  13. [Diagram: the indexer writes the instance, and the Flask app reads it. Other labels: shiny, happy person; tolerates; compilers; Pygments; VCS plugins; other plugins.]

    • DXR is 2 halves: indexer and web app.
    • They meet in the middle: the instance.
    • Indexer
  18. [Diagram: the indexer writes the instance, and the Flask app reads it. Other labels: compilers; Pygments; VCS plugins; other plugins; tolerates.]

    • Builds FF
    • clang plugin
    • env var
    • CSVs
    • Writes to instance…
  19. [Diagram: indexer, instance, and Flask app, with added labels: SQLite, static HTML, NFS.]

    • Flask app
  20. [Diagram: the Flask app reads the instance. Other labels: shiny, happy person; tolerates.]

    • Questions?
    • Now, this structure hasn't changed much since I started on the project. But everything inside the boxes has changed like crazy.

    When I first started working on DXR, it was kind of a mess.
  21. pain and treachery

    • Had to set up the whole doggone thing yourself, fight with versions of SQLite dev headers, and tame the linker and loader.
    • A Vagrant box (one of Lonnen's early contributions)
  22. vagrant

    • We even run our Jenkins tests on Vagrant.
      • That way, we don't have to get LLVM installed on the host and continually pester anybody for updates.
      • As a bonus, the suitability of the box as a fast spin-up platform for contribs stays validated.
  23. CGI

    • A single search CGI plus a huge folder of static HTML
      • It used to actually copy a bunch of Python code into a folder, and then you'd point Apache at that.
  24. Flask

    • Flask + WSGI
      • People at user groups ask if I use Flask.
      • I don't know what to say.
      • There are maybe 20 Flask-specific lines of code in DXR.
      • Part of that is a credit to Flask and its use of plain data types.
      • But a lot is that the interesting stuff in DXR happens at indexing and search.
      • There are only 5 HTTP endpoints.
      • There is zero state mutable at request time.
      • It's barely a web project.
      • In fact, I'd like to see us add a well-documented JSON endpoint for search and spin off a CLI and editor integration.
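
A JSON search endpoint with no request-time mutable state might look roughly like the following. It is a sketch only: it uses plain WSGI from the standard library rather than Flask so that it runs without dependencies, and the `LINES` data, the `q` parameter, and the response shape are assumptions for illustration, not DXR's API.

```python
import json
from urllib.parse import parse_qs

# Hypothetical read-only data standing in for an on-disk instance.
LINES = (
    ("browser/app.cpp", 12, "int main(int argc, char** argv)"),
    ("dom/node.cpp", 40, "void Node::Init()"),
)


def search_app(environ, start_response):
    """WSGI app: a GET with ?q=term returns the matching lines as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", "")).get("q", [""])[0]
    results = [
        {"path": path, "line": number, "text": text}
        for path, number, text in LINES
        if query and query in text  # plain substring match, for brevity
    ]
    body = json.dumps({"query": query, "results": results}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because the handler only reads immutable data, a CLI or editor plugin could call it repeatedly without any session or locking concerns.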
  25. zero tests

  26. good tests

    Good test coverage with nose. Harnesses for testing against full on-disk instances, for complicated things. Or you can just embed C source right in the test if it’s just a dozen lines.
  27. no dependency management

    Hadn't been deployed for 4 months. OS packages, dreadfully out of date.
  28. peep

    CD, using peep and npm-lockdown, validated by Jenkins, coordinated by Shiva's father
  29. orphaned infra

    Nobody owned the prod box. OS hadn't been updated for a year.
  30. a fubar of our own

    Brand new hardware, Puppet, a burly SSD box for doing indexing, Python 2.7, and a fubar to keep things humming
  31. oogie boogie

    Monolithic, multi-hundred-line state machine function running the transformation from plugin byte offsets to HTML.

    No tests, of course.
  32. kablam!

    • Totally rewritten rendering pipeline
      • Plugins now independent
      • Faster. Made out of strung-together generators.
      • Fully tested
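
To give a feel for what "strung-together generators" can mean for turning plugin offsets into HTML, here is a minimal sketch. It is not DXR's pipeline: the function names, the `(start, end, css_class)` region format, and the keyword "plugin" are hypothetical, it uses character offsets into a `str` rather than byte offsets, and it assumes regions never overlap.

```python
from html import escape


def keyword_regions(text, keywords=("int", "return")):
    """Hypothetical plugin: yield (start, end, css_class) for each keyword hit."""
    for word in keywords:
        start = text.find(word)
        while start != -1:
            yield start, start + len(word), "k"
            start = text.find(word, start + len(word))


def sorted_regions(*region_streams):
    """Merge the regions from every plugin into one stream ordered by offset."""
    yield from sorted(r for stream in region_streams for r in stream)


def html_chunks(text, regions):
    """Yield escaped HTML, wrapping each (non-overlapping) region in a span."""
    position = 0
    for start, end, css_class in regions:
        yield escape(text[position:start])
        yield '<span class="%s">%s</span>' % (css_class, escape(text[start:end]))
        position = end
    yield escape(text[position:])


def render(text, *plugins):
    """String the generators together: plugins, then merge, then HTML."""
    streams = (plugin(text) for plugin in plugins)
    return "".join(html_chunks(text, sorted_regions(*streams)))
```

Each plugin only knows how to emit its own regions, so plugins stay independent of each other and each stage can be tested in isolation.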
  33. WTF

    Custom C++ tokenizer just to find include statements
  34. clangity clang

    Now letting clang do what it's good at
  35. reinvent them wheels

    Custom impl of the multiprocessing module. Separate dxr-worker binaries it'd shell out to. Ad hoc work queue.
  36. functional yum

    Now concurrent.futures. Functional glory. Just looks like a big map statement.
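
The "big map statement" shape looks something like the sketch below. The per-file job `index_file` and the wrapper `index_tree` are hypothetical stand-ins, not DXR functions. A thread pool is used here so the snippet runs anywhere; `ProcessPoolExecutor` offers the same `map` interface and would be the closer analogue to a multiprocessing-style worker pool for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def index_file(path):
    """Hypothetical per-file job; a real one would analyze and render the file."""
    return path, len(path)


def index_tree(paths, workers=4):
    """Fan the per-file job out over a pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(index_file, paths))
```

Compared with a hand-rolled work queue and separate worker binaries, the pool owns scheduling and result collection, and the calling code stays a single expression.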
  37. hurt the user

    You couldn't even type a simple, one-word query without the JS screwing it up.
  38. decent ui

    UI refresh! Stable UI, not things showing & hiding all the time. Consistent query language. Proper parser based on Parsimonious. All fresh JS from Schalk and contribs. Features like multi-line highlights. pushState/popstate.

    Used to look like this. Now looks like this.
  46. rewriting: ui, http interface, render pipeline, query machinery, storage backend

    In case you haven't noticed, we're gradually rewriting the entire thing.

    • UI: been through one major revision
    • HTTP interface: been replaced
    • Render pipeline: rewritten
    • Query machinery: rewriting now
    • Storage backend: up next

    It's an aggressively 4-dimensional project.
  47. “You’re not thinking 4-dimensionally.”

    There's so much going on, I feel more like an air traffic controller than anything else.

    In fact, it was 4-dimensionality that led to the second of only 2 meetings we’ve ever had about DXR. [Story: landing Rust support]

    • That's another thing that makes the project fun:
      • You get to talk to your fellow contribs a lot to figure out what order you can do all this stuff in,
      • all the while keeping it deployable and not ending up in merge hell.

    Nice things that give us freedom & help toward that…
  48. CD

  49. root

  50. paving the DB

    • Rebuilding the DBs all the time
      • No migrations to worry about
    • We change schemas all the time.
    • When we do, we increment a "format version" so the deploy waits.
      • An example of the overriding philosophy I try to take on the project: do the simplest thing that could possibly work.
      • More specifically, do the simplest thing that works and doesn't prevent us from making it better.
      • We were going to have this complicated system of build triggers, set off by a duo of commits and timers.
      • We still may someday, but, until then, YAGNI.
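
One simple way a "format version" could make a deploy wait is sketched below. The talk does not say how DXR stores or compares the version, so everything here is an assumption: the `format.json` stamp file, the `FORMAT_VERSION` constant, and both function names are invented for the example.

```python
import json
import os

FORMAT_VERSION = 3  # hypothetical: bumped whenever the schema changes


def write_build(build_dir, version=FORMAT_VERSION):
    """Indexer side: stamp a freshly built instance with its format version."""
    os.makedirs(build_dir, exist_ok=True)
    with open(os.path.join(build_dir, "format.json"), "w") as f:
        json.dump({"format": version}, f)


def can_deploy(build_dir, code_version=FORMAT_VERSION):
    """Deploy side: go ahead only when the build matches what the code expects."""
    try:
        with open(os.path.join(build_dir, "format.json")) as f:
            return json.load(f)["format"] == code_version
    except (OSError, ValueError, KeyError):
        return False  # no usable stamp yet: keep waiting for a fresh build
```

Since the databases are rebuilt from scratch anyway, a mismatch needs no migration: the deploy just holds off until the next build with the new format appears.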
  51. breaking things

    • It can break once in a while.
      • Our audience knows how to file bugs and is not shy.
      • We can deploy fixes in about 10 minutes.
      • That includes the time to run tests.
  58. you’re invited: line-scoped searching, elasticsearch, rust, more trees, incremental indexing, result mixing

    Laura & I would like to welcome everyone in this room to hack on DXR. One of the neat things about the project is the diversity of things to do. Whether you like UI design or mucking about with compilers, there’s something really important for you. Here’s what people are already working on.

    • Line-based searching
    • SQLite → Elasticsearch (ES)
      • Common misconception that concurrency is bad with SQLite
        • Only for writes
      • Will enable older-rev display
        • Major thing toward killing off MXR, which is our prime directive
      • Will enable dynamic rendering
    • Rust
    • More trees
    • Incremental indexing. Much lower hardware requirements
    • Result mixing

    I think it’s really important(…) to work on what really turns you on: better work.
  65. you’re invited: polishing UI, multi-language support, bad analyses, packaging, test harnesses, config system

    Here’s more stuff we could use help on:

    • Polishing off a few unfinished pieces of the UI refresh (CSS, template DRY) so we can be really proud of it. If you like little things or front-end things, this would be good for you.
    • Multi-language support in a single codebase
    • Chasing reports of bad analysis (might span C++ and Python work). Great for a grand tour of the codebase.
    • Little refactorings
      • Package DXR as a Python package. Almost there.
      • Getting the last of the boilerplate makefiles out of the test harness
      • Rewrite config system
    • Not on this list but implicit: The thing that interests you the most. That's the most important consideration.

    I hope I’ve expressed how much fun I find DXR to work on. Admittedly, I’m a little crazy, but I welcome you to join me in my craziness.
  66. where to start

    • How to get started
      • wiki.mozilla.org/DXR
      • #static on IRC
      • Open space: Wednesday PM