OSCON - React Architecture

vjeux

July 28, 2014

Transcript

  1. React Architecture
    CHRISTOPHER “VJEUX” CHEDEAU
    FACEBOOK OPEN SOURCE

  2. A JavaScript Library for Building User Interfaces
    React

  3. String Concatenation — 2004
    $str = '<ul>';
    foreach ($talks as $talk) {
      $str .= '<li>' . $talk->name . '</li>';
    }
    $str .= '</ul>';
    Way back in time, in the early days of Facebook, when Mark Zuckerberg was still in his dorm room,
    the way you would build websites using PHP was with string concatenation. It turns out that it’s a
    very good way to build websites: whether you are back-end, front-end or even have no programming
    experience at all, you can build a big website.

  4. $str = '<ul>';
    foreach ($talks as $talk) {
      $str .= '<li>' . $talk->name . '</li>';
    }
    $str .= '</ul>';
    XSS Injection!
    String Concatenation — 2004
    The only issue with that way of programming is that it’s insecure. If you use this exact code,
    an attacker can execute arbitrary JavaScript. This is especially bad for Facebook, since this code
    is going to be executed in the user’s context, so an attacker can basically take over the user’s account.

  5. String Concatenation — 2004
    Insecure by default
    If you don’t do anything, you are vulnerable. Worse, for most inputs, it’s actually
    going to render fine for the developer working on the feature. So there is very little
    incentive for them to add proper escaping.

  6. String Concatenation — 2004
    Insecure by default
    One mistake and there’s a vulnerability
    And the property that’s really bad is that every single call site
    in your millions of lines of code, written by hundreds of engineers, needs to be safe.
    Made one mistake? You are subject to account takeover.

  7. String Concatenation — 2004
    Insecure by default
    One mistake and there’s a vulnerability
    You can’t over escape
    One idea to escape this impossible situation is to just escape everything no matter what.
    Unfortunately, it doesn’t quite work: if you double-escape a string, it’s going to display
    control characters. If you accidentally escape markup, then it’s going to show HTML to the user!
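
A minimal sketch of the three failure modes described so far, written in JavaScript rather than PHP. `escapeHtml` is a hypothetical helper introduced only for this illustration; it is not a Facebook API.

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const name = '<script>alert(1)</script>';

// Unescaped concatenation: the attacker's markup ends up in the page as-is.
const unsafe = '<li>' + name + '</li>';

// Escaped once: the browser displays the text instead of running it.
const safe = '<li>' + escapeHtml(name) + '</li>';

// Escaped twice: the user sees entities such as "&lt;" on screen.
const doubled = '<li>' + escapeHtml(escapeHtml(name)) + '</li>';

// Escaping markup by accident: the <li> tag itself is shown as text.
const overEscaped = escapeHtml(safe);
```

Every call site has to pick exactly the `safe` variant, which is why this approach does not scale to a large codebase.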

  8. XHP — 2010
    $content = <ul />;
    foreach ($talks as $talk) {
      $content->appendChild(<li>{$talk->name}</li>);
    }
    The solution we came up with at Facebook is to extend the syntax of PHP to allow the
    developer to write markup. In this case, the markup is not in a string anymore.

  9. XHP — 2010
    $content = <ul />;
    foreach ($talks as $talk) {
      $content->appendChild(<li>{$talk->name}</li>);
    }
    Markup
    Content
    Now, everything that’s markup is written using a different syntax, so we know not to escape
    it when generating the HTML. Everything else is considered an untrusted string
    and automatically escaped. We get to keep the ease of development while being secure.
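
The split between markup and content can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not XHP itself: `element`, `render` and `escapeHtml` are hypothetical names, and the list of talks is made up.

```javascript
// Hypothetical miniature of the XHP idea: markup is a data structure,
// and every plain string is treated as untrusted content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Markup: created through a function call, never through a string.
function element(tag, children = []) {
  return { tag, children };
}

// Rendering escapes strings and recurses into element objects.
function render(node) {
  if (typeof node === 'string') {
    return escapeHtml(node);
  }
  const inner = node.children.map(render).join('');
  return '<' + node.tag + '>' + inner + '</' + node.tag + '>';
}

const talks = [{ name: 'React' }, { name: '<script>alert(1)</script>' }];
const content = element('ul', talks.map(talk => element('li', [talk.name])));
const html = render(content);
// html: <ul><li>React</li><li>&lt;script&gt;alert(1)&lt;/script&gt;</li></ul>
```

The developer never calls the escaping function: the renderer decides, based on the type of each node, so the default is secure.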

  10. Custom Tags
    $content = ;
    foreach ($talks as $talk) {
    $content->appendChild();
    }
    Once XHP was introduced, it wasn’t long until people realized that they could create custom
    tags. It turns out that they let you build very big applications easily by composing a lot of those
    tags. This is one implementation of the concepts of Semantic Web and Web Components

  11. We started doing more and more in JavaScript in order to avoid the latency between client and
    server. We’ve tried many techniques like having a cross-browser DOM library and a data binding
    approach but none of them really worked well for us.

    Given this state of the world, Jordan Walke, a front-end engineer, pitched his manager the idea of
    porting XHP to JavaScript. He somehow managed to get six months to work on it in order to
    prove the concept.

    The first time I heard about this project, I was like, there’s absolutely no way that it’s going to
    work, but in the rare chance it does, this is going to be huge. When I finally got to play with it I
    immediately started evangelizing it :)

  12. JSX — 2013
    var content =

    {talks.map(talk => )}
    ;
    ES6 Arrow Function
    The first task was to write an extension of JavaScript that supports this weird XML syntax. It
    turns out that at Facebook we’ve been using JavaScript transforms for a while. In this example, I’m
    using the alternative way to write functions of ES6, the next JavaScript standard.

    It took about a week to implement JSX, and it is not really the most important part of React.

  13. PHP
    Anything changes? Re-render everything.
    Can we make it fast enough?
    What was way more challenging was reproducing the update mechanism of PHP. It’s really simple:
    whenever anything changes, you go to a new page and get a whole new page. From a developer’s point
    of view, this makes writing apps extremely easy, as you don’t have to worry about mutations and
    making sure everything is in sync when something changes in your UI.

    However, the objection that everybody raises is: “It’s going to be super slow.”

  14. React
    Not only is it fast enough...
    It’s often faster than previous implementations!
    After 2 years of production usage, I can confidently say that it’s surprisingly faster than most of
    the code that we replaced with it. In the rest of this talk I’m going to explain the big optimizations
    that make this possible.

  15. “You need to be right before being good”

    — Akim Demaille
    My teacher at school used to say that you need to be right before being good. What he meant is
    that if you are trying to build something performant, you have a much higher chance to succeed if
    you first build a naive but working implementation and iterate on the performance rather than
    trying to build it the best way from the start

  16. Render t=1 Render t=2
    Naive
    So, let’s try to apply his advice. We’re first going to implement the most naive version. Whenever
    anything changes, we’re going to build an entire new DOM and replace the old one

  17. DOM is Stateful
    Input focus
    and selection
    Scroll
    position
    Iframe
    This is kind of working but there are a lot of edge cases. If you blow up the DOM you’re going to
    lose the currently focused element and cursor, same for the text selection and scroll position.
    What this really means is that DOM nodes actually contain state.

    The first attempt was to try to restore that state: we would remember the focused input and
    focus the new element, and do the same for the cursor and scroll position. Unfortunately, this isn’t enough.

  18. DOM is Stateful
    Input focus
    and selection
    Scroll
    position
    Iframe
    If you are using a Mac and scrolling, you’re going to have inertia. It turns out that there is no
    JavaScript API to read or write scrolling inertia. For iframes it’s even worse: if one is from another
    domain, the security policy actually disallows you from even looking at what’s inside, so you cannot
    restore it. Not only is the DOM stateful, but it contains hidden state!

  19. Reuse Nodes

    To get around this, the idea is instead of blowing up the DOM and recreating a new one, we’re
    going to reuse the DOM nodes that stayed the same between two renders

  20. Reuse Nodes

    Instead of removing the previous DOM tree and replacing it with the new one, we’re going to
    match nodes and if they didn’t change, discard the new one and keep the old one which is
    currently rendered on screen

  21. Reuse Nodes
    As long as we can match nodes, we repeat the process. But at some point we’re going to see a
    new node that wasn’t there before. In this case, we’re going to move the new one to the old (and
    currently rendered on screen) DOM tree.

  22. Reuse Nodes
    As long as we can match nodes, we repeat the process. But at some point we’re going to see a
    new node that wasn’t there before. In this case, we’re going to move the new one to the old (and
    currently rendered on screen) DOM tree.

  23. “I tend to think of React as

    Version Control for the DOM”
    — AdonisSMU
    We now have a general idea of how we want React to work but don’t have a specific plan.

    This is the moment when I pull the analogy card out of my hat.

  24. new-awesome-version.zip
    check-this-out.zip
    all-bugs-are-fixed.zip
    release.zip
    Old School Version Control
    Back in the dark ages of programming, if you wanted someone else to try out your code, you
    would create a zip file and send it to them. If you changed anything, you would send a new zip file.

  25. new-awesome-version.zip
    check-this-out.zip
    all-bugs-are-fixed.zip
    release.zip
    Old School Version Control
    Version control came along and the way it works is that it takes those snapshots of the code and
    generates a list of mutations like “remove those 5 lines”, “add 3 lines”, “replace this word”… using
    a diff algorithm

  26. new-awesome-version.zip
    check-this-out.zip
    all-bugs-are-fixed.zip
    release.zip
    Old School Version Control
    This is exactly what React does but using the DOM as input instead of text files

  27. 10,000³ = 1000 · 10⁹
    ≈ 1000 seconds at 1 GHz
    Optimal Diff — O(n³)
    So, like any good engineers, we looked at diff algorithms for trees and found that the optimal
    solution was in O(n³).

    Let’s say we’ve got a page with 10,000 DOM nodes. It’s big but not unthinkable. To get an order of
    magnitude, we’re going to assume that we can do one operation in one CPU cycle (not going to
    happen) and have a 1 GHz machine.

  28. 10,000³ = 1000 · 10⁹
    ≈ 1000 seconds at 1 GHz
    Optimal Diff — O(n³)
    ≈ 17 minutes
    It would take 17 minutes to do the diff! We can’t use that …
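
The estimate can be checked with a few lines of arithmetic, using the same assumptions as the slide: one operation per CPU cycle on a 1 GHz machine. The variable names are only for illustration.

```javascript
// Back-of-the-envelope cost of an O(n^3) diff on 10,000 nodes.
const nodes = 10000;
const operations = nodes * nodes * nodes;   // 1e12 operations
const operationsPerSecond = 1e9;            // 1 GHz, one operation per cycle
const seconds = operations / operationsPerSecond;
const minutes = seconds / 60;

console.log(seconds);             // 1000
console.log(Math.round(minutes)); // 17
```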

  29. O(n³)
    But, we’re not afraid, we know that we’re still in the phase where we need to be right. So let’s
    study the way it works.

  30. O(n³)
    (1) For every node in the new tree, (2) we’re going to match it against every node of the old tree,
    and (3) the matching operation operates on the entire subtree. That gives us our three nested loops.

  31. O(n³)
    If you think about it, in a web application, we extremely rarely have to move an element anywhere
    else in the page. The only example that comes to mind is drag and drop but it’s far from common.

  32. O(n³)
    The only time we move elements is among the children of the same parent. You very often add/remove/
    move elements in a list.

  33. O(m²)
    Children by Children
    So, we can instead do the diff children by children. We start with the root and match it against the
    other root.

  34. Children by Children
    O(m²)
    O(m²)
    O(m²) O(m²)
    And do that for all the matching children. We went from having one big scary O(n³) to
    many O(m²).

  35. We tried to be too good too fast :(
    It turns out that we cannot use Levenshtein directly!

  36. Identity





    In order to understand why, the best way is via a small example. Let’s put ourselves in React’s shoes for a
    minute. We see that the first render had three inputs and the next only has two. The question is:
    how do you match them?

  37. Identity





    The intuitive reaction is to match the first two together and delete the third one.

  38. Identity





    But, we can also delete the first one and match the last two together

  39. Identity





    One less obvious solution, but still totally valid, is to remove all the previous elements and create
    two new ones. So at this point, we don’t have enough information to do that matching properly
    as we want to be able to handle all the above use cases

  40. Identity

    One idea is to not only use the tag name but also attributes. If they are equal before and after,
    then we do the matching

  41. Identity

    Turns out that this isn’t working for the value attribute. If you are trying to type “oscon”, then the
    two are going to be different
    input focus :(

  42. Identity





    Another, more promising attribute is id. In a form context, it usually contains the id of the
    model that the input corresponds to.

  43. Identity





    Now, we’re able to match the two lists successfully! (Did you notice that this is yet another
    matching, different from the three examples I showed before?)

  44. Identity





    But, if you are submitting the form via AJAX instead of letting the browser do it, you’re unlikely to
    put that id attribute in the DOM.

    React introduces the key attribute. Its only job is to help the diff algorithm do the matching

  45. Children by Children
    O(m)
    O(m)
    O(m) O(m)
    It turns out that we can implement the matching using keys much faster than O(m²):
    in O(m) via a hash table.
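
A minimal sketch of key-based matching for one list of children. It assumes each child is a plain object with a `key` field; `matchChildren` is a hypothetical function written for this illustration, not React’s actual implementation, and it ignores reordering.

```javascript
// Match two lists of children by key using a hash table.
function matchChildren(oldChildren, newChildren) {
  // One pass to index the old children by key: O(m).
  const oldByKey = new Map();
  for (const child of oldChildren) {
    oldByKey.set(child.key, child);
  }

  const kept = [];
  const created = [];
  // One pass over the new children, with constant-time lookups: O(m).
  for (const child of newChildren) {
    if (oldByKey.has(child.key)) {
      kept.push(child.key);
      oldByKey.delete(child.key);
    } else {
      created.push(child.key);
    }
  }

  // Whatever is left in the map no longer exists in the new render.
  const removed = Array.from(oldByKey.keys());
  return { kept, created, removed };
}

const before = [
  { key: 'a', tag: 'input' },
  { key: 'b', tag: 'input' },
  { key: 'c', tag: 'input' },
];
const after = [
  { key: 'a', tag: 'input' },
  { key: 'c', tag: 'input' },
];
const result = matchChildren(before, after);
// result.kept is ['a', 'c'], result.removed is ['b'], result.created is []
```

With keys, the three-inputs-to-two-inputs example is no longer ambiguous: the middle input is the one that goes away.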

  46. Children by Children
    O(m)
    O(m)
    O(m) O(m)
    O(n)
    +
    +
    So, if we sum all those partial O(m), we get a total complexity of O(n). It’s not possible to have a
    better complexity :)

  47. Let the goodness begin!
    At this point, we’ve got a solution that’s correct, we can now start implementing all the cool
    optimizations to make it blazing fast :)

  48. align, onwaiting, onvolumechange, ontimeupdate, onsuspend, onsubmit, onstalled, onshow, onselect,
    onseeking, onseeked, onscroll, onresize, onreset, onratechange, onprogress, onplaying, onplay,
    onpause, onmousewheel, onmouseup, onmouseover, onmouseout, onmousemove, onmouseleave, onmouseenter,
    onmousedown, onloadstart, onloadedmetadata, onloadeddata, onload, onkeyup, onkeypress, onkeydown,
    oninvalid, oninput, onfocus, onerror, onended, onemptied, ondurationchange, ondrop, ondragstart,
    ondragover, ondragleave, ondragenter, ondragend, ondrag, ondblclick, oncuechange, oncontextmenu,
    onclose, onclick, onchange, oncanplaythrough, oncanplay, oncancel, onblur, onabort, spellcheck,
    isContentEditable, contentEditable, outerText, innerText, accessKey, hidden, webkitdropzone,
    draggable, tabIndex, dir, translate, lang, title, childElementCount, lastElementChild,
    firstElementChild, children, nextElementSibling, previousElementSibling, onwheel,
    onwebkitfullscreenerror, onwebkitfullscreenchange, onselectstart, onsearch, onpaste, oncut, oncopy,
    onbeforepaste, onbeforecut, onbeforecopy, webkitShadowRoot, dataset, classList, className,
    outerHTML, innerHTML, scrollHeight, scrollWidth, scrollTop, scrollLeft, clientHeight, clientWidth,
    clientTop, clientLeft, offsetParent, offsetHeight, offsetWidth, offsetTop, offsetLeft, localName,
    prefix, namespaceURI, id, style, attributes, tagName, parentElement, textContent, baseURI,
    Rafał Pocztarski
    document.createElement('div')
    If you’ve done any optimization of JS apps, you have probably heard that the DOM is slow. Rafał on
    Stack Exchange made a very good illustration. If you enumerate all the attributes of an empty div,
    you are going to see a lot of them!

  49. The reason why there are so many attributes is that a DOM node is used for a lot of steps in the
    browser rendering pipeline.

    The browser first looks at the CSS rules and finds the ones that match that node, and stores a
    variety of metadata in the process to make it faster. For example, it maintains a map of id to DOM
    nodes.

    Then, it takes those styles and computes the layout, which contains a position and location on the
    screen. Again, lots of metadata. It will avoid recomputing layout as much as possible and caches
    previously computed values.

    Then, at some point, you actually paint into
    a buffer, either on the CPU or the GPU.

    All those steps require intermediate representations and use memory and CPU. Browsers are
    doing a very good job at optimizing this entire pipeline.

  50. Virtual DOM t=1 Virtual DOM t=2
    Virtual DOM
    +
    But, if you think about what’s happening in React, we only use those DOM nodes in the diff
    algorithm. So we can use a much lighter JavaScript object that just contains the tag name and
    attributes. We call it the virtual DOM.

  51. Virtual DOM t=1 Virtual DOM t=2 Mutations
    Virtual DOM
    + =
    The diff algorithm generates a list of DOM mutations, the same way version control outputs text
    mutations.

  52. Real DOM
    Virtual DOM
    That we can apply to the real DOM. Then we let the browser do all its optimized pipeline. We
    reduced the number of expensive, but needed, DOM mutations to the bare minimum
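
To make the virtual DOM and its diff concrete, here is a small sketch under simplifying assumptions: virtual nodes are plain objects, children are matched by key only, and reordering is ignored. `h`, `diff` and the mutation shapes are hypothetical and do not mirror React’s internals.

```javascript
// Hypothetical virtual DOM node: a light object with a tag, attributes,
// an optional key, and children.
function h(tag, attrs = {}, children = []) {
  return { tag, attrs, key: attrs.key, children };
}

// Diff two virtual nodes that were already matched, and append the
// mutations needed to turn `prev` into `next`. `path` locates the node.
function diff(prev, next, path = 'root', mutations = []) {
  if (prev.tag !== next.tag) {
    mutations.push({ type: 'replace', path, tag: next.tag });
    return mutations;
  }
  for (const name of Object.keys(next.attrs)) {
    if (name !== 'key' && prev.attrs[name] !== next.attrs[name]) {
      mutations.push({ type: 'setAttribute', path, name, value: next.attrs[name] });
    }
  }
  for (const name of Object.keys(prev.attrs)) {
    if (!(name in next.attrs)) {
      mutations.push({ type: 'removeAttribute', path, name });
    }
  }
  // Children are matched by key through a hash table, list by list.
  const prevByKey = new Map(prev.children.map(child => [child.key, child]));
  for (const child of next.children) {
    const childPath = path + '/' + child.key;
    if (prevByKey.has(child.key)) {
      diff(prevByKey.get(child.key), child, childPath, mutations);
      prevByKey.delete(child.key);
    } else {
      mutations.push({ type: 'insert', path: childPath, tag: child.tag });
    }
  }
  for (const key of prevByKey.keys()) {
    mutations.push({ type: 'remove', path: path + '/' + key });
  }
  return mutations;
}

const t1 = h('ul', {}, [
  h('li', { key: 'a', class: 'talk' }),
  h('li', { key: 'b', class: 'talk' }),
]);
const t2 = h('ul', {}, [
  h('li', { key: 'a', class: 'talk selected' }),
  h('li', { key: 'c', class: 'talk' }),
]);
const mutations = diff(t1, t2);
// Three mutations: set class on root/a, insert root/c, remove root/b.
```

Only the resulting list of mutations would have to touch the real DOM; diffing two identical trees yields an empty list.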

  53. —— Open Source ——
    — —
    The diff algorithm and virtual DOM are the two optimizations that we had when we open
    sourced React. Let’s take a break and see how the project went since then

  54. GitHub stars
    [Chart: GitHub stars for React over time; y-axis from 0 to 8000, x-axis from 6/11/2013 to 7/1/2014]
    React got extremely popular in just a year. If it continues to grow at this rate, it’s going to be the
    biggest Facebook open source project in a couple of months!

  55. Not only are people adding little stars on GitHub, but they also use it in production. For example,
    the New York Times is using React to spice up their big news coverage, like the Festival de
    Cannes and the World Cup.

  56. GitHub just announced that they migrated their text editor, Atom, to React in order to improve
    performance. It shows that React is not only viable to build websites but real apps as well

  57. Probably the most unexpected: Sberbank, the largest bank in Russia, is moving all its online
    consumer banking operations to React!

  58. Last but not least, Khan Academy was the first big adopter of React, and they converted all
    their student exercises and admin panels.

  59. More commits from open
    source than employees
    Not only are people using React, but they are also contributing back! And it’s not only typos in the
    docs. The next two optimizations were brought to life by the community.

  60. Opera lists Reflow and Repaint as one of the
    three main contributors to sluggish JavaScript
    We’ve talked about the DOM being slow; the second source of slowness is reflows and repaints.
    Those scary words just mean that when you modify the DOM, the browser has to update
    the position of elements and update the actual pixels.

    When you try to read some attributes from the DOM, the browser, in order to give you a
    consistent view, has to trigger those expensive operations. If you are doing a “read, write, read,
    write…” sequence of operations, you’re going to trigger those expensive reflows and repaints
    without knowing.

    In order to mitigate that, the idea is to reorder the “read, write, read, write…” sequence of
    operations into “read, read, read…” then “write, write, write…”.
    Just as string concatenation was insecure by default, writing JavaScript applications in the conventional way is
    very prone to triggering reflows and repaints.
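
The effect of reordering can be simulated without a browser. In this sketch, `FakeLayout` is a hypothetical stand-in for the browser: a write invalidates layout, and a read forces a reflow whenever layout is invalid. Real browsers are more subtle, so take the numbers as an illustration only.

```javascript
// Hypothetical stand-in for the browser's layout engine.
class FakeLayout {
  constructor() {
    this.dirty = false;
    this.reflows = 0;
    this.heights = new Map();
  }
  write(id, height) {
    this.heights.set(id, height);
    this.dirty = true; // layout is now out of date
  }
  read(id) {
    if (this.dirty) {
      this.reflows += 1; // the expensive part
      this.dirty = false;
    }
    return this.heights.get(id) || 0;
  }
}

const ids = ['a', 'b', 'c', 'd'];

// Interleaved: read, write, read, write...
const interleaved = new FakeLayout();
for (const id of ids) {
  const height = interleaved.read(id);
  interleaved.write(id, height + 10);
}
interleaved.read('a'); // final read, once everything is written

// Batched: read, read, read... then write, write, write...
const batched = new FakeLayout();
const heights = ids.map(id => batched.read(id));
ids.forEach((id, index) => batched.write(id, heights[index] + 10));
batched.read('a'); // final read, once everything is written

console.log(interleaved.reflows); // 4
console.log(batched.reflows);     // 1
```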

  61. Batching
    setState Dirty
    Ben Alpert, from Khan Academy, implemented a fix for this problem by batching operations.

  62. Batching
    setState Dirty
    In order to tell React that something changed, you call setState on an element. React will just
    mark the element as dirty but will not compute anything right away. If you call setState on the
    same node multiple times, it will be just as performant

  63. Dirty
    Once the initial event has been fully propagated …

  64. Re-Rendered
    We can start re-rendering elements from top to bottom. This is very important as it ensures that
    we only render elements once

  65. Re-Rendered

  66. Re-Rendered
    Now that all the elements have been re-rendered to the virtual DOM, we feed that to the diff
    algorithm, which outputs DOM mutations. Nowhere in this process did we have to read from the
    DOM. React is (outside of optimizations I’m not going to cover in this talk) write-only.
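
The batching steps above can be sketched as follows. This is a hypothetical miniature: `Component`, `setState` and `flush` are written for this illustration and do not mirror React’s internals. In particular, it does not model a parent re-rendering its children.

```javascript
// A component only tracks its depth in the tree and how often it rendered.
class Component {
  constructor(name, depth) {
    this.name = name;
    this.depth = depth; // distance from the root of the tree
    this.state = {};
    this.renderCount = 0;
  }
  render() {
    this.renderCount += 1;
  }
}

const dirty = new Set();

// setState only records the new state and flags the component as dirty.
function setState(component, partialState) {
  Object.assign(component.state, partialState);
  dirty.add(component);
}

// Called once the event has been fully propagated: re-render the dirty
// components from top to bottom, each one exactly once.
function flush() {
  const queue = Array.from(dirty).sort((a, b) => a.depth - b.depth);
  dirty.clear();
  const order = [];
  for (const component of queue) {
    component.render();
    order.push(component.name);
  }
  return order;
}

const list = new Component('list', 1);
const item = new Component('item', 2);

// Three calls during a single event handler...
setState(item, { hover: true });
setState(list, { filter: 'react' });
setState(item, { selected: true });

// ...result in one render per component, parent first.
const order = flush();
// order is ['list', 'item'] and item.renderCount is 1
```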

  67. Subtree Re-rendering
    At the beginning I said that the mental model is “re-rendering everything when anything changes”.
    This is not exactly correct in practice. We only re-render the subtree from elements that have
    been flagged by setState.

  68. Subtree Re-rendering
    When you start integrating React into your app, the usual pattern is to have state very low in the
    tree, and therefore setState calls are pretty cheap, as they only re-render a small part of the UI.

  69. Pruning
    Dirty
    As more of the application gets converted, the state tends to move up the tree, which means that you are
    re-rendering a larger portion of the app when anything changes.

  70. Pruning
    Dirty
    bool shouldComponentUpdate(nextProps, nextState)
    To mitigate this effect performance-wise, you can implement “shouldComponentUpdate”, which,
    given both the previous and next state/props, can say: “You know what, nothing changed, let’s just skip
    re-rendering this sub-tree”.

  71. Pruning
    Re-Rendered
    bool shouldComponentUpdate(nextProps, nextState)
    This lets you prune big parts of the tree and regain performance
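
A minimal sketch of pruning, assuming a toy renderer. Only the hook’s name and its `(nextProps, nextState)` arguments come from the slide; `makeComponent`, `rerender` and `titleChanged` are hypothetical, and every child simply receives its parent’s props.

```javascript
function makeComponent(name, children = [], shouldComponentUpdate) {
  return { name, props: {}, state: {}, children, shouldComponentUpdate };
}

// Re-render a component and its subtree, logging the names that rendered.
function rerender(component, nextProps, nextState, log) {
  const hook = component.shouldComponentUpdate;
  if (hook && !hook.call(component, nextProps, nextState)) {
    return log; // prune: neither this component nor its children re-render
  }
  component.props = nextProps;
  component.state = nextState;
  log.push(component.name);
  for (const child of component.children) {
    rerender(child, nextProps, child.state, log);
  }
  return log;
}

// Re-render only when the `title` prop changed.
function titleChanged(nextProps, nextState) {
  return nextProps.title !== this.props.title;
}

const header = makeComponent('header', [makeComponent('logo')], titleChanged);
const feed = makeComponent('feed', [makeComponent('story')]);
const app = makeComponent('app', [header, feed]);

const first = rerender(app, { title: 'OSCON', count: 1 }, {}, []);
const second = rerender(app, { title: 'OSCON', count: 2 }, {}, []);
// first:  ['app', 'header', 'logo', 'feed', 'story']
// second: ['app', 'feed', 'story'] (the header subtree was pruned)
```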

  72. Pruning
    Re-Rendered
    bool shouldComponentUpdate(nextProps, nextState)

  73. Pruning
    Re-Rendered
    bool shouldComponentUpdate(nextProps, nextState)

  74. shouldComponentUpdate?
    We introduced shouldComponentUpdate with the open source release, but we did not quite
    know how to actually implement it correctly.

    The problem is that in JavaScript you often use objects to hold state and mutate them directly. This
    means that the previous and next versions of the state are the same reference to the same object. So
    when you try to compare the previous version with the next, it’s going to say they are equal, even though
    something changed.
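
A short sketch of the problem, with made-up state:

```javascript
// Why a naive comparison fails when state is mutated in place.
const state = { talks: ['React'] };

const previousState = state;
state.talks.push('XHP');        // mutate in place
const nextState = state;

// Both names point at the same object, so the comparison reports "equal"
// even though the list of talks changed.
const looksUnchanged = previousState === nextState;
const previousLength = previousState.talks.length;

console.log(looksUnchanged);  // true
console.log(previousLength);  // 2: the "previous" version was overwritten
```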

  75. shouldComponentUpdate?
    Om: Immutable data structure

    David Nolen, from the New York Times, figured out a good solution. In ClojureScript, most
    values are immutable, meaning that when you update one, you get a new object and the old one
    is left untouched. This works very well with shouldComponentUpdate.

    He wrote a library on top of React in ClojureScript called Om, which uses immutable data
    structures in order to implement shouldComponentUpdate by default.
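
The same idea can be approximated in plain JavaScript. This is only a sketch: Om relies on ClojureScript’s persistent data structures, which this example does not reproduce, and `hasChanged` is a hypothetical helper.

```javascript
// Immutable-style update: build a new object instead of mutating.
const previousState = Object.freeze({
  user: Object.freeze({ name: 'vjeux' }),
  talks: Object.freeze(['React']),
});

// Untouched parts (here, `user`) are shared between both versions.
const nextState = Object.freeze({
  ...previousState,
  talks: Object.freeze([...previousState.talks, 'XHP']),
});

// A cheap reference check is now enough to detect what changed.
function hasChanged(previous, next) {
  return previous !== next;
}

console.log(hasChanged(previousState.talks, nextState.talks)); // true
console.log(hasChanged(previousState.user, nextState.user));   // false
console.log(previousState.talks.length);                       // 1
```

Because the old version is left untouched, a component that only depends on `user` can skip re-rendering after this update.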

  76. shouldComponentUpdate?
    •Perf.printWasted()
    Unfortunately, using immutable data structures requires a big mental leap that not everyone is
    ready to take yet. So for now and the foreseeable future, React has to work without them and
    therefore cannot implement shouldComponentUpdate by default.

    Instead, we just released a performance tool. You play around with your application for a while,
    and every time a component is re-rendered, if the diff doesn’t output any DOM mutation, then it
    remembers the time it took to render. At the end, you get a nice table that tells you which
    components would benefit most from shouldComponentUpdate!

    This way, you can put it in a few key places and reap most of the perf wins.
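
The bookkeeping behind such a tool can be sketched as follows. This is not the code of the performance tool itself: `summarizeWasted`, the measurement format and the sample numbers are all assumptions made for this illustration.

```javascript
// Sum up render time that produced no DOM mutation, per component.
function summarizeWasted(measurements) {
  const wasted = new Map();
  for (const { component, renderMs, mutationCount } of measurements) {
    if (mutationCount === 0) {
      const entry = wasted.get(component) || { component, wastedMs: 0, renders: 0 };
      entry.wastedMs += renderMs;
      entry.renders += 1;
      wasted.set(component, entry);
    }
  }
  // Most wasteful components first.
  return Array.from(wasted.values()).sort((a, b) => b.wastedMs - a.wastedMs);
}

// Made-up measurements: one entry per re-render.
const measurements = [
  { component: 'Feed', renderMs: 12, mutationCount: 0 },
  { component: 'Header', renderMs: 2, mutationCount: 0 },
  { component: 'Feed', renderMs: 10, mutationCount: 3 },
  { component: 'Feed', renderMs: 11, mutationCount: 0 },
];
const table = summarizeWasted(measurements);
// table[0] is { component: 'Feed', wastedMs: 23, renders: 2 }
```

In this made-up data, `Feed` would be the first candidate for shouldComponentUpdate.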

  77. In this talk we covered four optimizations that React is doing: diff algorithm, virtual DOM,
    batching and pruning. I hope that it shed some light on the reasons why they exist and how they
    work.

    React is used to build our desktop website, mobile website and the Instagram website. It is so
    successful at Facebook that basically all the new front-end products are written using React. This
    is not a project that we just use in internal tools or small features: it is used by the main page
    of Facebook, which is used by hundreds of millions of people every month!
    Conclusion

  78. Since we are at the Open Source conference, I would like to end by reflecting a bit on it.

    We open sourced XHP in 2010 but did a very bad job at it: we wrote just a single blog
    post in 4 years. We didn’t go to conferences to explain it or write documentation. And yet, inside
    of Facebook we absolutely love it and use it everywhere.

    When we open sourced React last year, it was much harder because we had to explain at the
    same time the benefits of XHP and all the crazy optimizations we had to do in order to make it
    work on the client.

    We talk a lot about the benefits of open sourcing. This was a very good reminder that not open
    sourcing your core technologies can make it harder to open source other projects down the line
    Conclusion

  79. React Architecture
    CHRISTOPHER “VJEUX” CHEDEAU
    FACEBOOK OPEN SOURCE
