OSCON - React Architecture

Slide 1

Slide 1 text

React Architecture CHRISTOPHER “VJEUX” CHEDEAU FACEBOOK OPEN SOURCE

Slide 2

Slide 2 text

A JavaScript Library for Building User Interfaces React

Slide 3

Slide 3 text

String Concatenation — 2004 $str = '

' . $talk->name . '

';

Way back in time, in the early days of Facebook when Mark Zuckerberg was still in his dorm room, the way you built websites in PHP was string concatenation. It turns out to be a very good way to build websites: whether you are a back-end engineer, a front-end engineer, or have no programming experience at all, you can build a big website.

Slide 4

Slide 4 text

$str = '

' . $talk->name . '

';

XSS Injection! String Concatenation — 2004 The only issue with this way of programming is that it’s insecure. If you use this exact code, an attacker can execute arbitrary JavaScript, for example by giving a talk a name that contains a script tag. This is especially bad for Facebook, since that code runs in the context of the logged-in user, so the attacker can basically take over the user’s account.
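The vulnerability can be reproduced in a few lines. This is a minimal JavaScript sketch rather than the original PHP; `escapeHtml`, `renderTalkUnsafe`, `renderTalkSafe`, and the `<li>` markup are illustrative assumptions, not code from the talk.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so the entities below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Naive concatenation: the talk name is inserted into the markup verbatim.
function renderTalkUnsafe(talk) {
  return "<li>" + talk.name + "</li>";
}

// Safe version: the dynamic value is escaped at this call site.
function renderTalkSafe(talk) {
  return "<li>" + escapeHtml(talk.name) + "</li>";
}

const malicious = { name: "<script>stealCookies()</script>" };
console.log(renderTalkUnsafe(malicious)); // <li><script>stealCookies()</script></li>
console.log(renderTalkSafe(malicious));   // <li>&lt;script&gt;stealCookies()&lt;/script&gt;</li>
```

The safe version only helps at the call sites that remember to use it, which is exactly the weakness of this approach.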

Slide 5

Slide 5 text

String Concatenation — 2004 Insecure by default If you don’t do anything, you are vulnerable. Worse, for most inputs it actually renders fine for the developer working on the feature, so there is very little incentive for him or her to add proper escaping.

Slide 6

Slide 6 text

String Concatenation — 2004 Insecure by default One mistake and there’s a vulnerability The really bad property is that every single call site, across millions of lines of code written by hundreds of engineers, has to be safe. Make one mistake and you are open to account takeover.

Slide 7

Slide 7 text

String Concatenation — 2004 Insecure by default One mistake and there’s a vulnerability You can’t over escape One idea to get out of this impossible situation is to just escape everything, no matter what. Unfortunately, that doesn’t quite work: if you escape a string twice, the user sees escape sequences such as &amp; instead of the characters themselves. And if you accidentally escape markup, the user sees raw HTML!
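Both failure modes of "escape everything" are easy to see with a typical HTML escaper. In this sketch `escapeHtml` is an illustrative helper, not a specific library function; the comments show what the browser would display for each string.

```javascript
// A typical HTML escaper (illustrative).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const talkName = "Tom & Jerry";
const once = escapeHtml(talkName); // "Tom &amp; Jerry"     -> displayed as: Tom & Jerry
const twice = escapeHtml(once);    // "Tom &amp;amp; Jerry" -> displayed as: Tom &amp; Jerry

// Escaping markup that was meant to be rendered turns the tags into visible text.
const markup = "<b>OSCON</b>";
const escapedMarkup = escapeHtml(markup); // "&lt;b&gt;OSCON&lt;/b&gt;" -> displayed as: <b>OSCON</b>

console.log(once);
console.log(twice);
console.log(escapedMarkup);
```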

Slide 8

Slide 8 text

XHP — 2010 $content =

{$talk->name}

is not in a string anymore.

Slide 9

Slide 9 text

XHP — 2010 $content =

{$talk->name}

Slide 10

Slide 10 text

Custom Tags $content = ; foreach ($talks as $talk) { $content->appendChild(); } Once XHP was introduced, it didn’t take long for people to realize that they could create custom tags. It turns out that composing many of those tags lets you build very large applications easily. This is one implementation of the ideas behind the Semantic Web and Web Components.
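The key idea of XHP is that markup is a data structure rather than a string: text is escaped by default, and custom tags compose like functions. The JavaScript sketch below only mimics that idea; `h`, `renderToString`, `TalkList`, and `TalkItem` are hypothetical names, not XHP's (PHP) API or React's.

```javascript
// Markup is built as objects; strings are only produced when rendering.
function h(type, props, ...children) {
  return { type, props: props || {}, children };
}

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderToString(node) {
  // Plain values are text and are always escaped, so nobody can forget.
  if (typeof node === "string" || typeof node === "number") {
    return escapeHtml(node);
  }
  // Custom tags are functions that return more markup (composition).
  if (typeof node.type === "function") {
    return renderToString(node.type({ ...node.props, children: node.children }));
  }
  const attrs = Object.entries(node.props)
    .map(([key, value]) => ` ${key}="${escapeHtml(value)}"`)
    .join("");
  const inner = node.children.map(renderToString).join("");
  return `<${node.type}${attrs}>${inner}</${node.type}>`;
}

// Two hypothetical custom tags composed together.
const TalkItem = ({ talk }) => h("li", { class: "talk" }, talk.name);
const TalkList = ({ talks }) => h("ul", null, ...talks.map((talk) => h(TalkItem, { talk })));

const html = renderToString(
  h(TalkList, { talks: [{ name: "React" }, { name: "<script>x()</script>" }] })
);
console.log(html);
```

Because escaping lives in the renderer instead of at every call site, a single mistake no longer turns into a vulnerability.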

Slide 11

Slide 11 text

We started doing more and more in JavaScript in order to avoid the latency between client and server. We tried many techniques, such as a cross-browser DOM library and a data-binding approach, but none of them really worked well for us.

Given this state of the world, Jordan Walke, a front-end engineer, pitched his manager the idea of porting XHP to JavaScript. He somehow managed to get 6 months to work on it in order to prove the concept.

The first time I heard about this project, I thought there was absolutely no way it was going to work, but that in the rare chance it did, it was going to be huge. When I finally got to play with it, I immediately started evangelizing it :)

Slide 12

Slide 12 text

JSX — 2013 var content = {talks.map(talk => )} ; ES6 Arrow Function The first task was to write an extension of JavaScript that supports this weird XML syntax. It turns out that at Facebook we had been using JavaScript transforms for a while. In this example, I’m using the alternative way of writing functions introduced by ES6, the next JavaScript standard.

It took about a week to implement JSX, and it is not really the most important part of React.
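For readers who have not used ES6 yet, the two forms below produce the same array. The talk data is made up for illustration.

```javascript
const talks = [{ name: "React Architecture" }, { name: "Om" }];

// ES5 function expression.
const namesEs5 = talks.map(function (talk) {
  return talk.name;
});

// ES6 arrow function: shorter syntax, and a single-expression body is returned implicitly.
const namesEs6 = talks.map((talk) => talk.name);

console.log(namesEs5);
console.log(namesEs6);
```

One behavioral difference worth knowing: an arrow function does not get its own `this`, it uses the `this` of the enclosing scope, which is convenient inside component methods.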

Slide 13

Slide 13 text

PHP Anything changes? Re-render everything. Can we make it fast enough? What was much more challenging was reproducing the update mechanism of PHP. It’s really simple: whenever anything changes, you navigate to a new page and get a whole new page. From a developer’s point of view this makes writing apps extremely easy, as you don’t have to worry about mutations or about keeping everything in sync when something changes in your UI.

However, the question everybody asks is… isn’t it going to be super slow?

Slide 14

Slide 14 text

React Not only is it fast enough... It’s often faster than previous implementations! After 2 years of production usage, I can confidently say that it’s surprisingly faster than most of the code that we replaced it with. In the rest of this talk I’m going to explain the big optimizations that make this possible.

Slide 15

Slide 15 text

“You need to be right before being good” — Akim Demaille My teacher at school used to say that you need to be right before being good. What he meant is that if you are trying to build something fast, you have a much better chance of succeeding if you first build a naive but working implementation and then iterate on performance, rather than trying to build it the best way from the start.

Slide 16

Slide 16 text

Render t=1 Render t=2 Naive So, let’s apply his advice. We’re first going to implement the most naive version: whenever anything changes, we build an entirely new DOM and replace the old one.

Slide 17

Slide 17 text

DOM is Stateful Input focus and selection Scroll position Iframe This kind of works, but there are a lot of edge cases. If you blow away the DOM, you lose the currently focused element and the cursor, and the same goes for the text selection and scroll position. What this really means is that DOM nodes contain state.

The first attempt was to restore that state: we would remember the focused input and focus the new element, and do the same for the cursor and scroll position. Unfortunately, this isn’t enough.
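The problem can be modeled without a browser. In this sketch, `FakeNode`, `render`, and `naiveUpdate` are hypothetical stand-ins: the fake nodes carry browser-owned state (focus, cursor position) that the render function never sets, so replacing the whole tree silently resets it.

```javascript
// Stand-in for a DOM node: besides tag and attributes, it carries state
// owned by the browser rather than by the application.
class FakeNode {
  constructor(tag, attrs) {
    this.tag = tag;
    this.attrs = attrs;
    this.focused = false;    // hidden, browser-managed state
    this.selectionStart = 0; // cursor position inside an input
  }
}

// Render is a pure function of application state.
function render(state) {
  return state.fields.map((field) => new FakeNode("input", { id: field }));
}

// Naive update: build a brand-new tree and replace the old one wholesale.
function naiveUpdate(container, state) {
  container.children = render(state);
}

const container = { children: [] };
naiveUpdate(container, { fields: ["title", "speaker"] });

// The user clicks into the first input and types a few characters.
container.children[0].focused = true;
container.children[0].selectionStart = 5;

// Any change re-renders everything, even with identical application state...
naiveUpdate(container, { fields: ["title", "speaker"] });

// ...and the new nodes know nothing about focus or the cursor.
console.log(container.children[0].focused);        // false
console.log(container.children[0].selectionStart); // 0
```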

Slide 18

Slide 18 text

DOM is Stateful Input focus and selection Scroll position Iframe If you are scrolling on a Mac, you get inertia. It turns out that there is no JavaScript API to read or write scroll inertia. Iframes are even worse: if the iframe comes from another domain, the same-origin policy doesn’t even let you look at what’s inside, so you cannot restore it. Not only is the DOM stateful, it contains hidden state!

Slide 19

Slide 19 text

Reuse Nodes

To get around this, instead of blowing away the DOM and recreating a new one, we reuse the DOM nodes that stayed the same between two renders.

Slide 20

Slide 20 text

Reuse Nodes

Instead of removing the previous DOM tree and replacing it with the new one, we match nodes and, if they didn’t change, discard the new one and keep the old one, which is currently rendered on screen.

Slide 21

Slide 21 text

Reuse Nodes As long as we can match nodes, we repeat the process. But at some point we see a new node that wasn’t there before. In this case, we move the new node into the old DOM tree (the one currently rendered on screen).
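Here is a minimal sketch of node reuse, assuming plain objects for DOM nodes and matching children by position and tag name. That matching rule is an assumption for illustration; the talk refines it later. `reconcile` and `node` are hypothetical helpers.

```javascript
// Walk the old and new trees together. When a node matches (same tag at the
// same position), keep the old node, and with it its hidden state, and only
// copy the new attributes onto it. Otherwise adopt the new node.
function reconcile(oldChildren, newChildren) {
  const result = [];
  for (let i = 0; i < newChildren.length; i++) {
    const oldNode = oldChildren[i];
    const newNode = newChildren[i];
    if (oldNode && oldNode.tag === newNode.tag) {
      oldNode.attrs = newNode.attrs; // update in place
      oldNode.children = reconcile(oldNode.children, newNode.children);
      result.push(oldNode);          // reused: focus and friends survive
    } else {
      result.push(newNode);          // no match: move the new node into the tree
    }
  }
  return result; // old nodes beyond newChildren.length are dropped
}

function node(tag, attrs, children = []) {
  return { tag, attrs, children, focused: false };
}

let screen = [node("form", {}, [node("input", { id: "title" }), node("input", { id: "speaker" })])];
screen[0].children[0].focused = true; // the user focuses the first input

const next = [
  node("form", {}, [node("input", { id: "title" }), node("input", { id: "speaker" }), node("button", {})]),
];
screen = reconcile(screen, next);

console.log(screen[0].children[0].focused); // true: the old input was reused
console.log(screen[0].children.length);     // 3: the new button was adopted
```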

Slide 22

Slide 22 text

Slide 23

Slide 23 text

“I tend to think of React as Version Control for the DOM” — AdonisSMU We now have a general idea of how we want React to work, but no specific plan. This is the moment when I pull the analogy card out of my hat.

Slide 24

Slide 24 text

new-awesome-version.zip check-this-out.zip all-bugs-are-fixed.zip release.zip Old School Version Control Back in the dark ages of programming, if you wanted someone else to try out your code, you would create a zip file and send it to them. If you changed anything, you would send a new zip file.

Slide 25

Slide 25 text

new-awesome-version.zip check-this-out.zip all-bugs-are-fixed.zip release.zip Old School Version Control Then version control came along. It takes those snapshots of the code and, using a diff algorithm, generates a list of mutations such as “remove those 5 lines”, “add 3 lines”, “replace this word”…
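To make "a diff algorithm generates a list of mutations" concrete, here is a textbook line diff based on the longest common subsequence (LCS). It is a sketch, not the exact algorithm any particular version control system uses; `diffLines` is a hypothetical name.

```javascript
// Compute a list of "keep" / "remove" / "add" operations turning oldLines into newLines.
function diffLines(oldLines, newLines) {
  const n = oldLines.length;
  const m = newLines.length;
  // lcs[i][j] = length of the LCS of oldLines[i..] and newLines[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = oldLines[i] === newLines[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the start, emitting one operation per step.
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (oldLines[i] === newLines[j]) {
      ops.push({ op: "keep", line: oldLines[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ op: "remove", line: oldLines[i] });
      i++;
    } else {
      ops.push({ op: "add", line: newLines[j] });
      j++;
    }
  }
  while (i < n) ops.push({ op: "remove", line: oldLines[i++] });
  while (j < m) ops.push({ op: "add", line: newLines[j++] });
  return ops;
}

const ops = diffLines(["a", "b", "c"], ["a", "c", "d"]);
console.log(ops.map((o) => `${o.op} ${o.line}`).join("\n"));
// keep a
// remove b
// keep c
// add d
```

React applies the same idea, with trees of DOM nodes in place of lines of text.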

Slide 26

Slide 26 text

new-awesome-version.zip check-this-out.zip all-bugs-are-fixed.zip release.zip Old School Version Control This is exactly what React does, but with the DOM as input instead of text files.

Slide 27

Slide 27 text

10,000³ = 1000 × 10⁹ operations ≈ 1000 seconds at 1 GHz Optimal Diff — O(n³) So, like any good engineers, we looked at diff algorithms for trees and found that the optimal solution was in O(n³).

Let’s say we’ve got a page with 10,000 DOM nodes. It’s big but not unthinkable. To get an order of magnitude, we’re going to assume that we can do one operation per CPU cycle (not going to happen) on a 1 GHz machine.

Slide 28

Slide 28 text

10,000³ = 1000 × 10⁹ operations ≈ 1000 seconds at 1 GHz Optimal Diff — O(n³) ≈ 17 minutes It would take 17 minutes to do the diff! We can’t use that…
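Spelled out under the same assumptions (n = 10,000 nodes, one operation per cycle, 10⁹ cycles per second), the estimate works out as follows:

```latex
\begin{aligned}
n^3 &= (10^4)^3 = 10^{12} \text{ operations} \\
t &= \frac{10^{12} \text{ operations}}{10^{9} \text{ operations/s}} = 10^{3} \text{ s} \\
10^{3} \text{ s} &= \frac{1000}{60} \text{ min} \approx 16.7 \text{ min} \approx 17 \text{ min}
\end{aligned}
```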

Slide 29

Slide 29 text

O(n³) But we’re not afraid: we know that we’re still in the phase where we need to be right. So let’s study how it works.

Slide 30

Slide 30 text

O(n³) (1) For every node in the new tree, (2) we try to match it against every node of the old tree, and (3) each matching operation works on the entire subtree. That gives us our three nested loops.
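The three nested loops can be mirrored directly. This deliberately simplified sketch only counts work: it assumes the worst case, where each subtree comparison costs n, and does not implement tree edit distance. `naiveMatchCost` is a hypothetical name.

```javascript
// Count the operations of the naive matching described above.
function naiveMatchCost(n) {
  let operations = 0;
  for (let newNode = 0; newNode < n; newNode++) {   // (1) every node of the new tree
    for (let oldNode = 0; oldNode < n; oldNode++) { // (2) against every node of the old tree
      for (let k = 0; k < n; k++) {                 // (3) comparing whole subtrees
        operations++;
      }
    }
  }
  return operations;
}

for (const n of [10, 20, 40]) {
  console.log(n, naiveMatchCost(n)); // doubling n multiplies the work by 8
}
```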

Slide 31

Slide 31 text

O(n³) If you think about it, in a web application we very rarely have to move an element to a completely different place in the page. The only example that comes to mind is drag and drop, and it’s far from common.

Slide 32

Slide 32 text

O(n³) The only time we move elements is among the children of a node: you very often add, remove, or move elements in a list.

Slide 33

Slide 33 text

O(m²) Children by Children So, we can instead do the diff children by children. We start with the root and match it against the other root.

Slide 34

Slide 34 text

Children by Children O(m²) O(m²) O(m²) O(m²) And we do that for all the matching children. We went from one big scary O(n³) to many O(m²), where m is the number of children of a node.
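Each O(m²) step can be pictured as an edit distance (Levenshtein) computation between the old and new children lists. This sketch compares children by tag name only, which is an assumption for illustration; `childrenEditDistance` is a hypothetical name.

```javascript
// Levenshtein (edit) distance between two children lists, compared by tag name.
// Filling the (a+1) x (b+1) table is what makes each per-node step O(m²).
function childrenEditDistance(oldTags, newTags) {
  const rows = oldTags.length + 1;
  const cols = newTags.length + 1;
  const d = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) d[i][0] = i; // remove every old child
  for (let j = 0; j < cols; j++) d[0][j] = j; // insert every new child
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = oldTags[i - 1] === newTags[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // remove an old child
        d[i][j - 1] + 1,       // insert a new child
        d[i - 1][j - 1] + cost // keep (cost 0) or replace (cost 1) a child
      );
    }
  }
  return d[rows - 1][cols - 1];
}

console.log(childrenEditDistance(["div", "input", "span"], ["div", "span"]));      // 1
console.log(childrenEditDistance(["input", "input", "input"], ["input", "input"])); // 1
```

The second call returns a cost of 1, but tag names alone cannot tell which of the three inputs was removed, which is the identity problem discussed on the following slides.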

Slide 35

Slide 35 text

We tried to be too good too fast :( It turns out that we cannot use Levenshtein (edit distance) directly on the lists of children!

Slide 36

Slide 36 text

Identity The best way to understand why is with a small example. Let’s put ourselves in React’s shoes for a minute. The first render had three inputs and the next one only has two. How do you match them?

Slide 37

Slide 37 text

Identity The intuitive answer is to match the first two together and delete the third one.

Slide 38

Slide 38 text

Identity But we can also delete the first one and match the last two together.

Slide 39

Slide 39 text

Identity A less obvious, but still totally valid, solution is to remove all the previous elements and create two new ones. So at this point, we don’t have enough information to do the matching properly, and we want to be able to handle all of the above cases.

Slide 40

Slide 40 text

Identity One idea is to use not only the tag name but also the attributes: if they are equal before and after, we match the nodes.

Slide 41

Slide 41 text

Identity It turns out that this doesn’t work for the value attribute. If you are typing “oscon”, the value changes with every keystroke, so the input no longer matches its previous version: it is treated as a different input and you lose focus :(

Slide 42

Slide 42 text

Identity A more promising attribute is id. In a form, it usually contains the id of the model that the input corresponds to.

Slide 43

Slide 43 text

Identity Now we’re able to match the two lists successfully! (Did you notice that this is yet another matching, different from the three examples I showed before?)

Slide 44

Slide 44 text

Identity But if you are submitting the form via AJAX instead of letting the browser do it, you’re unlikely to put that id attribute in the DOM.

React introduces the key attribute. Its only job is to help the diff algorithm do the matching.

Slide 45

Slide 45 text

Children by Children O(m) O(m) O(m) O(m) It turns out that, using keys, we can implement the matching much faster than O(m²): in O(m), via a hash table.
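Here is a sketch of key-based matching with a hash table (a JavaScript `Map`). It assumes keys are unique among siblings; `matchByKey` and the `{ key, tag }` child shape are illustrative, not React's internal data structures.

```javascript
// Match old and new children by key: one pass to index, one pass to look up, O(m).
function matchByKey(oldChildren, newChildren) {
  const oldByKey = new Map();
  for (const child of oldChildren) oldByKey.set(child.key, child);

  const matched = []; // [oldChild, newChild] pairs: reuse and update in place
  const created = []; // new children with no old counterpart
  for (const child of newChildren) {
    const previous = oldByKey.get(child.key);
    if (previous) {
      matched.push([previous, child]);
      oldByKey.delete(child.key); // each old child can be matched at most once
    } else {
      created.push(child);
    }
  }
  const removed = [...oldByKey.values()]; // old children nobody claimed
  return { matched, created, removed };
}

const before = [{ key: "a", tag: "input" }, { key: "b", tag: "input" }, { key: "c", tag: "input" }];
const after = [{ key: "b", tag: "input" }, { key: "c", tag: "input" }];
const { matched, created, removed } = matchByKey(before, after);
// Matched: b and c. Created: none. Removed: a.
console.log(matched.map(([oldChild]) => oldChild.key), created.length, removed.map((r) => r.key));
```

With keys, the three-inputs example from earlier is no longer ambiguous: the key says exactly which input went away.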

Slide 46

Slide 46 text

Children by Children O(m) + O(m) + O(m) + O(m) = O(n) So, if we sum all those partial O(m), we get a total complexity of O(n), since every node appears in exactly one list of children. It’s not possible to do better, since every node has to be looked at at least once :)

Slide 47

Slide 47 text

Let the goodness begin! At this point we’ve got a correct solution, so we can now start implementing all the cool optimizations that make it blazing fast :)

Slide 48

Slide 48 text

align, onwaiting, onvolumechange, ontimeupdate, onsuspend, onsubmit, onstalled, onshow, onselect, onseeking, onseeked, onscroll, onresize, onreset, onratechange, onprogress, onplaying, onplay, onpause, onmousewheel, onmouseup, onmouseover, onmouseout, onmousemove, onmouseleave, onmouseenter, onmousedown, onloadstart, onloadedmetadata, onloadeddata, onload, onkeyup, onkeypress, onkeydown, oninvalid, oninput, onfocus, onerror, onended, onemptied, ondurationchange, ondrop, ondragstart, ondragover, ondragleave, ondragenter, ondragend, ondrag, ondblclick, oncuechange, oncontextmenu, onclose, onclick, onchange, oncanplaythrough, oncanplay, oncancel, onblur, onabort, spellcheck, isContentEditable, contentEditable, outerText, innerText, accessKey, hidden, webkitdropzone, draggable, tabIndex, dir, translate, lang, title, childElementCount, lastElementChild, firstElementChild, children, nextElementSibling, previousElementSibling, onwheel, onwebkitfullscreenerror, onwebkitfullscreenchange, onselectstart, onsearch, onpaste, oncut, oncopy, onbeforepaste, onbeforecut, onbeforecopy, webkitShadowRoot, dataset, classList, className, outerHTML, innerHTML, scrollHeight, scrollWidth, scrollTop, scrollLeft, clientHeight, clientWidth, clientTop, clientLeft, offsetParent, offsetHeight, offsetWidth, offsetTop, offsetLeft, localName, prefix, namespaceURI, id, style, attributes, tagName, parentElement, textContent, baseURI, Rafał Pocztarski document.createElement(‘div’)

If you’ve done any optimization of JS apps, you’ve probably heard that the DOM is slow. Rafał on Stack Exchange made a very good illustration: if you enumerate all the properties of an empty div, you see a lot of them!

Slide 49

Slide 49 text

The reason there are so many properties is that a DOM node is used in many steps of the browser rendering pipeline. The browser first looks at the CSS rules and finds the ones that match that node, storing a variety of metadata in the process to make it faster. For example, it maintains a map from ids to DOM nodes.

Then, it takes those styles and computes the layout, which contains the size and position of each element on the screen. Again, lots of metadata. It avoids recomputing layout as much as possible and caches previously computed values.

Then, at some point it actually paints into a buffer, either on the CPU or the GPU.

All those steps require intermediate representations and use memory and CPU. Browsers do a very good job of optimizing this entire pipeline.

Slide 50

Slide 50 text

Virtual DOM t=1 Virtual DOM t=2 Virtual DOM + But, if you think about what’s happening in React, we only use those DOM nodes in the diff algorithm. So we can use a much lighter JavaScript object that just contains the tag name and attributes. We call it the virtual DOM.

Slide 51

Slide 51 text

Virtual DOM t=1 Virtual DOM t=2 Mutations Virtual DOM + = The diff algorithm generates a list of DOM mutations, the same way version control systems output text mutations.
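Here is a minimal sketch of a virtual DOM and a diff that outputs mutations. The node shape (`{ tag, attrs, children }`, with strings for text), the mutation format, and the names `el` and `diff` are assumptions for illustration. Children are matched by position to keep it short; a real implementation would use keys as described earlier.

```javascript
// A virtual DOM node is a plain object: far lighter than a real DOM node.
const el = (tag, attrs = {}, ...children) => ({ tag, attrs, children });

// Compare two virtual trees and collect the mutations needed to turn the old
// one into the new one. `path` lists child indexes from the root.
// Note: removals come out in increasing index order, so they should be applied
// in reverse to avoid index shifting.
function diff(oldNode, newNode, path = [], mutations = []) {
  if (oldNode === undefined) {
    mutations.push({ type: "insert", path, node: newNode });
  } else if (newNode === undefined) {
    mutations.push({ type: "remove", path });
  } else if (typeof oldNode === "string" || typeof newNode === "string") {
    if (oldNode !== newNode) mutations.push({ type: "replace", path, node: newNode });
  } else if (oldNode.tag !== newNode.tag) {
    mutations.push({ type: "replace", path, node: newNode });
  } else {
    for (const [name, value] of Object.entries(newNode.attrs)) {
      if (oldNode.attrs[name] !== value) mutations.push({ type: "setAttribute", path, name, value });
    }
    for (const name of Object.keys(oldNode.attrs)) {
      if (!(name in newNode.attrs)) mutations.push({ type: "removeAttribute", path, name });
    }
    const length = Math.max(oldNode.children.length, newNode.children.length);
    for (let i = 0; i < length; i++) {
      diff(oldNode.children[i], newNode.children[i], [...path, i], mutations);
    }
  }
  return mutations;
}

const t1 = el("ul", { class: "talks" }, el("li", {}, "React"), el("li", {}, "Om"));
const t2 = el("ul", { class: "talks selected" }, el("li", {}, "React"));
console.log(diff(t1, t2));
// One setAttribute on the <ul>, one remove of its second <li>; the unchanged <li> costs nothing.
```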

Slide 52

Slide 52 text

Real DOM Virtual DOM We can then apply those mutations to the real DOM and let the browser run its optimized pipeline. We have reduced the number of expensive, but necessary, DOM mutations to the bare minimum.

Slide 53

Slide 53 text

Open Source The diff algorithm and the virtual DOM are the two optimizations we had when we open sourced React. Let’s take a break and see how the project has done since then.

Slide 54

Slide 54 text

GitHub chart: 6/11/2013 to 7/1/2014, y-axis from 0 to 8,000 React got extremely popular in just a year. If it keeps growing at this rate, it’s going to be the biggest Facebook open source project in a couple of months!

Slide 55

Slide 55 text

Not only are people adding little stars on GitHub, they also use it in production. For example, the New York Times is using React to spice up its big news coverage, such as the Festival de Cannes and the World Cup.

Slide 56

Slide 56 text

GitHub just announced that they migrated their text editor, Atom, to React in order to improve performance. It shows that React is viable not only for building websites but for real apps as well.

Slide 57

Slide 57 text

Probably the most unexpected: Sberbank, the largest bank in Russia, is moving all of its online consumer banking operations to React!

Slide 58

Slide 58 text

Last but not least, Khan Academy was the first big adopter of React, and they converted all of their student exercises and admin panels.

Slide 59

Slide 59 text

More commits from open source than employees Not only are people using React, they are contributing back! And it’s not just typo fixes in the docs: the next two optimizations were brought to life by the community.

Slide 60

Slide 60 text

Opera lists Reflow and Repaint as one of the three main contributors to sluggish JavaScript We’ve talked about the DOM being slow; the second source of slowness is reflows and repaints. Those scary words just mean that when you modify the DOM, the browser has to update the position of elements and then the actual pixels. The same way string concatenation was insecure by default, writing JavaScript applications in the conventional way is very prone to triggering reflows and repaints.

When you read some properties from the DOM, the browser has to run those expensive operations in order to give you a consistent view. If you are doing a “read, write, read, write…” sequence of operations, you trigger expensive reflows and repaints without knowing it.

To mitigate that, the idea is to reorder a “read, write, read, write…” sequence of operations into “read, read, read…” followed by “write, write, write…”.
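The effect of reordering can be shown with a toy model of the browser. In this sketch (`ToyDocument`, `interleaved`, and `batched` are hypothetical), a write invalidates layout and a read of a layout property while layout is invalid forces a synchronous reflow. The real browser still lays out once before painting, which the model does not count.

```javascript
// Toy browser: writes invalidate layout; reading while invalid forces a reflow.
class ToyDocument {
  constructor() {
    this.layoutDirty = false;
    this.reflows = 0;
  }
  write() {
    this.layoutDirty = true;
  }
  read() {
    if (this.layoutDirty) {
      this.reflows++; // forced synchronous layout
      this.layoutDirty = false;
    }
    return 42; // stand-in for a layout property such as offsetHeight
  }
}

// "read, write, read, write...": every read after a write forces a reflow.
function interleaved(doc, count) {
  for (let i = 0; i < count; i++) {
    doc.read();
    doc.write();
  }
}

// Queue the same operations, then run all reads before all writes.
function batched(doc, count) {
  const reads = [];
  const writes = [];
  for (let i = 0; i < count; i++) {
    reads.push(() => doc.read());
    writes.push(() => doc.write());
  }
  reads.forEach((task) => task());
  writes.forEach((task) => task());
}

const a = new ToyDocument();
interleaved(a, 100);
const b = new ToyDocument();
batched(b, 100);
console.log(a.reflows, b.reflows); // 99 0
```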

Slide 61

Slide 61 text

Batching setState Dirty Ben Alpert, from Khan Academy, implemented a fix for this problem by batching operations.

Slide 62

Slide 62 text

Batching setState Dirty To tell React that something changed, you call setState on a component. React just marks the component as dirty, without computing anything right away. If you call setState on the same component multiple times, it is just as fast as calling it once.

Slide 63

Slide 63 text

Dirty Once the initial event has been fully propagated …

Slide 64

Slide 64 text

Re-Rendered We can then start re-rendering elements from top to bottom. This is very important, as it ensures that each element is rendered only once.
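The batching steps above (mark dirty, wait for the event to finish, then re-render top-down) can be sketched as follows. This is a simplification, not React's implementation: `Component`, `setState`, and `flush` here are hypothetical stand-ins, and "rendering" a component just logs its name.

```javascript
// Hypothetical component tree: each component knows its parent and depth.
class Component {
  constructor(name, parent = null) {
    this.name = name;
    this.parent = parent;
    this.depth = parent ? parent.depth + 1 : 0;
    this.state = {};
  }
}

const renderLog = [];
const dirty = new Set(); // a Set: marking the same component twice is a no-op

// setState records the change and marks the component dirty; nothing renders yet.
function setState(component, partial) {
  component.state = { ...component.state, ...partial };
  dirty.add(component);
}

function hasAncestor(node, ancestor) {
  for (let p = node.parent; p; p = p.parent) if (p === ancestor) return true;
  return false;
}

// Once the event has fully propagated: re-render dirty components top-down,
// skipping any component inside a subtree that was already re-rendered.
function flush() {
  const ordered = [...dirty].sort((x, y) => x.depth - y.depth);
  const rendered = [];
  for (const component of ordered) {
    if (rendered.some((r) => hasAncestor(component, r))) continue;
    renderLog.push(component.name); // stands in for re-rendering the whole subtree
    rendered.push(component);
  }
  dirty.clear();
}

const app = new Component("App");
const list = new Component("TalkList", app);
const item = new Component("TalkItem", list);

setState(item, { liked: true });
setState(item, { liked: false }); // same component: no extra work
setState(list, { filter: "react" });
flush();
console.log(renderLog); // [ 'TalkList' ]
```

TalkItem is not rendered separately: it is re-rendered as part of TalkList's subtree, so each component renders at most once per batch.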

Slide 65

Slide 65 text

Re-Rendered

Slide 66

Slide 66 text

Re-Rendered Now that all the elements have been re-rendered to the virtual DOM, we feed that to the diff algorithm, which outputs DOM mutations. Nowhere in this process did we have to read from the DOM. React is (outside of optimizations I’m not going to cover in this talk) write-only.

Slide 67

Slide 67 text

Subtree Re-rendering At the beginning I said that the mental model is “re-render everything when anything changes”. This is not exactly what happens in practice: we only re-render the subtrees of elements that have been flagged by setState.

Slide 68

Slide 68 text

Subtree Re-rendering When you start integrating React into your app, the usual pattern is to have state very low in the tree, so setState calls are pretty cheap, as they only re-render a small part of the UI.

Slide 69

Slide 69 text

Pruning Dirty As more of the application gets converted, state tends to move up the tree, which means that you re-render a larger portion of the app whenever anything changes.

Slide 70

Slide 70 text

Pruning Dirty bool shouldComponentUpdate(nextProps, nextState) To mitigate this performance-wise, you can implement shouldComponentUpdate, which, given both the previous and next props and state, can say: “You know what, nothing changed, let’s just skip re-rendering this subtree.”

Slide 71

Slide 71 text

Pruning Re-Rendered bool shouldComponentUpdate(nextProps, nextState) This lets you prune big parts of the tree and regain performance
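Here is a sketch of pruning with a shallow-equality shouldComponentUpdate. `TalkItem`, `update`, and `shallowEqual` are hypothetical, and the component is a plain class rather than a React component; "rendering" only increments a counter.

```javascript
// Shallow equality: same keys, and each value is the same reference (===).
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key]
  );
}

// Hypothetical component whose shouldComponentUpdate compares props and state shallowly.
class TalkItem {
  constructor(props) {
    this.props = props;
    this.state = {};
    this.renders = 0;
  }

  shouldComponentUpdate(nextProps, nextState) {
    return !shallowEqual(this.props, nextProps) || !shallowEqual(this.state, nextState);
  }

  // Called when the parent re-renders; returns false when this subtree is pruned.
  update(nextProps, nextState = this.state) {
    if (!this.shouldComponentUpdate(nextProps, nextState)) return false;
    this.props = nextProps;
    this.state = nextState;
    this.renders++; // stands in for render() plus diffing its output
    return true;
  }
}

const talk = { name: "React Architecture" };
const item = new TalkItem({ talk, selected: false });
const first = item.update({ talk, selected: false }); // same values: pruned
const second = item.update({ talk, selected: true }); // selection changed: re-rendered
console.log(first, second, item.renders); // false true 1
```

The shallow check is cheap precisely because it compares references, which only works if changed data gets a new reference, as the next slides explain.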

Slide 72

Slide 72 text

Pruning Re-Rendered bool shouldComponentUpdate(nextProps, nextState)

Slide 73

Slide 73 text

Pruning Re-Rendered bool shouldComponentUpdate(nextProps, nextState)

Slide 74

Slide 74 text

shouldComponentUpdate? We introduced shouldComponentUpdate with the open source release, but we did not quite know how to implement it correctly.

The problem is that in JavaScript you often hold state in objects and mutate them directly. This means that the previous and next versions of the state are the same object reference. So when you compare the previous version with the next one, they look equal, even though something changed.

Slide 75

Slide 75 text

shouldComponentUpdate? Om: Immutable data structure • David Nolen, from the New York Times, figured out a good solution. In ClojureScript, most values are immutable, meaning that when you update one, you get a new object and the old one is left untouched. This works very well with shouldComponentUpdate.

He wrote a library on top of React in ClojureScript, called Om, which uses immutable data structures in order to implement shouldComponentUpdate by default.
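The difference between mutation and immutable updates comes down to what a reference check (===) can see. This sketch uses plain object and array spread as a stand-in for persistent data structures; Om relies on ClojureScript's persistent collections, which share structure far more efficiently than copying.

```javascript
// With mutation, "previous" and "next" state are the same object,
// so a cheap reference check cannot see the change.
const mutableState = { talks: ["React"] };
const previous = mutableState;
mutableState.talks.push("Om"); // mutate in place
const next = mutableState;
console.log(previous === next); // true: looks unchanged, even though it changed

// With immutable updates, every change produces a new object, while untouched
// parts are shared, so === is enough to detect changes.
const state1 = { talks: ["React"], speaker: { name: "vjeux" } };
const state2 = { ...state1, talks: [...state1.talks, "Om"] }; // copy on write
console.log(state1 === state2);                 // false: something changed
console.log(state1.speaker === state2.speaker); // true: untouched part is shared
console.log(state1.talks.length);               // 1: the old version is left intact
```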

Slide 76

Slide 76 text

shouldComponentUpdate? • Perf.printWasted() Unfortunately, using immutable data structures requires a big mental leap that not everyone is ready to take yet. So for now and the foreseeable future, React has to work without them and therefore cannot implement shouldComponentUpdate by default.

Instead, we just released a performance tool. You play around with your application for a while, and every time a component is re-rendered but the diff doesn’t output any DOM mutation, the tool records the time it took to render. At the end, you get a nice table that tells you which components would benefit most from shouldComponentUpdate!

This way, you can add it in a few key places and reap most of the perf wins.
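The bookkeeping behind such a "wasted time" table can be sketched in a few lines. This does not use or reproduce the `Perf.printWasted()` tool from the slide; `recordRender` and `wastedReport` are hypothetical helpers, and the render timings are made-up inputs.

```javascript
// A render is "wasted" when it produced no DOM mutation at all.
const wasted = new Map();

function recordRender(componentName, renderMs, mutationCount) {
  if (mutationCount > 0) return; // the render led to real DOM changes
  const entry = wasted.get(componentName) || { totalMs: 0, count: 0 };
  entry.totalMs += renderMs;
  entry.count += 1;
  wasted.set(componentName, entry);
}

// Components sorted by wasted time: the best candidates for shouldComponentUpdate.
function wastedReport() {
  return [...wasted.entries()]
    .map(([component, { totalMs, count }]) => ({ component, totalMs, count }))
    .sort((x, y) => y.totalMs - x.totalMs);
}

recordRender("TalkList", 12, 0);
recordRender("TalkItem", 2, 0);
recordRender("TalkItem", 2, 0);
recordRender("Header", 5, 3); // produced mutations: not wasted
recordRender("TalkList", 10, 0);
console.table(wastedReport());
```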

Slide 77

Slide 77 text

In this talk we covered four optimizations that React does: the diff algorithm, the virtual DOM, batching, and pruning. I hope this sheds some light on why they exist and how they work.

React is used to build our desktop website, our mobile website, and the Instagram website. It is so successful at Facebook that basically all new front-end products are written using React. This is not a project that we only use for internal tools or small features: it powers the main page of Facebook, used by hundreds of millions of people every month! Conclusion

Slide 78

Slide 78 text

Since we are at the Open Source conference, I would like to end by reflecting on it a bit. We open sourced XHP in 2010, but we did a very bad job at it: we wrote a single blog post in 4 years. We didn’t go to conferences to explain it or write documentation… And yet, inside Facebook, we absolutely love it and use it everywhere.

When we open sourced React last year, it was much harder, because we had to explain both the benefits of XHP and all the crazy optimizations we had to do to make it work on the client.

We talk a lot about the benefits of open sourcing. This was a very good reminder that not open sourcing your core technologies can make it harder to open source other projects down the line. Conclusion

Slide 79

Slide 79 text

React Architecture CHRISTOPHER “VJEUX” CHEDEAU FACEBOOK OPEN SOURCE