How browsers work - Speaker Deck

Slide 1

Slide 1 text

How browsers work? • Debugging browser code for solving problems • Open source – Chrome, Firefox, Safari • Huge code base • Unclear compilation instructions • Hours to compile • No documentation – where do you start?

Slide 2

Slide 2 text

• My article – http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/ • By Tali Garsiel [email protected] • I’ll be talking about: • Browser high level structure • Browser main flow

Slide 3

Slide 3 text

Browser high level structure I’ll be talking mostly about the rendering engine

Slide 4

Slide 4 text

Rendering engine • Responsible for parsing the HTML and painting it on the screen • Currently most browsers use a process for each tab – each has its own rendering engine.

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

What happens when I click “google.com”? • Loading the resource • Parsing the HTML • Creating the DOM tree • Creating the Render tree • Layout of the render tree • Painting

Slide 7

Slide 7 text

Parsing

Slide 8

Slide 8 text

Parsing - general • Parsing HTML is turning the HTML text into the DOM elements tree it represents • Turning this:

Hello World

Slide 9

Slide 9 text

Into this: • HTML DOM definition can be found at http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/idl-definitions.html

Slide 10

Slide 10 text

HTML Parser • Special parser – unlike the regular parsers used for languages like CSS, JavaScript, and Java • A custom parser: we cannot use ready-made parsing techniques such as bottom-up or top-down parsers • Those techniques need a context-free grammar, which you can define in a grammar file in BNF

Slide 11

Slide 11 text

Grammar example – for CSV • CSV – a very simple language used for tabular data, such as Excel exports • Words are surrounded by double quotes and separated by ",". Each row is separated by a new line • "red","green","blue" • "yellow","gold"

Slide 12

Slide 12 text

Grammar file example for CSV

Slide 13

Slide 13 text

CSV grammar continued
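The grammar file itself is not reproduced in this transcript. As a rough sketch only, the quoted-CSV language described above could be written in a BNF-like form and parsed top-down, one function per rule. The grammar and the names `parse_field`, `parse_row`, and `parse_file` are assumptions made for illustration, not the grammar shown on the slides.

```python
# Hypothetical grammar for the quoted-CSV language (an assumption, not the slide's file):
#   file  ::= row ( "\n" row )*
#   row   ::= field ( "," field )*
#   field ::= '"' any-character-except-quote* '"'


def parse_field(text, pos):
    """field rule: returns (value, position just after the closing quote)."""
    if pos >= len(text) or text[pos] != '"':
        raise ValueError(f"expected opening quote at position {pos}")
    end = text.find('"', pos + 1)
    if end == -1:
        raise ValueError(f"unterminated field starting at position {pos}")
    return text[pos + 1:end], end + 1


def parse_row(text, pos):
    """row rule: one field, then any number of ',' field pairs."""
    value, pos = parse_field(text, pos)
    fields = [value]
    while pos < len(text) and text[pos] == ",":
        value, pos = parse_field(text, pos + 1)
        fields.append(value)
    return fields, pos


def parse_file(text):
    """file rule: one row, then any number of newline-separated rows."""
    row, pos = parse_row(text, 0)
    rows = [row]
    while pos < len(text) and text[pos] == "\n":
        row, pos = parse_row(text, pos + 1)
        rows.append(row)
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return rows


rows = parse_file('"red","green","blue"\n"yellow","gold"')
print(rows)
```

Because every rule maps to exactly one function, input that breaks the grammar is simply rejected with an error. That strictness is what a grammar-driven parser gives you, and it is the behaviour HTML parsers cannot afford.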

Slide 14

Slide 14 text

Why HTML is difficult to parse • It cannot be described fully by a grammar file • Flexible syntax - extremely error tolerant • Reentrant – you can use “document.write” – This means that more text is added to the parsed text in the middle of parsing! – This is like adding more source code in the middle of compilation – more demanding of the parser

Slide 15

Slide 15 text

Parsing Algorithm • The HTML4 specification did not define a specific algorithm for how browsers should parse HTML • The HTML5 working group does have such a specification - http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#the-list-of-active-formatting-elements • Parsing is done by a tokenizer and a parser working together, while every “document.write” can add more input to the input stream.

Slide 16

Slide 16 text

Tokenizer + Parser • The tokenizer knows how to divide the input stream into tokens (in our case “html”, “body”, “hello”, “world”) • Tree construction is done using a state machine
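To make the tokenizer/tree-builder split concrete, here is a minimal sketch with a two-state tokenizer and a stack-based tree builder. It is far simpler than the HTML5 algorithm (no attributes, comments, or error recovery), and the function names and token shapes are assumptions chosen for illustration.

```python
def tokenize(html):
    """Two-state machine: 'data' collects text, 'tag' collects a tag name."""
    tokens = []
    state = "data"
    buffer = ""
    for ch in html:
        if state == "data":
            if ch == "<":
                if buffer.strip():
                    tokens.append(("text", buffer.strip()))
                buffer = ""
                state = "tag"
            else:
                buffer += ch
        else:  # state == "tag"
            if ch == ">":
                if buffer.startswith("/"):
                    tokens.append(("end", buffer[1:]))
                else:
                    tokens.append(("start", buffer))
                buffer = ""
                state = "data"
            else:
                buffer += ch
    if state == "data" and buffer.strip():
        tokens.append(("text", buffer.strip()))
    return tokens


def build_tree(tokens):
    """Consume tokens and build a nested dict tree using a stack of open elements."""
    root = {"name": "#document", "children": []}
    open_elements = [root]
    for kind, value in tokens:
        if kind == "start":
            node = {"name": value, "children": []}
            open_elements[-1]["children"].append(node)
            open_elements.append(node)
        elif kind == "end":
            # Only pop when the end tag matches the innermost open element.
            if len(open_elements) > 1 and open_elements[-1]["name"] == value:
                open_elements.pop()
        else:
            open_elements[-1]["children"].append({"name": "#text", "text": value})
    return root


tokens = tokenize("<html><body>Hello world</body></html>")
tree = build_tree(tokens)
print(tokens)
```

In a real browser the tokenizer hands each token to the tree builder as soon as it is produced, which is what allows a script to add more input in the middle of parsing. The sketch runs the two phases one after the other only to keep it short.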

Slide 17

Slide 17 text

A state machine • Let's see some WebKit source code

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Error tolerance • A great deal of the HTML parser work is to fix our errors. •

Really lousy HTML

• HTML5 workgroup requires browsers to correct many markup errors – Close unclosed tags – Move items to their correct parents
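As a small illustration of the "close unclosed tags" kind of repair, the sketch below fixes a token stream by tracking open tags on a stack. It is a deliberately naive assumption of how such a fix could look, not the HTML5 error-handling algorithm, and the `repair` function and token tuples are hypothetical.

```python
def repair(tokens):
    """Return a token list in which every start tag has a matching end tag."""
    open_tags = []
    fixed = []
    for kind, value in tokens:
        if kind == "start":
            open_tags.append(value)
            fixed.append((kind, value))
        elif kind == "end":
            if value not in open_tags:
                continue  # stray end tag with no matching start: drop it
            # Close everything that was opened after the matching start tag.
            while open_tags:
                name = open_tags.pop()
                fixed.append(("end", name))
                if name == value:
                    break
        else:
            fixed.append((kind, value))
    # End of input: close whatever is still open, innermost first.
    while open_tags:
        fixed.append(("end", open_tags.pop()))
    return fixed


lousy = [
    ("start", "div"),
    ("start", "p"),
    ("text", "Really lousy HTML"),
    ("end", "div"),   # <p> was never closed
    ("end", "span"),  # <span> was never opened
]
print(repair(lousy))
```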

Slide 21

Slide 21 text

Example - some text

Slide 22

Slide 22 text

Render tree construction

Slide 23

Slide 23 text

From DOM tree to render tree • DOM tree: Document, HTML, body, head, div, span • Render tree: Viewport, Scroll, Block box, Block box, Inline box

Slide 24

Slide 24 text

Render tree • The visual representation of the DOM tree • Non displayed elements are not there • Contains Boxes according to CSS box model – block boxes , inline boxes
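A minimal sketch of this idea, assuming each DOM node is a dict that already carries a resolved `display` value (a simplification; in a browser that value comes out of style resolving). The function name and node shape are hypothetical.

```python
def build_render_tree(dom_node):
    """Return a box for dom_node, or None if the node is not displayed."""
    display = dom_node.get("display", "block")
    if display == "none":
        return None  # non-displayed elements, and their subtrees, get no box
    box = {"element": dom_node["name"], "box": display + " box", "children": []}
    for child in dom_node.get("children", []):
        child_box = build_render_tree(child)
        if child_box is not None:
            box["children"].append(child_box)
    return box


dom = {
    "name": "html",
    "children": [
        {"name": "head", "display": "none"},
        {"name": "body", "children": [
            {"name": "div"},
            {"name": "span", "display": "inline"},
        ]},
    ],
}
render_tree = build_render_tree(dom)
print(render_tree)
```

Here `head` produces no box at all, `div` becomes a block box, and `span` becomes an inline box, so the render tree ends up smaller than the DOM tree it was built from.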

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Style resolving • Finding all the style rules relevant to the box • Computing the final values according to the cascading order – User style sheets – Inline styles – Author style sheets – Browser defaults
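One way to picture the cascade is as a sort: collect every declaration that applies to the box and let the highest-ranked one win. The sketch below follows the CSS 2.1 ordering for normal declarations (browser defaults, then user style sheets, then author styles), treats an inline style as an author declaration with the highest specificity, and ignores `!important`. The data shapes and the `resolve` function are assumptions made for illustration.

```python
# Rank of each style origin for normal (non-!important) declarations.
ORIGIN_RANK = {"browser": 0, "user": 1, "author": 2}


def resolve(declarations, prop):
    """Pick the winning value for prop: origin first, then specificity, then source order."""
    candidates = [d for d in declarations if d["property"] == prop]
    if not candidates:
        return None
    index, winner = max(
        enumerate(candidates),
        key=lambda pair: (ORIGIN_RANK[pair[1]["origin"]], pair[1]["specificity"], pair[0]),
    )
    return winner["value"]


# Specificity is (inline, ids, classes, elements); inline styles set the first slot.
declarations = [
    {"origin": "browser", "specificity": (0, 0, 0, 1), "property": "color", "value": "black"},
    {"origin": "user", "specificity": (0, 0, 0, 1), "property": "color", "value": "green"},
    {"origin": "author", "specificity": (0, 0, 1, 0), "property": "color", "value": "blue"},
    {"origin": "author", "specificity": (1, 0, 0, 0), "property": "color", "value": "red"},
]
print(resolve(declarations, "color"))
```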

Slide 27

Slide 27 text

Optimizations • Memory - Style is a huge construct • CPU - traverse the tree • A few boxes can share the same styles (until changed) – saves memory • Smaller trees like indexes of the big tree

Slide 28

Slide 28 text

Render tree layout

Slide 29

Slide 29 text

Layout • Giving size and position to the render tree boxes • Root node position is 0,0 and its dimensions are the viewport • Recursive – the parent box calls the child nodes to calculate their height • It then adds the accumulated heights + margins + paddings to get its own height
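The recursion can be sketched as follows, assuming simple block boxes stacked vertically, uniform `margin` and `padding` values per box, and no margin collapsing. These are all simplifications, and the dict fields and `layout` function are hypothetical.

```python
def layout(box, x, y, width):
    """Position box at (x, y) with the given width, then compute its height."""
    padding = box.get("padding", 0)
    box["x"], box["y"], box["width"] = x, y, width
    inner_height = box.get("content_height", 0)  # intrinsic height of a leaf box
    for child in box.get("children", []):
        margin = child.get("margin", 0)
        # The parent asks each child to lay itself out below the previous one.
        layout(
            child,
            x + padding + margin,
            y + padding + inner_height + margin,
            width - 2 * (padding + margin),
        )
        inner_height += child["height"] + 2 * margin
    # Own height = accumulated child heights and margins, plus own padding.
    box["height"] = inner_height + 2 * padding
    return box


root = {
    "padding": 10,
    "children": [
        {"content_height": 20, "margin": 5},
        {"content_height": 30, "padding": 2},
    ],
}
layout(root, 0, 0, 800)  # the root starts at 0,0 and takes the viewport width
print(root["height"])  # 84
```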

Slide 30

Slide 30 text

Painting

Slide 31

Slide 31 text

Painting • Traversing the render tree – calling each node to paint itself • Using the UI infrastructure component • Some boxes can have the same position with different z-index values; they are held in a stack and painted bottom-up • CSS defines the painting order: background color, background image, border, and then child nodes
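A rough sketch of that traversal, recording paint steps in a list instead of drawing. It keeps only the order named on the slide and sorts siblings by `z_index`; the real CSS painting order has more steps, and the field names and `paint` function are assumptions.

```python
def paint(box, display_list):
    """Append the paint steps for box and its descendants to display_list."""
    name = box["name"]
    display_list.append(f"{name}: background color")
    display_list.append(f"{name}: background image")
    display_list.append(f"{name}: border")
    # sorted() is stable, so siblings with equal z-index keep document order;
    # lower z-index paints first and ends up underneath.
    for child in sorted(box.get("children", []), key=lambda c: c.get("z_index", 0)):
        paint(child, display_list)
    return display_list


root = {
    "name": "root",
    "children": [
        {"name": "a", "z_index": 2},
        {"name": "b", "z_index": 1},
    ],
}
for step in paint(root, []):
    print(step)
```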

Slide 32

Slide 32 text

Re-layout and repaints • Changes can trigger re-layouts and repaints • Window resize, or scripts that add, hide or resize a node, will trigger a re-layout • Non-geometric changes will trigger only a repaint • Browsers try to minimize the changes – dirty bits system • They batch the changes
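The dirty-bit idea can be sketched with two flags per box: one saying the box itself needs layout, and one saying some descendant does. This is an assumed, simplified model (a real engine would also re-lay out dependants of a changed box), and the `Box` class and `relayout` function are hypothetical.

```python
class Box:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.dirty = False        # this box needs layout
        self.child_dirty = False  # some descendant needs layout
        if parent is not None:
            parent.children.append(self)

    def mark_dirty(self):
        """Flag this box and leave a trail of child_dirty flags up to the root."""
        self.dirty = True
        ancestor = self.parent
        while ancestor is not None and not ancestor.child_dirty:
            ancestor.child_dirty = True
            ancestor = ancestor.parent


def relayout(box, visited):
    """Walk only the flagged paths; record which boxes were actually laid out."""
    if not (box.dirty or box.child_dirty):
        return visited  # clean subtree: skipped entirely
    if box.dirty:
        visited.append(box.name)
    for child in box.children:
        relayout(child, visited)
    box.dirty = box.child_dirty = False
    return visited


root = Box("root")
a = Box("a", root)
b = Box("b", root)
c = Box("c", a)
c.mark_dirty()
print(relayout(root, []))  # ['c']
```

Only the path from the root to the changed box is walked. The subtree under `b` is never visited, which is the saving the dirty bits are there to provide.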

Slide 33

Slide 33 text

Is it just theoretical? • Understanding layout and paint can help us avoid re-layouts and repaints – If you query an element's style, it will flush the current batch – Sometimes it is better to make many changes on a non-displayed node and then change its display – Replace class names instead of setting many inline styles – Try to keep the change low in the tree – Animate absolute or fixed nodes
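The cost of querying styles between changes can be shown with a toy model that queues style writes and counts how many layouts are forced. It only imitates the batching behaviour described above; the `Element` class and its methods are hypothetical and are not a browser API.

```python
class Element:
    def __init__(self):
        self.styles = {}
        self.pending = []      # batched style changes, not yet applied
        self.layout_count = 0  # how many layouts this element has triggered

    def set_style(self, prop, value):
        self.pending.append((prop, value))  # queued: no layout yet

    def flush(self):
        """Apply all queued changes with a single layout."""
        if self.pending:
            for prop, value in self.pending:
                self.styles[prop] = value
            self.pending.clear()
            self.layout_count += 1

    def get_computed(self, prop):
        self.flush()  # a query needs up-to-date values, so the batch is flushed
        return self.styles.get(prop)


changes = [("width", "100px"), ("height", "50px"), ("margin", "4px")]

interleaved = Element()
for prop, value in changes:
    interleaved.set_style(prop, value)
    interleaved.get_computed(prop)  # a read after every write

batched = Element()
for prop, value in changes:
    batched.set_style(prop, value)
batched.get_computed("width")  # one read after all the writes

print(interleaved.layout_count, batched.layout_count)  # 3 1
```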

Slide 34

Slide 34 text

Compiling Browsers • I think Chrome is the easiest • Instructions - http://www.chromium.org/developers/how-tos/build-instructions-windows • Do everything they say… • Make sure your machine is strong enough • It will still take hours

Slide 35

Slide 35 text

Debugging • Make a simple “Hello World” HTML file and run it in your compiled browser • Stop at these points: – FrameLoader::load(DocumentLoader* newDocumentLoader) – DocumentLoader::commitData – HTMLDocumentParser::append – HTMLTreeBuilder::constructTreeFromToken

Slide 36

Slide 36 text

The end Resources: • http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/idl-definitions.html • http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html • http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#an-introduction-to-error-handling-and-strange-cases-in-the-parser • http://www.phpied.com/rendering-repaint-reflowrelayout-restyle/ • http://www.stubbornella.org/content/2009/03/27/reflows-repaints-css-performance-making-your-javascript-slow/ • http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/#Layout