
How browsers work

talig
May 06, 2012

Slides from my browser internals talk at Front-Trends 2012

Transcript

  1. How browsers work? • Debugging browser code to solve problems

    • Open source – Chrome, Firefox, Safari • Huge code base • Unclear compilation instructions • Hours to compile • No documentation – where do you start?
  2. • My article – http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/ • By Tali Garsiel

    tgarsiel@gmail.com • I’ll be talking about: • Browser high-level structure • Browser main flow
  3. Browser high-level structure • I’ll be talking mostly about the

    rendering engine
  4. Rendering engine • Responsible for parsing the HTML and painting

    it on the screen • Currently most browsers use a process for each tab – each has its own rendering engine.
  5. None
  6. What happens when I click “google.com”? • Loading the resource

    • Parsing the HTML • Creating the DOM tree • Creating the Render tree • Layout of the render tree • Painting
  7. Parsing

  8. Parsing - general • Parsing HTML is turning the HTML

    text into the DOM elements tree it represents • Turning this: <html> <body> <p> Hello World </p> <div> <img src="example.png"/></div> </body> </html>
  9. Into this: • HTML DOM definition can be found at

    http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/idl-definitions.html
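
The "turning this into this" step can be sketched with Python's standard `html.parser` module. This is only an illustration of text becoming a tree, not how a browser does it: the `Node` and `TreeBuilder` classes, the `parse` and `dump` helpers, and the short list of void elements are assumptions made for the example.

```python
from html.parser import HTMLParser

# Elements that never have children (abbreviated, illustrative list).
VOID_ELEMENTS = {"img", "br", "hr", "input", "meta", "link"}


class Node:
    """A minimal stand-in for a DOM node."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the start tag, end tag and text callbacks."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        if tag not in VOID_ELEMENTS:
            self.current = node  # later content becomes a child of this node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name, if there is one.
        node = self.current
        while node is not None and node.name != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            Node("#text: " + text, self.current)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    source = ('<html> <body> <p> Hello World </p> '
              '<div> <img src="example.png"/></div> </body> </html>')
    print("\n".join(dump(parse(source))))
```

Running it on the markup from the previous slide prints a `#document` root with `html`, `body`, then `p` (holding the text) and `div` (holding `img`) nested below it.
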
  10. HTML Parser • Special parser – unlike regular parsers used

    for parsing languages like CSS, JavaScript, Java • A custom parser – we cannot use ready-made parser techniques like bottom-up or top-down parsers • Context-free grammar – for conventional languages you can define a grammar file in BNF
  11. Grammar example – for CSV • CSV – a very

    simple format used for tabular data, as in Excel • Words surrounded by double quotes and separated by ",". Each row is separated by a new line • "red","green","blue" • "yellow","gold"
  12. Grammar file example for CSV

  13. CSV grammar continued
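
The grammar file shown on these two slides did not survive in the transcript. As a stand-in, here is a hypothetical grammar for the quoted CSV dialect described above, with a small top-down (recursive descent) parser that has one method per rule. The grammar and the `CsvParser` class are assumptions for illustration, not the grammar from the slides.

```python
# Hypothetical grammar (EBNF-style), one parser method per rule:
#   file  ::= row ( NEWLINE row )*
#   row   ::= field ( "," field )*
#   field ::= '"' any characters except '"' '"'


class CsvParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Return the next character, or "" at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def parse_file(self):
        rows = [self.parse_row()]
        while self.peek() == "\n":
            self.pos += 1
            rows.append(self.parse_row())
        if self.pos != len(self.text):
            raise SyntaxError(f"unexpected input at offset {self.pos}")
        return rows

    def parse_row(self):
        fields = [self.parse_field()]
        while self.peek() == ",":
            self.pos += 1
            fields.append(self.parse_field())
        return fields

    def parse_field(self):
        self.expect('"')
        start = self.pos
        while self.peek() not in ('"', ""):
            self.pos += 1
        value = self.text[start:self.pos]
        self.expect('"')
        return value


if __name__ == "__main__":
    print(CsvParser('"red","green","blue"\n"yellow","gold"').parse_file())
```

Input that does not match the grammar, such as an unquoted word, is rejected with a `SyntaxError`. That strictness is exactly what an HTML parser cannot afford.
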

  14. Why HTML is difficult to parse • It cannot be

    described fully by a grammar file • Flexible syntax - extremely error tolerant • Reentrant – you can use “document.write” – This means that more text is added to the parsed text in the middle of parsing! – This is like adding more source code in the middle of compilation – more demanding of the parser
  15. Parsing Algorithm • The HTML4 specification gave no specific

    algorithm for how browsers should parse HTML • The HTML5 workgroup does have such a specification - http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#the-list-of-active-formatting-elements • Parsing is done by a tokenizer and a parser working together, while every “document.write” can add more input to the input stream.
  16. Tokenizer + Parser • The tokenizer knows how to divide

    the input stream into tokens (in our case “html”, “body”, “hello”, “world”) • Tree construction is done using a state machine
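
To give a feel for a state machine driving this process, here is a heavily simplified tokenizer sketch. The three states, the token names and the `tokenize` function are assumptions for the example; the real HTML5 tokenizer has many more states, handles attributes, comments and character references, and emits character tokens one at a time.

```python
def tokenize(text):
    """Split markup into (kind, value) tokens with a three-state machine.

    Simplifications: no attributes, comments or entities, and text between
    tags is emitted as one stripped "Character" token.
    """
    tokens = []
    state = "data"
    buf = ""
    is_end_tag = False

    for ch in text:
        if state == "data":
            if ch == "<":
                if buf.strip():
                    tokens.append(("Character", buf.strip()))
                buf = ""
                state = "tag_open"
            else:
                buf += ch
        elif state == "tag_open":
            if ch == "/":
                is_end_tag = True
            else:
                is_end_tag = False
                buf = ch
            state = "tag_name"
        elif state == "tag_name":
            if ch == ">":
                kind = "EndTag" if is_end_tag else "StartTag"
                tokens.append((kind, buf.lower()))
                buf = ""
                state = "data"
            else:
                buf += ch

    # Flush any trailing text left in the buffer.
    if state == "data" and buf.strip():
        tokens.append(("Character", buf.strip()))
    return tokens


if __name__ == "__main__":
    for token in tokenize("<html><body>Hello world</body></html>"):
        print(token)
```

Each token would then be handed to the tree construction stage, which keeps its own state (for example, which elements are currently open).
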
  17. A state machine • Let’s see some WebKit source code

  18. None
  19. None
  20. Error tolerance • A great deal of the HTML parser’s

    work is fixing our errors. • <html> <mytag> <div> <p> </div> Really lousy HTML <p> </html> • The HTML5 workgroup requires browsers to correct many markup errors – Close unclosed tags – Move items to their correct parents
  21. Example - <td>some text<td>

  22. Render tree construction

  23. From DOM tree to render tree

    [Diagram: DOM nodes (Document, HTML, head, body, div, span) mapped to render tree boxes (Viewport, Scroll, Block box, Block box, Inline box)]
  24. Render tree • The visual representation of the DOM tree

    • Non-displayed elements are not included • Contains boxes according to the CSS box model – block boxes, inline boxes
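
A minimal sketch of that mapping, assuming each DOM node already carries a resolved `display` value. `DomNode`, `RenderBox` and `build_render_tree` are hypothetical names for the example; real engines also create anonymous boxes and handle many more display types.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    tag: str
    display: str = "block"  # "block", "inline" or "none"
    children: list = field(default_factory=list)


@dataclass
class RenderBox:
    kind: str  # "block" or "inline"
    tag: str
    children: list = field(default_factory=list)


def build_render_tree(node):
    """Return a RenderBox for node, or None if the node is not displayed."""
    if node.display == "none":
        return None  # the node and its whole subtree are left out
    box = RenderBox(kind=node.display, tag=node.tag)
    for child in node.children:
        child_box = build_render_tree(child)
        if child_box is not None:
            box.children.append(child_box)
    return box


if __name__ == "__main__":
    dom = DomNode("html", children=[
        DomNode("head", display="none"),
        DomNode("body", children=[
            DomNode("div", children=[DomNode("span", display="inline")]),
        ]),
    ])
    tree = build_render_tree(dom)
    print([child.tag for child in tree.children])
```

Here `head` exists in the DOM but has no box in the render tree, while `div` becomes a block box and `span` an inline box.
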
  25. None
  26. Style resolving • Finding all the style rules relevant to

    the box • Computing the result according to the cascade order, across these style sources: – User style sheets – Inline styles – Author style sheets – Browser defaults
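
As a sketch of what "according to the cascade order" means, the example below sorts declarations the way CSS 2.1 describes: first by origin and importance, then by specificity, then by source order, with the last one winning. The `Declaration` class, the `cascade` function and the sample values are assumptions for illustration. Inline styles are modelled as author declarations with the highest specificity, as in CSS 2.1.

```python
from dataclasses import dataclass

# CSS 2.1 precedence, lowest to highest, keyed by (origin, important).
ORIGIN_RANK = {
    ("browser", False): 0,
    ("user", False): 1,
    ("author", False): 2,
    ("author", True): 3,
    ("user", True): 4,
}


@dataclass(frozen=True)
class Declaration:
    prop: str
    value: str
    origin: str  # "browser", "user" or "author"
    important: bool = False
    specificity: tuple = (0, 0, 0, 0)  # (inline, ids, classes, elements)
    order: int = 0  # position in the source


def cascade(declarations):
    """Return {property: winning value} for one element."""
    def sort_key(d):
        return (ORIGIN_RANK[(d.origin, d.important)], d.specificity, d.order)

    winners = {}
    for decl in sorted(declarations, key=sort_key):
        winners[decl.prop] = decl.value  # later (stronger) entries overwrite
    return winners


if __name__ == "__main__":
    print(cascade([
        Declaration("color", "black", "browser"),
        Declaration("color", "blue", "author", specificity=(0, 0, 0, 1), order=1),
        Declaration("color", "red", "author", specificity=(1, 0, 0, 0), order=2),
        Declaration("font-size", "20px", "user", order=3),
        Declaration("font-size", "12px", "author", specificity=(0, 0, 1, 0), order=4),
        Declaration("font-size", "24px", "user", important=True, order=5),
    ]))
```

In this sample the inline `color: red` beats the author style sheet rule, and the user's `!important` font size beats everything else.
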
  27. Optimizations • Memory - Style is a huge construct •

    CPU - traverse the tree • A few boxes can share the same styles (until changed) – saves memory • Smaller trees like indexes of the big tree
  28. Render tree layout

  29. Layout • Giving size and position to the render tree

    boxes • Root node position is 0,0 and its dimensions are the viewport • Recursive – the parent box calls the child nodes to calculate their height • It then adds the accumulated heights + margins + paddings to get its own height
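
The recursion can be sketched for the simplest case: block boxes stacked vertically. Everything here is an assumption for the example (the `Box` fields, uniform margin and padding on all sides, no borders, no margin collapsing, and a root whose height comes from its content rather than from the viewport).

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    margin: int = 0
    padding: int = 0
    content_height: int = 0  # only used by leaf boxes
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0  # content + padding, margins excluded


def layout(box, x, y, available_width):
    """Position box at (x, y), lay out its children, then compute its height."""
    box.x = x + box.margin
    box.y = y + box.margin
    box.width = available_width - 2 * box.margin

    inner_x = box.x + box.padding
    inner_width = box.width - 2 * box.padding
    inner_top = box.y + box.padding

    if box.children:
        cursor = inner_top
        for child in box.children:
            layout(child, inner_x, cursor, inner_width)
            # The child takes up its own height plus its top and bottom margins.
            cursor += child.height + 2 * child.margin
        content = cursor - inner_top
    else:
        content = box.content_height

    box.height = content + 2 * box.padding


if __name__ == "__main__":
    first = Box(margin=10, padding=5, content_height=20)
    second = Box(content_height=30)
    root = Box(children=[first, second])
    layout(root, 0, 0, 800)  # root at 0,0 with the viewport width
    print(root.height, (first.x, first.y, first.width, first.height), second.y)
```

The parent only learns its own height after every child has reported back, which is why a change deep in the tree can ripple upwards.
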
  30. Painting

  31. Painting • Traversing the render tree – calling each node

    to paint itself • Using the UI infrastructure component • Some boxes can have the same position with different z-index; they are held in a stack and painted bottom-up • CSS defines the painting order: background color, background image, border and then child nodes
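
A sketch of that traversal, recording paint operations instead of drawing. `PaintBox` and `paint` are hypothetical names, and ordering siblings purely by `z_index` is a simplification of the real stacking rules (stacking contexts, negative z-index, floats and so on).

```python
from dataclasses import dataclass, field

# Per-box painting phases, in the order given on the slide.
PHASES = ("background-color", "background-image", "border")


@dataclass
class PaintBox:
    name: str
    z_index: int = 0
    children: list = field(default_factory=list)


def paint(box, ops=None):
    """Append the paint operations for box and its subtree to ops."""
    if ops is None:
        ops = []
    for phase in PHASES:
        ops.append(f"{box.name}:{phase}")
    # Lower z-index first, so higher boxes are drawn on top of them.
    # sorted() is stable, so equal z-index keeps tree order.
    for child in sorted(box.children, key=lambda c: c.z_index):
        paint(child, ops)
    return ops


if __name__ == "__main__":
    root = PaintBox("root", children=[PaintBox("a", z_index=2),
                                      PaintBox("b", z_index=1)])
    for op in paint(root):
        print(op)
```

Box `b` is painted before `a` even though it comes later in the tree, because its z-index is lower.
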
  32. Re-layout and repaints • Changes can trigger re-layout and repaints

    • Window resize, or scripts that add, hide or resize a node, will trigger a re-layout • Non-geometric changes will trigger only a repaint • Browsers try to minimize the changes – dirty bits system • They batch the changes
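
A dirty bit system can be sketched with two flags per node: "I am dirty" and "something below me is dirty". The `RenderNode` class and `relayout` function are assumptions for the example, and a real engine decides far more carefully how much of a dirty subtree to redo.

```python
class RenderNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.dirty = False        # this node needs layout
        self.child_dirty = False  # some descendant needs layout
        if parent is not None:
            parent.children.append(self)

    def mark_dirty(self):
        """Record a change; no layout work happens yet."""
        self.dirty = True
        node = self.parent
        while node is not None and not node.child_dirty:
            node.child_dirty = True
            node = node.parent


def relayout(node, laid_out, force=False):
    """Lay out only dirty subtrees, appending their node names to laid_out."""
    if not (force or node.dirty or node.child_dirty):
        return  # clean subtree: skipped entirely
    force = force or node.dirty
    if force:
        laid_out.append(node.name)
    node.dirty = False
    node.child_dirty = False
    for child in node.children:
        relayout(child, laid_out, force)


if __name__ == "__main__":
    root = RenderNode("root")
    a = RenderNode("a", root)
    a1 = RenderNode("a1", a)
    a2 = RenderNode("a2", a)
    b = RenderNode("b", root)
    b1 = RenderNode("b1", b)

    a1.mark_dirty()
    a1.mark_dirty()  # a second change joins the same batch
    done = []
    relayout(root, done)
    print(done)
```

Two changes to `a1` cost a single pass, and that pass never descends into `a2` or the `b` subtree.
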
  33. Is it just theoretical? • Understanding layout and paint can

    help us avoid re-layouts and re-paints – If you query an element’s style it will flush the current batch – Sometimes it is better to make many changes on a non-displayed node and then change its display – Replace class names instead of setting many inline styles – Try to keep the change low in the tree – Animate absolute or fixed nodes
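
The cost of querying styles between changes can be shown with a toy model that counts layout passes. `FakeDocument` and its methods are invented for the example and stand in for the browser's batching; in a real page the writes would be style changes and the reads would be calls such as `offsetWidth`.

```python
class FakeDocument:
    """Queues style writes and flushes them when a value is read."""

    def __init__(self):
        self.pending = []
        self.widths = {}
        self.layout_count = 0

    def set_width(self, node, width):
        self.pending.append((node, width))  # queued, no layout yet

    def get_width(self, node):
        self._flush()  # a read needs up-to-date geometry
        return self.widths.get(node, 0)

    def _flush(self):
        if self.pending:
            for node, width in self.pending:
                self.widths[node] = width
            self.pending.clear()
            self.layout_count += 1


def interleaved(doc, nodes):
    """Write then read for each node: every read flushes a batch of one."""
    return [(doc.set_width(n, 100), doc.get_width(n))[1] for n in nodes]


def batched(doc, nodes):
    """All writes first, then all reads: a single flush."""
    for n in nodes:
        doc.set_width(n, 100)
    return [doc.get_width(n) for n in nodes]


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    slow, fast = FakeDocument(), FakeDocument()
    interleaved(slow, nodes)
    batched(fast, nodes)
    print(slow.layout_count, fast.layout_count)
```

Both versions end with the same widths, but the interleaved one triggers a layout pass per node while the batched one triggers a single pass.
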
  34. Compiling Browsers • I think Chrome is the easiest •

    Instructions - http://www.chromium.org/developers/how-tos/build-instructions-windows • Do everything they say… • Make sure your machine is strong enough • It will still take hours
  35. Debugging • Make a simple “Hello World” HTML file and run

    it in your compiled browser • Stop at these points: – FrameLoader::load(DocumentLoader* newDocumentLoader) – DocumentLoader::commitData – HTMLDocumentParser::append – HTMLTreeBuilder::constructTreeFromToken
  36. The end Resources: • http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/idl-definitions.html • http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html •

    http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#an-introduction-to-error-handling-and-strange-cases-in-the-parser • http://www.phpied.com/rendering-repaint-reflowrelayout-restyle/ • http://www.stubbornella.org/content/2009/03/27/reflows-repaints-css-performance-making-your-javascript-slow/ • http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/#Layout