How browsers work

talig
May 06, 2012

Slides of my browser internals talk at front-trends 2012

Transcript

  1. How browsers work?

    • Debugging browser code for solving problems
    • Open source – Chrome, FF, Safari
    • Huge code base
    • Unclear compilation instructions
    • Hours to compile
    • No documentation – where do you start?
  2. • My article – http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/

    • By Tali Garsiel [email protected]
    • I’ll be talking about:
      – Browser high level structure
      – Browser main flow
  3. Rendering engine

    • Responsible for parsing the HTML and painting it on the screen
    • Currently most browsers use a process for each tab – each has its own rendering engine.
  4. What happens when I click “google.com”?

    • Loading the resource
    • Parsing the HTML
    • Creating the DOM tree
    • Creating the Render tree
    • Layout of the render tree
    • Painting
  5. Parsing - general

    • Parsing HTML is turning the HTML text into the DOM elements tree it represents
    • Turning this:
      <html>
        <body>
          <p> Hello World </p>
          <div> <img src="example.png"/></div>
        </body>
      </html>
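
As a rough illustration of turning that markup into a tree, here is a minimal sketch built on Python's standard `html.parser` module. The `Node`, `TreeBuilder`, `build_tree` and `outline` names are hypothetical helpers made up for this example, and the result is a toy tree, not the DOM a real browser builds.

```python
from html.parser import HTMLParser

# Assumption: a small hand-picked set of elements that never have children.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class Node:
    """A toy tree node: an element or a piece of text."""

    def __init__(self, name, attrs=None, text=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Keeps a stack of open elements and attaches new nodes to the top one."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <img .../> never stays open.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].name == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = f'"{node.text}"' if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


SOURCE = '<html> <body> <p> Hello World </p> <div> <img src="example.png"/></div> </body> </html>'
dom = build_tree(SOURCE)
print("\n".join(outline(dom)))
```

The printed outline nests `p` and `div` under `body`, with the text and the `img` element one level deeper.
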
  6. Into this:

    • HTML DOM definition can be found at http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/idl-definitions.html
  7. HTML Parser

    • Special parser – unlike the regular parsers used for languages like CSS, JavaScript or Java
    • A custom parser: we cannot use ready-made parsing techniques like bottom-up or top-down parsers
    • Regular parsers rely on a context free grammar, which you can define in a grammar file in BNF
  8. Grammar example – for CSV

    • CSV – a very simple language used for data, like Excel
    • Words surrounded by double quotes and separated by “,”. Each row is separated by a new line
    • "red","green","blue"
    • "yellow","gold"
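
To make the idea of a grammar concrete, the CSV dialect described above can be written as three BNF-style rules and parsed with one function per rule. This is a minimal sketch: the rule names, the `CsvSyntaxError` class and the `parse_*` functions are made up for illustration, and escaped quotes inside words are not supported.

```python
# Assumed grammar for the simplified CSV on the slide:
#   file ::= row (NEWLINE row)*
#   row  ::= word ("," word)*
#   word ::= '"' any characters except '"' '"'


class CsvSyntaxError(ValueError):
    pass


def parse_word(text, pos):
    """word ::= '"' ... '"'  Returns (word, position after the closing quote)."""
    if pos >= len(text) or text[pos] != '"':
        raise CsvSyntaxError(f"expected opening quote at offset {pos}")
    end = text.find('"', pos + 1)
    if end == -1:
        raise CsvSyntaxError(f"unterminated word starting at offset {pos}")
    return text[pos + 1:end], end + 1


def parse_row(text, pos):
    """row ::= word ("," word)*"""
    word, pos = parse_word(text, pos)
    words = [word]
    while pos < len(text) and text[pos] == ",":
        word, pos = parse_word(text, pos + 1)
        words.append(word)
    return words, pos


def parse_file(text):
    """file ::= row (NEWLINE row)*"""
    row, pos = parse_row(text, 0)
    rows = [row]
    while pos < len(text) and text[pos] == "\n":
        row, pos = parse_row(text, pos + 1)
        rows.append(row)
    if pos != len(text):
        raise CsvSyntaxError(f"unexpected character at offset {pos}")
    return rows


rows = parse_file('"red","green","blue"\n"yellow","gold"')
print(rows)
```

Because the grammar is strict, input such as `red,"green"` is rejected outright, which is exactly the behavior an HTML parser cannot afford.
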
  9. Why HTML is difficult to parse

    • It cannot be described fully by a grammar file
    • Flexible syntax - extremely error tolerant
    • Reentrant – you can use “document.write”
      – This means that more text is added to the parsed text in the middle of parsing!
      – This is like adding more source code in the middle of compilation
      – More demanding of the parser
  10. Parsing Algorithm

    • The HTML4 specification gave no specific algorithm for how browsers should parse HTML
    • The HTML5 workgroup does have such a specification - http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#the-list-of-active-formatting-elements
    • Parsing is done by a tokenizer and a parser working together, while every “document.write” can add more input to the input stream.
  11. Tokenizer + Parser

    • The tokenizer knows how to divide the input stream into tokens (in our case “html”, “body”, “hello”, “world”)
    • Tree construction is done using a state machine
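
A tokenizer of this kind can be sketched as a two-state machine that switches between reading text and reading a tag. The version below is a heavily simplified assumption for illustration: the state names and the `tokenize` function are hypothetical, attributes and comments are not handled, and text is split into one token per word only to mirror the tokens listed above.

```python
def tokenize(html):
    """Split markup into (kind, value) tokens using two states: "data" and "tag"."""
    tokens = []
    state = "data"
    buffer = ""
    for ch in html:
        if state == "data":
            if ch == "<":
                # Leaving text: emit one token per word collected so far.
                tokens.extend(("text", word) for word in buffer.split())
                buffer = ""
                state = "tag"
            else:
                buffer += ch
        else:  # state == "tag"
            if ch == ">":
                if buffer.startswith("/"):
                    tokens.append(("end_tag", buffer[1:].strip()))
                else:
                    tokens.append(("start_tag", buffer.strip()))
                buffer = ""
                state = "data"
            else:
                buffer += ch
    # Any text left after the last tag is still part of the input.
    tokens.extend(("text", word) for word in buffer.split())
    return tokens


tokens = tokenize("<html><body>Hello world</body></html>")
for token in tokens:
    print(token)
```

A tree builder would consume these tokens one at a time, which is what lets extra input from “document.write” be fed in between tokens.
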
  12. Error tolerance

    • A great deal of the HTML parser's work is fixing our errors.
    • <html> <mytag> <div> <p> </div> Really lousy HTML <p> </html>
    • The HTML5 workgroup requires browsers to correct many markup errors
      – Close unclosed tags
      – Move items to their correct parents
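
The "close unclosed tags" part can be sketched with a stack of open elements. This is a toy under stated assumptions, not the HTML5 tree construction algorithm: the `repair` function and its token format are hypothetical, and it does not attempt to move items to their correct parents.

```python
def repair(tokens):
    """tokens: list of ("start", name), ("end", name) or ("text", value) pairs.

    Returns markup in which every start tag has a matching end tag.
    """
    open_elements = []
    output = []
    for kind, value in tokens:
        if kind == "start":
            open_elements.append(value)
            output.append(f"<{value}>")
        elif kind == "text":
            output.append(value)
        elif value in open_elements:
            # Close everything opened after the matching start tag, then the match.
            while True:
                name = open_elements.pop()
                output.append(f"</{name}>")
                if name == value:
                    break
        # An end tag with no matching open element is silently dropped.
    # End of input: close whatever is still open.
    while open_elements:
        output.append(f"</{open_elements.pop()}>")
    return "".join(output)


lousy = [
    ("start", "html"), ("start", "mytag"), ("start", "div"), ("start", "p"),
    ("end", "div"), ("text", "Really lousy HTML"), ("start", "p"), ("end", "html"),
]
fixed = repair(lousy)
print(fixed)
```

Run on the markup above, the sketch closes the first `<p>` when `</div>` arrives and closes the second `<p>` and `<mytag>` when `</html>` arrives.
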
  13. From DOM tree to render tree

    [Figure: a DOM tree (Document, HTML, head, body, div, span) next to its render tree (Viewport, Scroll, Block box, Block box, Inline box)]
  14. Render tree

    • The visual representation of the DOM tree
    • Non-displayed elements are not included
    • Contains boxes according to the CSS box model – block boxes, inline boxes
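
The relation between the two trees can be sketched as a recursive filter over the DOM. Everything named here is an assumption for illustration: the dictionary node format, the `BLOCK_TAGS` and `NON_VISUAL` sets and the `build_render_tree` function are hypothetical, and real engines decide box types from the computed style, not from a tag list.

```python
# Assumptions: tiny stand-ins for the browser's default display values.
BLOCK_TAGS = {"html", "body", "div", "p"}
NON_VISUAL = {"head", "title", "meta", "script"}


def build_render_tree(node):
    """node: dict with "tag", optional "style" and "children". Returns a box or None."""
    style = node.get("style", {})
    if node["tag"] in NON_VISUAL or style.get("display") == "none":
        return None  # non-displayed elements get no box at all
    default = "block" if node["tag"] in BLOCK_TAGS else "inline"
    boxes = [build_render_tree(child) for child in node.get("children", [])]
    return {
        "box": style.get("display", default),
        "source": node["tag"],
        "children": [box for box in boxes if box is not None],
    }


dom = {
    "tag": "html",
    "children": [
        {"tag": "head", "children": [{"tag": "title"}]},
        {
            "tag": "body",
            "children": [
                {"tag": "div", "children": [{"tag": "span"}]},
                {"tag": "p", "style": {"display": "none"}},
            ],
        },
    ],
}
render_tree = build_render_tree(dom)
print(render_tree)
```

The `head` subtree and the hidden `p` are absent from the result, while `div` becomes a block box and `span` an inline box.
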
  15. Style resolving

    • Finding all the style rules relevant to the box
    • Computation according to the cascading order
      – User style sheets
      – Inline styles
      – Author style sheets
      – Browser defaults
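
One way to picture the cascade is as a sort over all matching declarations, with the last one winning. The sketch below assumes the CSS 2.1 ordering for normal declarations (browser defaults, then user style sheets, then author style sheets, with an inline style outranking any author selector) and ignores `!important` entirely. The `Declaration` class, `ORIGIN_RANK` table and `resolve` function are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: precedence of origins for normal (non-!important) declarations.
ORIGIN_RANK = {"browser": 0, "user": 1, "author": 2}


@dataclass(frozen=True)
class Declaration:
    prop: str
    value: str
    origin: str          # "browser", "user" or "author"
    specificity: tuple   # (inline, ids, classes, elements)
    order: int           # position in source order


def resolve(declarations):
    """Return {property: value}, letting the strongest declaration win."""
    ranked = sorted(
        declarations,
        key=lambda d: (ORIGIN_RANK[d.origin], d.specificity, d.order),
    )
    winners = {}
    for decl in ranked:
        winners[decl.prop] = decl.value  # stronger declarations overwrite weaker ones
    return winners


matched = [
    Declaration("color", "black", "browser", (0, 0, 0, 1), 0),
    Declaration("color", "navy", "user", (0, 0, 0, 1), 1),
    Declaration("color", "green", "author", (0, 0, 1, 0), 2),
    Declaration("color", "red", "author", (1, 0, 0, 0), 3),   # inline style attribute
    Declaration("margin", "8px", "browser", (0, 0, 0, 1), 4),
]
computed = resolve(matched)
print(computed)
```

Here the inline `red` wins for `color`, and the browser default survives for `margin` because nothing stronger overrides it.
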
  16. Optimizations

    • Memory - style is a huge construct
    • CPU - traverse the tree
    • A few boxes can share the same styles (until changed) – saves memory
    • Smaller trees, like indexes of the big tree
  17. Layout

    • Giving size and position to the render tree boxes
    • Root node position is 0,0 and its dimensions are the viewport
    • Recursive – the parent box asks its child nodes to calculate their height
    • It then adds the accumulated heights plus margins and padding to get its own height
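
The recursion can be sketched for the simplest case, block boxes stacked vertically. This is an illustration under stated assumptions: the `Box` class and `layout` function are hypothetical, widths, margin collapsing and inline content are ignored, and the root's height is computed from its content instead of being fixed to the viewport.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    content_height: int = 0   # intrinsic height of a leaf box, e.g. a line of text
    margin: int = 0
    padding: int = 0
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    height: int = 0           # border-box height, filled in by layout()


def layout(box, x, y):
    """Place box with its margin edge at (x, y); return the height it occupies."""
    box.x = x + box.margin
    box.y = y + box.margin
    inner_top = box.y + box.padding
    cursor = inner_top
    for child in box.children:
        # Each child reports its own outer height; the parent stacks them.
        cursor += layout(child, box.x + box.padding, cursor)
    inner_height = cursor - inner_top if box.children else box.content_height
    box.height = inner_height + 2 * box.padding
    return box.height + 2 * box.margin


first = Box(content_height=20, margin=5)
second = Box(content_height=30, padding=10)
page = Box(children=[first, second])
layout(page, 0, 0)   # the root starts at 0,0
print(page.height, (first.x, first.y), (second.x, second.y), second.height)
```

The first child occupies 30 units (20 of content plus its margins), so the second child starts at y = 30, and the page ends up 80 units tall.
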
  18. Painting

    • Traversing the render tree – calling each node to paint itself
    • Using the UI infrastructure component
    • Some boxes can have the same position with different z-index; they are held in a stack and painted bottom-up
    • CSS defines the painting order: background color, background image, border and then child nodes
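
The traversal and the painting order can be sketched as a recursive function that records what it would draw. The dictionary box format and the `paint` function are hypothetical, and sorting children by a single `z_index` value is a simplification of the real stacking rules.

```python
def paint(box, display_list):
    """Append the paint steps for box and its descendants to display_list."""
    # Each box paints itself in the order CSS defines.
    for layer in ("background color", "background image", "border"):
        display_list.append(f"{box['name']}: {layer}")
    # Children come last; lower z-index values are painted first (bottom-up).
    for child in sorted(box.get("children", []), key=lambda c: c.get("z_index", 0)):
        paint(child, display_list)
    return display_list


tree = {
    "name": "root",
    "children": [
        {"name": "overlay", "z_index": 2},
        {"name": "content", "z_index": 1},
    ],
}
steps = paint(tree, [])
print("\n".join(steps))
```

Although `overlay` comes first in the tree, `content` is painted before it, so the overlay ends up on top.
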
  19. Re-layout and repaints

    • Changes can trigger re-layout and repaints
    • A window resize, or a script that adds, hides or resizes a node, will trigger a re-layout
    • Non-geometric changes will trigger only a repaint
    • Browsers try to minimize the changes – a dirty bits system
    • They batch the changes
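
A dirty bit system with batching can be sketched as two flags per node plus a queue that is processed in one go. The `RenderNode` and `RenderQueue` classes are hypothetical, and a real engine also propagates dirtiness up and down the tree, which this toy skips.

```python
class RenderNode:
    def __init__(self, name):
        self.name = name
        self.needs_layout = False   # dirty bit: geometry changed
        self.needs_paint = False    # dirty bit: appearance changed


class RenderQueue:
    """Collects changes and handles them together on flush()."""

    def __init__(self):
        self.dirty = []
        self.log = []

    def change(self, node, geometric):
        node.needs_paint = True
        node.needs_layout = node.needs_layout or geometric
        if node not in self.dirty:
            self.dirty.append(node)

    def flush(self):
        to_layout = [n.name for n in self.dirty if n.needs_layout]
        to_paint = [n.name for n in self.dirty if n.needs_paint]
        if to_layout:
            self.log.append(("layout", to_layout))
        if to_paint:
            self.log.append(("paint", to_paint))
        for node in self.dirty:
            node.needs_layout = node.needs_paint = False
        self.dirty.clear()


queue = RenderQueue()
sidebar, title = RenderNode("sidebar"), RenderNode("title")
queue.change(sidebar, geometric=True)    # e.g. a width change
queue.change(sidebar, geometric=True)    # a second change to the same node
queue.change(title, geometric=False)     # e.g. a color change
queue.flush()
print(queue.log)
```

Three changes result in a single layout pass that touches only `sidebar`, and a single paint pass that covers both nodes.
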
  20. Is it just theoretical?

    • Understanding layout and paint can help us avoid re-layouts and re-paints
      – If you query an element's style, it will flush the current batch
      – Sometimes it is better to make many changes on a non-displayed node and then change its display
      – Replace class names instead of setting many inline styles
      – Try to keep the change low in the tree
      – Animate absolute or fixed nodes
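
The cost of querying in the middle of a batch can be shown with a toy model that counts layout passes. The `FakeElement` class and the two helper functions are hypothetical and only simulate the flushing behavior described above; no real browser API is involved.

```python
class FakeElement:
    """Toy model: writes are queued, reading a layout value flushes the queue."""

    def __init__(self):
        self.width = 100
        self.layout_passes = 0
        self._pending = []

    def set_width(self, value):
        self._pending.append(value)    # cheap: the change is only recorded

    def get_width(self):
        self._flush()                  # a read needs up-to-date geometry
        return self.width

    def _flush(self):
        if self._pending:
            self.width = self._pending[-1]
            self._pending.clear()
            self.layout_passes += 1


def interleaved(element, steps):
    """Read, then write, on every step: each read flushes the previous write."""
    for _ in range(steps):
        element.set_width(element.get_width() + 10)
    return element.get_width()


def batched(element, steps):
    """Read once, write many times, and let a single flush handle everything."""
    start = element.get_width()
    for step in range(1, steps + 1):
        element.set_width(start + 10 * step)
    return element.get_width()


slow, fast = FakeElement(), FakeElement()
print(interleaved(slow, 5), slow.layout_passes)
print(batched(fast, 5), fast.layout_passes)
```

Both versions end at the same width, but the interleaved one pays for five layout passes and the batched one for a single pass.
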
  21. Compiling Browsers

    • I think Chrome is the easiest
    • Instructions - http://www.chromium.org/developers/how-tos/build-instructions-windows
    • Do everything they say…
    • Make sure your machine is strong enough
    • It will still take hours
  22. Debugging

    • Make a simple “Hello World” HTML and run it in your compiled browser
    • Stop at these points:
      – FrameLoader::load(DocumentLoader* newDocumentLoader)
      – DocumentLoader::commitData
      – HTMLDocumentParser::append
      – HTMLTreeBuilder::constructTreeFromToken
  23. The end

    Resources:
    • http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/idl-definitions.html
    • http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html
    • http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#an-introduction-to-error-handling-and-strange-cases-in-the-parser
    • http://www.phpied.com/rendering-repaint-reflowrelayout-restyle/
    • http://www.stubbornella.org/content/2009/03/27/reflows-repaints-css-performance-making-your-javascript-slow/
    • http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/#Layout