
Optimizing client side performance in highly dynamic content websites

Ori Hoch
October 10, 2013

In this talk Ori shares his experience as a development team leader in YIT while working on ynet - the most popular news website in Israel.

The talk focuses on the challenges he and his team faced trying to optimize client-side performance of a highly dynamic content management system.

The talk can provide valuable input to anyone who is interested in improving client-side performance and especially to those interested in optimizing highly dynamic content websites.

Transcript

  1. • Ori Hoch • github.com/astupidog I have many years of

    experience in web development in various positions – from developer to system architect to team leader.
  2. • Ori Hoch • github.com/astupidog • kaltura.org I work at

    Kaltura the leading open source online video platform. I work on Kaltura Mediaspace which is a product that can be used to create video and rich media web portals.
  3. • Ori Hoch • github.com/astupidog • kaltura.org • hasadna.org.il/en In

    my spare time – which I don't have much of because of 2 children – I volunteer for the Public Knowledge Workshop (הסדנה לידע ציבורי).
  4. The Public Knowledge Workshop We are hacking for a better

    Israel. You can help! The Public Knowledge Workshop is a non-profit, non-partisan organization that makes Israeli government data and other data of public interest openly accessible on the internet. We are always looking for volunteers, so feel free to approach me after the lecture for further details.
  5. Until about 2 months ago I worked at YIT. YIT

    started about 15 years ago as the IT department for the Yedioth group which owns newspapers, magazines and internet sites, including Yedioth Aharonot which is one of the leading daily newspapers in Israel.
  6. yit.co.il/eng Since then YIT has grown from an IT department to

    an independent company which develops some of the biggest, most popular websites in Israel. At YIT I was the web development team leader for ynet.
  7. ynet.co.il ynetnews.com Ynet is the most popular news website in

    Israel, and during my time working there my team and I were very lucky to have the chance to rewrite most of the site's front-end almost from scratch.
  8. During that process we learnt a lot and in this

    presentation I want to share with you some of the challenges we faced.
  9. [Bar chart: average daily visitors for walla.co.il, ynet.co.il, mako.co.il, nana10.co.il, tapuz.co.il and one.co.il. ynet: 749,045 average daily visitors (Feb 2012, Israel Audience Research Board)]

    The biggest challenge in working on ynet was not the popularity of the site – although it is one of the most popular in Israel with hundreds of thousands of daily visitors. (http://www.globes.co.il/news/article.aspx?did=1000734364)
  10. The most challenging part was the dynamic nature of the

    site – the site is based on a very sophisticated content management system that allows a very high degree of customization to the site editors. (http://en.m.wikipedia.org/wiki/File:Customized_Can-Am_Spyder.jpg)
  11. The editors make changes to the site all the time

    and in addition always want to add new features. All of that needs to happen as soon as possible. (baby godfather meme)
  12. This makes the basic optimizations much harder to perform and

    in this presentation I will focus on the problems we encountered and possible solutions.
  13. The plan • Images • Sprites • CSS / JS

    • Caching I'm assuming everyone here knows the basics of these optimizations and I will focus on more advanced problems and solutions. Feel free to ask questions or shout out comments during the presentation.
  14. Images On ynet, being primarily a news website, there are

    a lot of images the editors upload and the images change frequently. (http://ftvlive.com/todays-news/2013/9/20/tv-news-pet-peeves)
  15. Images If there is an ongoing event, the editors will

    get a lot of photos from the photographers in the event and will want to use those photos as soon as possible.
  16. Images of course take a lot of bandwidth and to

    lower costs and improve client-side performance we want the images to be as small as possible but without losing quality. (http://what-if.xkcd.com/31/)
  17. Also we may want to serve the same image in

    different sizes for different devices – using media queries or for external applications. So ideally you want to scale down the original image to match the specific resolution on each device. (http://www.neobytesolutions.com/responsive-web-design-part-3/)
  18. DELEGATE DELEGATE DELEGATE DELEGATE DELEGATE DELEGATE ! The easiest solution

    for the developers is to let the editors do all the work – crop and scale the images manually on the desktop, and require them to upload an image in the correct size. Of course, the editors don't like that solution. Also, not all editors are Photoshop masters, and they might not scale or crop the images properly.
  19. CROP SCALE A better solution is to add an easy-to-use

    image editing interface online in the site's admin. The editor uploads a full-resolution file and is then presented with an easy-to-use interface for cropping and scaling the image accordingly.
  20. SMALL BIG 400x300 800x600 We can make the interface even

    easier to use and more fool-proof by limiting only to predefined target sizes. If for example we have 2 size targets – big and small, then the user just selects which size he wants to produce and the interface limits him only to that size and does the crop or scaling as needed.
  21. KEEP CALM AND FIND THE USER ERORR This protects against

    user error and ensures the quality of the resulting images, but it's still not good enough when there are a lot of images being uploaded all the time and used in different places.
  22. The editors need to have images ready as soon as

    possible – no time to waste trying to manually crop or scale the images. Also, they sometimes upload a lot of images in advance so when they need them they can use them immediately. But usually only a fraction of those images are actually used. (http://knowyourmeme.com/memes/soon)
  23. So the natural solution is to crop or scale automatically.

    This presents a few problems. (http://memecrunch.com/meme/P8ER/automate-it)
  24. Usually the editors upload a full-resolution image they get from

    the photographer – for example this image of a speaker at a conference. In the full-size image you can see who the speaker is and maybe even read the text behind him. (http://www.mysqlperformanceblog.com/2013/04/29/percona-live-mysql-conference-2013-wrap-up/)
  25. But if we want to use this image for a

    mobile device, it will be tiny – and none of the details are visible.
  26. So for smaller devices you need to crop the image

    – so that at least some details are visible, but it can't be done automatically (or is very hard to do) because you need to identify the relevant part of the image you want to focus on.
  27. Another problem is different proportions. For example, if you have

    a vertical image but you want to use it in a horizontal slot. If you crop it automatically, you might chop someone's head off, and if you don't crop it you have wasted space.
  28. Manual crop/scale in the online editing interface to predefined sizes

    So the solution we implemented is a semi-automatic process. The editors upload the full-resolution image and then, in the image editing interface, they crop and scale it into as few predefined sizes as possible.
  29. From these sizes we can safely scale automatically to other

    predefined sizes which will be used on the site and will be more specifically fitted to the relevant targets.
  30. This way we can achieve our goal of serving each

    user the specific image resolution required and the editors have an efficient workflow for image uploading which minimizes manual work.
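
The automatic step described on slides 28 to 30 can be sketched roughly as follows. This is only a sketch under assumptions: the two manual sizes come from the "SMALL / BIG" slide, while the derived target names and widths and the function names are hypothetical. It shows the size arithmetic only, not the actual image resampling.

```python
from typing import Dict, Tuple

Size = Tuple[int, int]  # (width, height) in pixels

# Sizes the editor crops to by hand (taken from the "SMALL / BIG" slide).
MANUAL_SIZES: Dict[str, Size] = {"big": (800, 600), "small": (400, 300)}

# Hypothetical derived targets: name -> (manual size to start from, target width).
DERIVED_TARGETS: Dict[str, Tuple[str, int]] = {
    "article_inline": ("big", 640),
    "mobile_thumb": ("small", 200),
}


def scale_to_width(source: Size, target_width: int) -> Size:
    """Scale down to target_width, keeping the aspect ratio; never upscale."""
    width, height = source
    if target_width >= width:
        return source
    return (target_width, round(height * target_width / width))


def derived_sizes() -> Dict[str, Size]:
    """Compute every automatically generated size from its manual parent."""
    return {
        name: scale_to_width(MANUAL_SIZES[parent], width)
        for name, (parent, width) in DERIVED_TARGETS.items()
    }
```

Because every derived size starts from a crop the editor already approved, the automatic step only ever scales and never has to guess which part of the photo matters.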
  31. sprites Every site has a lot of icons, logos, background

    images etc. that repeat on different pages. (http://forum.mgbr.net/index.php?showtopic=52640)
  32. sprites To reduce the amount of http requests we can

    combine all those images into one big image which is downloaded once, cached on the client-side and then reused on all the site's pages.
  33. .rssicon { background: url('sprites.png') no-repeat 0 0; background-position: -38px -511px; } (http://brandonsetter.com/60-beautiful-css-sprite-social-media-icons/)

    Normally, to use sprites you take all your images, put them on one big image (there are many tools to do that) and then use that image throughout your site using css.
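
How a sprite tool arrives at rules like `.rssicon` can be sketched as follows. This is an illustration, not the tool used on ynet: it assumes the icons are simply stacked top to bottom in one sheet, and the function name is hypothetical.

```python
from typing import List, Tuple

Icon = Tuple[str, int, int]  # (css class name, width, height)


def sprite_css(icons: List[Icon], sheet_url: str) -> str:
    """Stack icons top to bottom and emit one css rule per icon."""
    rules = []
    offset_y = 0
    for name, width, height in icons:
        # A negative offset shifts the sheet up so the icon shows through.
        position = f"0 -{offset_y}px" if offset_y else "0 0"
        rules.append(
            f".{name} {{ background: url('{sheet_url}') no-repeat {position}; "
            f"width: {width}px; height: {height}px; }}"
        )
        offset_y += height
    return "\n".join(rules)
```

Real sprite generators usually pack icons in two dimensions to keep the sheet small, but the css they emit follows the same idea.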
  34. On ynet it was very hard to do because there

    are a lot of components and different layouts that the site editors can use. (http://knowyourmeme.com/memes/computer-reaction-faces)
  35. homepage article As an example I'll use 2 of the

    most popular pages on ynet – the homepage and the article page. In addition to those there are thousands of other pages in varying degrees of popularity.
  36. Header component Top story component Search bar component Strip component

    Each of these pages is made up of components that the site's editors can add, remove and edit. The editors can choose from hundreds of different components.
  37. Belong to components on the homepage Belong to components on

    articles Not used on popular pages at all, only on special pages or not at all If we were to create one big image with all those sprites it will be too big and only a few of those sprites will be used at any given moment.
  38. Image that contains the sprites that appear on the homepage

    Image that contains the sprites that appear on articles The solution is to split this big image into groups of sprites that will be used together.
  39. Image that contains the sprites that appear on the homepage

    Image that contains the sprites that appear on articles Overlapping sprites – same sprites appear on both homepage and article One problem with this solution is that if each page has a different image, some of the sprites in that image might repeat between the different pages.
  40. Image that contains the sprites that appear on the homepage

    A component that appears on the homepage but is not included in the homepage sprites image Another problem is that the components that appear on the page are determined by the editor – so there might be a component not included on the predefined image for the page and the sprites in that component will have to be displayed directly without using the combined sprites image.
  41. How to divide the big sprites image into smaller sprite

    groups? The challenge is how to define the sprite groups in the most efficient manner.
  42. Automatically look at the components that are on the page

    and create a unique image for those components. One possible solution is to create the sprite images dynamically according to the components that are currently on the page. In this way, each page will have a unique sprites image associated with it. There are a few problems with this solution.
  43. /homepage (html) contains a link to css file: /homepage.sprites.css (dynamically

    generated) • collects all the sprites used on the homepage • compiles an image with all those sprites • returns css rules which point to that sprites image In that case you need the page's html to contain a link to an external css file that dynamically generates css rules for all the sprites used on that page based on a dynamically generated sprites image.
  44. Overlapping sprites There is also the problem of overlapping sprites

    – some components might be on 2 different pages. If each page has a unique sprites image associated with it the sprites for those components will be in 2 sprite images.
  45. Waste bandwidth and increase page load times Overlapping sprites This

    solution ensures each page will have exactly 1 sprite image related to it – this drastically reduces the number of http requests. But it also wastes a lot of bandwidth – the sprite images are not reused between different pages.
  46. This solution might be suitable in some cases, it really

    depends on the site editor's usage patterns. For ynet we found it's not suitable.
  47. Components most likely on the homepage Components most likely on

    articles Another solution is to create the sprite groups manually in advance according to client requirements – trying to anticipate the editor's usage patterns and which components will most likely be used on different pages.
  48. Components most likely on the homepage Components most likely on

    articles Common sprites – for all pages Social Sharing icons You can add different groups in any logical way which suits the site's usage patterns.
  49. This solution will slightly increase the number of http requests

    – each page will need to fetch a few different sprites images.
  50. Does not waste bandwidth So, this solution is not ideal

    regarding reduction of http requests, but of course it is a lot better than having no sprites. Also, the sprites images will be cached on the client-side and reused between different pages – unlike the previous solution.
  51. Does not waste bandwidth No overlapping sprites It solves the

    problem of overlapping sprites – we divide the sprites manually and can make sure there is no overlap.
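
The manual grouping described on slides 47 to 51 can be sketched as a small configuration plus two checks. The group names follow the slides; the sprite names and function names are hypothetical.

```python
from typing import Dict, List, Set

# Manually defined groups; the sprite names are hypothetical.
SPRITE_GROUPS: Dict[str, Set[str]] = {
    "common": {"logo", "search"},
    "social": {"facebook", "twitter"},
    "homepage": {"ticker", "top_story_arrow"},
    "article": {"print", "comment"},
}


def overlapping_sprites(groups: Dict[str, Set[str]]) -> Set[str]:
    """Return the sprites that were assigned to more than one group."""
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for sprites in groups.values():
        duplicates |= seen & sprites
        seen |= sprites
    return duplicates


def groups_for_page(used_sprites: Set[str], groups: Dict[str, Set[str]]) -> List[str]:
    """List the group images a page must fetch for the sprites it uses."""
    return sorted(name for name, sprites in groups.items() if sprites & used_sprites)
```

A check like `overlapping_sprites` could run at build time, so that a sprite assigned to two groups is caught before it is served twice.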
  52. The site's logo – changes on special occasions Links to

    sub-categories are images that are modified by the editor In addition to static sprites we also have some dynamic images that the editor uploads and can also benefit from spriting. For example on ynet – the site's logo can be changed by the editor and there are some other small images like that – that are used on many pages.
  53. The logo must be 130x50 but the editor uploads 135x55

    If we use an img tag it works: <img src='' width=130 height=50/> But it doesn't work when using sprites with css background-position It's dangerous to compile those sprites automatically because they are uploaded by the editor – we need to ensure they work well with sprites. A common problem we encountered is that the editor uploads an image which is not exactly the required size. It works well in an img tag which scales it accordingly but not in a sprite.
  54. So we implemented a semi-automatic process that allows us to

    compile those dynamic sprites and test them – if there is a problem with one sprite we don't include it in the sprites image and it falls back to using a regular image.
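
One possible form of that test is a plain size check, sketched below. The talk does not say which checks were actually run, so the exact-size rule and the function name are assumptions; the 130x50 and 135x55 figures come from slide 53.

```python
from typing import Dict, List, Tuple

Size = Tuple[int, int]


def split_by_size(uploads: Dict[str, Size],
                  required: Dict[str, Size]) -> Tuple[List[str], List[str]]:
    """Return (names safe to sprite, names that fall back to a plain image)."""
    sprite_ok, fallback = [], []
    for name, actual in sorted(uploads.items()):
        if actual == required.get(name):
            sprite_ok.append(name)
        else:
            # Wrong size: an <img> tag can rescale it, a css sprite cannot.
            fallback.append(name)
    return sprite_ok, fallback
```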
  55. All this sprite usage needs tight integration with the development

    and deployment process. If the sprites are dynamically allocated and compiled, the developers can't create them in advance during development. (http://www.deviantart.com/art/Puzzle-292285090)
  56. Also developers shouldn't need to wait to create sprites –

    a big image with a lot of sprites can take a long time to compile. (http://xkcd.com/303/)
  57. Sprites Api Image Path Css class name ".ynetlogo" During development

    the usage is very easy – just give the api an image and get a css class name which can be used to display that image.
  58. Sprites Api – get css declarations. Development: .ynetlogo { background: url('ynetlogo.png'); } Production: .ynetlogo { background: url('sprites.png'); background-position: -38px -511px; }

    The api keeps track of all the sprites that were used, and after all the components have been rendered it returns the css declarations for all those sprites. During development, those css declarations refer to the image directly so developers don't wait for sprite compilation; for qa or production they refer to the relevant sprites image.
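
The behaviour described on slides 57 and 58 can be sketched as a small class. The class and method names are hypothetical and the real api was part of the ynet cms; the sketch only shows the idea of registering images first and emitting environment-specific css afterwards.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple


class SpritesApi:
    """Hands out css class names and later emits the matching css rules."""

    def __init__(self, production: bool, sheet_url: str = "sprites.png",
                 offsets: Optional[Dict[str, Tuple[int, int]]] = None) -> None:
        self.production = production
        self.sheet_url = sheet_url
        self.offsets = offsets or {}    # class name -> (x, y) inside the sheet
        self.used: Dict[str, str] = {}  # class name -> original image path

    def css_class(self, image_path: str) -> str:
        """Register an image and return the class name that displays it."""
        name = PurePosixPath(image_path).stem
        self.used[name] = image_path
        return name

    def css_declarations(self) -> str:
        """Call after all components rendered; returns rules for used sprites."""
        rules = []
        for name, path in self.used.items():
            if self.production and name in self.offsets:
                x, y = self.offsets[name]
                rules.append(f".{name} {{ background: url('{self.sheet_url}'); "
                             f"background-position: -{x}px -{y}px; }}")
            else:
                # Development, or a sprite missing from the sheet: use the image itself.
                rules.append(f".{name} {{ background: url('{path}'); }}")
        return "\n".join(rules)
```

In this sketch a sprite that is missing from the compiled sheet also gets the direct-image rule in production, which mirrors the fallback described on slide 54.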
  59. Sprites Api Article sprites image Homepage sprites image Render sprites

    image The api also renders the sprite images according to a configuration that specifies how to distribute the sprites to the different groups.
  60. Of course, the css declarations and sprite images need to

    match precisely and we do this using versioning.
  61. homepage.sprites.v4.css homepage.sprites.v4.png Each sprite group has its related css rules

    and a related version number. Each time the sprite group is modified, it's compiled and the version is incremented.
  62. sprite_versions table:

      group     version  status
      homepage  4        ready
      homepage  5        disabled
      homepage  6        compiling

    Also, each version has a status which indicates whether the sprites for that version have been compiled and are ready for use on the site. This also allows for easy rollback if there is a problem with a sprite version.
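
Choosing which version to serve from such a table can be sketched as below. The rows mirror the slide; the rule "serve the highest version marked ready" and the function names are assumptions.

```python
from typing import List, Optional, Tuple

# (group, version, status) rows, mirroring the sprite_versions slide.
SPRITE_VERSIONS: List[Tuple[str, int, str]] = [
    ("homepage", 4, "ready"),
    ("homepage", 5, "disabled"),
    ("homepage", 6, "compiling"),
]


def active_version(rows: List[Tuple[str, int, str]], group: str) -> Optional[int]:
    """Pick the highest version of a group whose status is 'ready'."""
    ready = [version for g, version, status in rows if g == group and status == "ready"]
    return max(ready) if ready else None


def sprite_files(rows: List[Tuple[str, int, str]], group: str) -> Optional[Tuple[str, str]]:
    """Return the matching (css, png) file names for the active version."""
    version = active_version(rows, group)
    if version is None:
        return None
    return (f"{group}.sprites.v{version}.css", f"{group}.sprites.v{version}.png")
```

With this rule, rolling back means marking the bad version as disabled: the previous ready version is picked up again on the next lookup.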
  63. github.com/jakobwesthoff/web-sprite-generator We used web-sprite-generator to render the sprites images and

    generate the css rules – we wrapped it with our own code that knows where to get the individual sprite images from and where to store the output image, and that updates the backend so it knows the image was compiled and is ready for use on the site. The developers have the option to use it directly to test the sprite generation and integration before pushing to qa.
  64. So, developers don't need to do anything special to use

    sprites – they just upload an image and use the sprites api to get a css class name. The sprites api handles all the hard work of compiling sprites and providing matching css rules. The sprites are distributed to different groups in production according to the site editor's usage patterns.
  65. css / js The site's content changes frequently but the

    css and javascript do not, so ideally we can combine all the separate css and javascript code files into one big file and minify it, and it can also be cached on the client-side and re-used between pages. (https://groups.google.com/forum/#!topic/nodejs/LquacmCOs-0)
  66. Css/js for articles components Css/js for homepage components In a

    dynamic cms we have similar problems as with sprites – if we put all the css declarations in one file it will be too big. Also, there are a lot of different components and we don't know in advance which components will appear on the page.
  67. Css/js for articles components Css/js for homepage components The solutions

    are also similar to the sprites solutions – manually or automatically distribute the css and js code into several external files.
  68. Unique page css / js files, each contains code for

    all the components that currently appear on the page homepage.css homepage.js For css and javascript it might be more effective to automatically generate a unique css/js file for every page. Although there is overlap of code, the css and javascript files are usually smaller than the sprite files, so it does not waste as much bandwidth or harm the page load time as much. It depends on the editors' usage patterns.
  69. The ynet cms is made up of a lot of

    components – each component can be placed inside a cell and the cells are arranged in a dynamic grid. The site editors can change the arrangement of cells or the grid in the layout and can place different components inside the cells.
  70. Multi Articles Component MultiArticlesController display (request) : response All the

    parts are developed using mvc methodology, so each component is a controller that has a display method that returns the component's html.
  71. MultiArticlesController display (request) : response getStaticJs (request) : string getStaticCss

    (request) : string Each component's controller also has extra methods that return its static css and javascript code. The generator knows that this static code can be served from an external file and the developers make sure that the code returned from those methods is static and does not depend on any dynamic variables.
  72. Generator display Iterate over all the component's controllers on the

    current page call the display method on each controller returns the combined html according to the page's layout The object that generates the page – combines the layout, calls the components, etc. – is called the generator. The generator's display method returns the combined html of all the page's components according to a layout.
  73. Static Files API homepage.css homepage.js Both files are dynamically generated

    by the static files api The returned html contains links to the static javascript and css files that are dynamically generated by a static files api.
  74. Static Files API GetStaticFile ( type , page ) "js"

    , "homepage" "css" , "homepage" "js" , "article" Iterate over all the component's controllers on the relevant page call the getStaticCss / getStaticJs method on each controller return the combined css / js The static files api gets as a parameter the type of file (js or css) and the relevant page. It then iterates over all the components on the relevant page and collects all the static js / css code.
  75. It also handles versioning of those files so they can

    be cached indefinitely, and it can optionally minify the code or compile coffeescript / scss etc. Some of the functionality is similar to the sprites api and they do share some of the code.
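
The static files api from slides 71 to 75 can be sketched as follows. The two controllers, their css/js bodies and the page mapping are made up. The talk mentions versioning without giving details, so the content hash used here is a swapped-in stand-in for whatever scheme was actually used.

```python
import hashlib
from typing import Dict, List


class MultiArticlesController:
    """Stand-in component; the css/js bodies are made up."""

    def get_static_css(self) -> str:
        return ".multi-articles { margin: 0; }"

    def get_static_js(self) -> str:
        return "function initMultiArticles() {}"


class SearchBarController:
    """Second stand-in component, also made up."""

    def get_static_css(self) -> str:
        return ".search-bar { width: 200px; }"

    def get_static_js(self) -> str:
        return "function initSearchBar() {}"


# Which components are currently placed on each page (normally set by editors).
PAGE_COMPONENTS: Dict[str, List[object]] = {
    "homepage": [MultiArticlesController(), SearchBarController()],
}


def get_static_file(file_type: str, page: str) -> str:
    """Combine the static css or js of every component on the page."""
    method = {"css": "get_static_css", "js": "get_static_js"}[file_type]
    return "\n".join(getattr(c, method)() for c in PAGE_COMPONENTS[page])


def versioned_name(file_type: str, page: str) -> str:
    """Name the file after a hash of its content so it can be cached long-term."""
    content = get_static_file(file_type, page).encode("utf-8")
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{page}.{digest}.{file_type}"
```

Because the name changes whenever the combined content changes, an editor adding or removing a component automatically produces a new file name, and old cached copies are simply never requested again.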
  76. So, css and js code is similar to sprites, only

    the implementation is slightly different. Like for sprites, we made sure it will integrate well into the development process. Each component is completely separate from the other components and the optimizations are done without requiring any specific changes to each component's code.
  77. Caching The most basic optimization but also the hardest is

    caching. There are a lot of different levels of caching – I will focus on full-page cache in the http level.
  78. The basic problem is that without caching all the requests

    hit your servers which increases the load and exposes you to ddos attacks. Also, if you have users from around the world – they all have to reach your server which might require many hops.
  79. CACHING LAYER The solution is to use a cdn –

    content distribution network and/or an internal caching layer that can more effectively handle high loads and can also be nearer to the clients physical location.
  80. “There are only two hard things in computer science: cache

    invalidation, naming things, and off-by-one errors.” Phil Karlton (excluding the off-by-one joke) A famous quote by Phil Karlton, one of the X11 developers, says that cache invalidation is one of the hardest things in computer science. He was referring to the X11 implementation – making sure the windows are refreshed while doing as few operations as possible. But it's also true for web caches.
  81. For static assets like images/javascript/css files it's relatively easy to

    do cache invalidation – you can use versioning for those assets and when making a change increase the version so a new file will be served. The best option for those assets is to use an external cdn that stores your files on their servers, ideally with servers around the world that serve the files from the servers nearest to each user.
  82. For dynamic content it's more difficult and I'll focus on

    that. There are 2 main options for cache invalidation:
  83. using ttls (time to live) – each object is cached

    for a predefined amount of time. When the predefined time passes the cache is invalidated and a fresh copy will be served.
  84. On-demand caching – this way the data is saved in

    the cache indefinitely, and when some data needs to change a request is made to invalidate the relevant data and a fresh copy will be served.
  85. Example: • Default ttl for all pages – 5 minutes

    • Homepage needs to be refreshed more often so set at 2 minutes • Articles are updated less often so can be set at 10 minutes Ttl caching is relatively easy to implement – you specify a default ttl for all requests and possibly change that default ttl for specific requests. You need to determine the desired ttl for each resource – if it's too high then the site's content will be stale. If it's too low, your servers will still get a lot of hits. Usually we want the ttl to be as high as possible but the client wants it lower so the site is more up-to-date. On-demand cache invalidation is much harder to implement.
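
The ttl example above can be sketched as a tiny in-memory cache. The ttl values come from the slide; the url patterns, the class and the injectable clock are assumptions made so the behaviour is easy to follow. In practice this logic lives in the cdn or caching layer configuration rather than in application code.

```python
import time
from typing import Callable, Dict, Optional, Tuple

DEFAULT_TTL = 5 * 60     # seconds, the default from the slide
HOMEPAGE_TTL = 2 * 60    # homepage is refreshed more often
ARTICLE_TTL = 10 * 60    # articles are updated less often


def ttl_for(path: str) -> int:
    """Pick the ttl for a path; the url patterns here are hypothetical."""
    if path == "/":
        return HOMEPAGE_TTL
    if path.startswith("/articles/"):
        return ARTICLE_TTL
    return DEFAULT_TTL


class TtlCache:
    """Stores each page together with the time at which it expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str]] = {}  # path -> (expiry, body)

    def get(self, path: str) -> Optional[str]:
        entry = self.entries.get(path)
        if entry is None or self.clock() >= entry[0]:
            return None  # missing or expired: caller must regenerate
        return entry[1]

    def put(self, path: str, body: str) -> None:
        self.entries[path] = (self.clock() + ttl_for(path), body)
```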
  86. Edit and save an article Delete the article's page cache

    With on-demand cache invalidation when an editor edits an article – you invalidate the relevant article page's cache.
  87. Edit and save an article Delete the article's page cache

    The article's title is also used on the homepage Delete the homepage's cache But if the data from that article is used in other places – you also need to invalidate those locations.
  88. Edit and save an article Delete the article's page cache

    The article's title is also used on the homepage Delete the homepage's cache Also used on: • Iphone application • RSS feeds • Headlines page • Headlines ticker • ... Sometimes there are a lot of such locations and it's difficult to keep track. Also, you need to make sure you don't have any changes which happen frequently and cause too many cache invalidations – that will make the cache useless.
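
One way to keep track of those locations is to record, when a page is cached, which articles it used. The sketch below is a generic illustration of that idea, not the ynet implementation; the class name and urls are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class PageCache:
    """Full-page cache that remembers which pages used which article."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}
        self.dependents: Dict[str, Set[str]] = defaultdict(set)

    def store(self, url: str, body: str, article_ids: Set[str]) -> None:
        """Cache a page and record every article whose data it contains."""
        self.pages[url] = body
        for article_id in article_ids:
            self.dependents[article_id].add(url)

    def invalidate_article(self, article_id: str) -> Set[str]:
        """Drop every cached page that used the article; return their urls."""
        urls = self.dependents.pop(article_id, set())
        for url in urls:
            self.pages.pop(url, None)
        return urls
```

Saving one article then invalidates its own page, the homepage and any feed that listed it, without anyone maintaining the list of locations by hand.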
  89. For a news website the cache must be invalidated as

    quickly as possible – for example if an editor makes a mistake like publishing a story which was not approved or uses a photo which should be censored – those must be fixed immediately. For that reason, using a CDN is usually not an option – some cdns don't have the option to actively invalidate the cache, and even if they do have that option the cdn might take a lot of time to invalidate all its caches because the cache is distributed across data-centers.
  90. External cdn Ttl-based cache invalidation The solution that worked best

    for us is a combination of ttl and cache invalidation. We use a cdn for ttl-based cache invalidation with relatively low ttls.
  91. External cdn Ttl-based cache invalidation Internal caching layer on-demand cache

    invalidation Invalidation requests And an internal caching layer for on-demand cache invalidation – using something like varnish, which handles cache invalidation very fast. This way most requests are served from the cdn but some still reach our servers and they are handled by the on-demand caching layer.
  92. External cdn Ttl-based cache invalidation Internal caching layer on-demand cache

    invalidation There is still a problem that the first user that requests a resource will reach the application server – because the resource is not cached yet
  93. External cdn Ttl-based cache invalidation Internal caching layer on-demand cache

    invalidation Stored in the cache Stored in the cache When this user's request is returned it is stored in the caches and the next user will get it from the cache. If it takes a long time to generate a page you may want to optimize even further by implementing cache warming.
  94. External cdn Ttl-based cache invalidation Internal caching layer on-demand cache

    invalidation Cache invalidation Request updated resource updated resource Stored in the cache Cache warming means that when the cache is invalidated the relevant page is actively re-generated, not waiting for the next request.
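
Cache warming can be sketched in a few lines. The class is hypothetical and `render` stands in for a request to the application server. Note that this single-process sketch replaces the entry in one step, so it does not reproduce the timing window between invalidation and regeneration that a real, distributed caching layer has.

```python
from typing import Callable, Dict


class WarmingCache:
    """Cache that re-generates a page as soon as it is invalidated."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self.render = render          # stands in for the application server
        self.pages: Dict[str, str] = {}

    def get(self, url: str) -> str:
        """Serve from cache, rendering only on a miss."""
        if url not in self.pages:
            self.pages[url] = self.render(url)
        return self.pages[url]

    def invalidate(self, url: str) -> None:
        """Re-generate right away instead of waiting for the next request."""
        self.pages[url] = self.render(url)
```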
  95. External cdn Ttl-based cache invalidation Internal caching layer on-demand cache

    invalidation Cache invalidation Request updated resource updated resource Stored in the cache A possible problem with this is that users might interfere in this process. If a user requests a page after the cache was invalidated but before the updated response was cached – he will still reach our application server. This might be a problem if a certain page takes an extremely long time to generate.
  96. External cdn Ttl-based cache invalidation disabled Internal caching layer on-demand

    cache invalidation active active active A solution to this problem is to do the cache warming one server at a time. When an invalidation is needed – you can disable one caching server so it won't serve requests and then the invalidation and regeneration is performed on that server.
  97. External cdn Ttl-based cache invalidation disabled Internal caching layer on-demand

    cache invalidation active active active After the page was regenerated on that server it is re-enabled, and then another server is disabled and so on. After the first server the page doesn't need to regenerate because it can be copied from the first server.
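
The one-server-at-a-time procedure from slides 96 and 97 can be sketched as below. The `CacheServer` class and the `enabled` flag are hypothetical stand-ins for taking a real caching server out of the load balancer rotation.

```python
from typing import Callable, Dict, List, Optional


class CacheServer:
    """Stand-in for one caching server behind the load balancer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.enabled = True
        self.pages: Dict[str, str] = {}


def rolling_warm(servers: List[CacheServer], url: str,
                 render: Callable[[str], str]) -> None:
    """Refresh url on one server at a time; render only on the first one."""
    fresh: Optional[str] = None
    for server in servers:
        server.enabled = False         # take it out of rotation
        try:
            if fresh is None:
                fresh = render(url)    # only the first server regenerates
            server.pages[url] = fresh  # later servers copy the result
        finally:
            server.enabled = True      # back in rotation
```

While one server is disabled the others keep serving their existing copy, so no user request reaches the application server during the refresh.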
  98. So, ttl caching using a cdn, combined with

    internal on-demand cache invalidation, worked well for us – only a very low percentage of requests reach our servers. Most of them are handled by the cdn, and the few that do reach our servers are mostly handled by the internal caching. The cache warming might be a bit overkill, but it's useful if you have pages that take a long time to generate.