Optimizing client side performance in highly dynamic content websites

Slide 1

Slide 1 text

Optimizing client-side performance for highly dynamic content websites Hello, my name is Ori Hoch. A few words about myself -

Slide 2

Slide 2 text

● Ori Hoch ● github.com/astupidog I have many years of experience in web development in various positions – from developer to system architect to team leader.

Slide 3

Slide 3 text

● Ori Hoch ● github.com/astupidog ● kaltura.org I work at Kaltura the leading open source online video platform. I work on Kaltura Mediaspace which is a product that can be used to create video and rich media web portals.

Slide 4

Slide 4 text

● Ori Hoch ● github.com/astupidog ● kaltura.org ● hasadna.org.il/en In my spare time – which I don't have much of because of 2 children – I volunteer for the Public Knowledge Workshop (הסדנה לידע ציבורי).

Slide 5

Slide 5 text

The Public Knowledge Workshop We are hacking for a better Israel. You can help! The Public Knowledge Workshop is a non-profit, non-partisan organization that makes Israeli government data and other data of public interest openly accessible on the internet. We are always looking for volunteers, so feel free to approach me after the lecture for further details.

Slide 6

Slide 6 text

Until about 2 months ago I worked at yit. Yit started about 15 years ago as the IT department for the Yedioth group which owns newspapers, magazines and internet sites, including Yedioth Aharonot which is one of the leading daily newspapers in Israel.

Slide 7

Slide 7 text

yit.co.il/eng Since then yit has grown from an IT department into an independent company which develops some of the biggest, most popular websites in Israel. At yit I was the web development team leader for ynet.

Slide 8

Slide 8 text

ynet.co.il ynetnews.com Ynet is the most popular news website in Israel, and during my time working there my team and I were very lucky to have the chance to rewrite most of the site's front-end almost from scratch.

Slide 9

Slide 9 text

During that process we learnt a lot, and in this presentation I want to share with you some of the challenges we faced.

Slide 10

Slide 10 text

[Bar chart: average daily visitors for walla.co.il, ynet.co.il, mako.co.il, nana10.co.il, tapuz.co.il and one.co.il, on an axis from 0 to 900,000. ynet: 749,045 average daily visitors (Feb 2012, Israel Audience Research Board).] The biggest challenge in working on ynet was not the popularity of the site – although it is one of the most popular in Israel with hundreds of thousands of daily visitors. (http://www.globes.co.il/news/article.aspx?did=1000734364)

Slide 11

Slide 11 text

1395 ynet ynet.co.il Compared to international sites it's nothing special.

Slide 12

Slide 12 text

The most challenging part was the dynamic nature of the site – the site is based on a very sophisticated content management system that allows a very high degree of customization to the site editors. (http://en.m.wikipedia.org/wiki/File:Customized_Can-Am_Spyder.jpg)

Slide 13

Slide 13 text

The editors make changes to the site all the time and in addition always want to add new features. All of that needs to happen as soon as possible. (baby godfather meme)

Slide 14

Slide 14 text

This makes the basic optimizations much harder to perform, and in this presentation I will focus on the problems we encountered and possible solutions.

Slide 15

Slide 15 text

The plan ● Images ● Sprites ● CSS / JS ● Caching I'm assuming everyone here knows the basics of these optimizations and I will focus on more advanced problems and solutions. Feel free to ask questions or shout out comments during the presentation.

Slide 16

Slide 16 text

Images On ynet, being primarily a news website, there are a lot of images that the editors upload, and the images change frequently. (http://ftvlive.com/todays-news/2013/9/20/tv-news-pet-peeves)

Slide 17

Slide 17 text

Images If there is an ongoing event, the editors will get a lot of photos from the photographers in the event and will want to use those photos as soon as possible.

Slide 18

Slide 18 text

Images of course take a lot of bandwidth and to lower costs and improve client-side performance we want the images to be as small as possible but without losing quality. (http://what-if.xkcd.com/31/)

Slide 19

Slide 19 text

Also we may want to serve the same image in different sizes for different devices – using media queries or for external applications. So ideally you want to scale down the original image to match the specific resolution on each device. (http://www.neobytesolutions.com/responsive-web-design-part-3/)

Slide 20

Slide 20 text

DELEGATE! The easiest solution for the developers is to let the editors do all the work – crop and scale the images manually on the desktop, and require them to upload each image in the correct size. Of course the editors don't like that solution. Also, not all editors are Photoshop masters, and they might not scale or crop the images properly.

Slide 21

Slide 21 text

CROP SCALE A better solution is to add an easy-to-use online image editing interface to the site's admin. The editor uploads a full-resolution file and is then presented with an interface that allows them to crop and scale the image as needed.

Slide 22

Slide 22 text

SMALL BIG 400x300 800x600 We can make the interface even easier to use and more fool-proof by limiting only to predefined target sizes. If for example we have 2 size targets – big and small, then the user just selects which size he wants to produce and the interface limits him only to that size and does the crop or scaling as needed.
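
As a rough sketch of what "limiting to predefined target sizes" could look like in code (this is not the actual ynet implementation): the two sizes come from the slide, while the function name `crop_box` and the idea of passing the editor's chosen focus point as fractions of the image are assumptions made for the illustration.

```python
# Predefined targets, using the example sizes from the slide.
TARGET_SIZES = {"small": (400, 300), "big": (800, 600)}


def crop_box(src_w, src_h, target, focus=(0.5, 0.5)):
    """Return (left, top, right, bottom) of the largest crop that has the
    target's proportions, placed as close as possible to the focus point
    chosen by the editor (fractions of the width and height)."""
    tw, th = TARGET_SIZES[target]
    if src_w < tw or src_h < th:
        raise ValueError("source image is smaller than the target size")
    # Largest rectangle with the target proportions that fits in the source.
    scale = min(src_w / tw, src_h / th)
    crop_w, crop_h = round(tw * scale), round(th * scale)
    # Centre the crop on the focus point, then clamp it inside the image.
    left = min(max(round(focus[0] * src_w - crop_w / 2), 0), src_w - crop_w)
    top = min(max(round(focus[1] * src_h - crop_h / 2), 0), src_h - crop_h)
    return left, top, left + crop_w, top + crop_h
```

The editor only picks a target and a point of interest; the proportions can't go wrong because they are derived from the predefined size.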

Slide 23

Slide 23 text

KEEP CALM AND FIND THE USER ERORR This protects against user error and ensures the quality of the resulting images, but it's still not good enough when there are a lot of images being uploaded all the time and used in different places.

Slide 24

Slide 24 text

The editors need to have images ready as soon as possible – no time to waste trying to manually crop or scale the images. Also, they sometimes upload a lot of images in advance so that when they need them they can use them immediately. But usually only a fraction of those images are actually used. (http://knowyourmeme.com/memes/soon)

Slide 25

Slide 25 text

So the natural solution is to crop or scale automatically. This presents a few problems. (http://memecrunch.com/meme/P8ER/automate-it)

Slide 26

Slide 26 text

Usually the editors upload a full-resolution image they get from the photographer – for example this image of a speaker at a conference. In the full-size image you can see who the speaker is and maybe even read the text behind him. (http://www.mysqlperformanceblog.com/2013/04/29/percona-live-mysql-conference-2013-wrap-up/)

Slide 27

Slide 27 text

But if we want to use this image for a mobile device, it will be tiny – and none of the details are visible.

Slide 28

Slide 28 text

So for smaller devices you need to crop the image so that at least some details are visible, but that can't be done automatically (or is very hard to do) because you need to identify the relevant part of the image you want to focus on.

Slide 29

Slide 29 text

Another problem is different proportions. For example, if you have a vertical image but you want to use it on a horizontal location. If you crop it automatically, you might chop someone's head off and if you don't crop it you have wasted space.

Slide 30

Slide 30 text

Manual crop/scale in the online editing interface to predefined sizes So the solution we implemented is a semi-automatic process. The editors upload the full-resolution image and then, in the image editing interface, they crop and scale it into as few predefined sizes as possible.

Slide 31

Slide 31 text

From these sizes we can safely scale automatically to other predefined sizes which will be used on the site and will be more specifically fitted to the relevant targets.
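
A minimal sketch of the automatic step, assuming the derived sizes are described by a list of maximum widths (the function names and the "never scale up" rule are assumptions for the illustration, not something taken from the ynet code):

```python
def scaled_size(master, max_width):
    """Scale a (width, height) master down to max_width, keeping proportions.
    The image is never scaled up, since that would only lose quality."""
    width, height = master
    if max_width >= width:
        return width, height
    return max_width, max(1, round(height * max_width / width))


def derived_sizes(master, widths):
    """Map each requested width to the size that would actually be rendered."""
    return {width: scaled_size(master, width) for width in widths}
```

Because the editor already chose the crop for the master size, scaling it down keeps the same framing, which is why this part is safe to automate.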

Slide 32

Slide 32 text

This way we can achieve our goal of serving each user the specific image resolution required, and the editors have an efficient workflow for image uploading which minimizes manual work.

Slide 33

Slide 33 text

sprites Every site has a lot of icons, logos, background images etc. that repeat on different pages. (http://forum.mgbr.net/index.php?showtopic=52640)

Slide 34

Slide 34 text

sprites To reduce the amount of http requests we can combine all those images into one big image which is downloaded once, cached on the client-side and then reused on all the site's pages.

Slide 35

Slide 35 text

.rssicon { background: url('sprites.png') no-repeat 0 0; background-position: -38px -511px; } http://brandonsetter.com/60-beautiful-css-sprite-social-media-icons/ Normally to use sprites you take all your images, put them on one big image (there are many tools to do that) and then use that image throughout your site using css.
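
To make the relation between the combined image and the css concrete, here is a small sketch that stacks images vertically and emits one rule per sprite. Real sprite tools use smarter packing; the vertical layout, the function names and the `twittericon` class are assumptions for the example.

```python
def pack_vertically(images):
    """images: list of (css_class, width, height).
    Stack the images top to bottom and return {css_class: (x, y)} plus the
    size of the combined image."""
    offsets, y, sheet_width = {}, 0, 0
    for css_class, width, height in images:
        offsets[css_class] = (0, y)
        y += height
        sheet_width = max(sheet_width, width)
    return offsets, (sheet_width, y)


def css_rules(sheet_url, images):
    """Emit one rule per sprite. Offsets are negative because the background
    is shifted up and left until the wanted sprite shows in the element."""
    offsets, _ = pack_vertically(images)
    rules = []
    for css_class, width, height in images:
        x, y = offsets[css_class]
        rules.append(
            f".{css_class} {{ background: url('{sheet_url}') no-repeat "
            f"{-x}px {-y}px; width: {width}px; height: {height}px; }}"
        )
    return "\n".join(rules)
```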

Slide 36

Slide 36 text

On ynet it was very hard to do because there are a lot of components and different layouts that the site editors can use. (http://knowyourmeme.com/memes/computer-reaction-faces)

Slide 37

Slide 37 text

homepage article As an example I'll use 2 of the most popular pages on ynet – the homepage and the article page. In addition to those there are thousands of other pages in varying degrees of popularity.

Slide 38

Slide 38 text

Header component Top story component Search bar component Strip component Each of these pages is made up of components that the site's editors can add, remove and edit. The editors can choose from hundreds of different components.

Slide 39

Slide 39 text

Belong to components on the homepage Belong to components on articles Not used on popular pages at all, only on special pages or not at all If we were to create one big image with all those sprites it would be too big, and only a few of those sprites would be used at any given moment.

Slide 40

Slide 40 text

Image that contains the sprites that appear on the homepage Image that contains the sprites that appear on articles The solution is to split this big image into groups of sprites that will be used together.

Slide 41

Slide 41 text

Image that contains the sprites that appear on the homepage Image that contains the sprites that appear on articles Overlapping sprites – same sprites appear on both homepage and article One problem with this solution is that if each page has a different image, some of the sprites in that image might repeat between the different pages.

Slide 42

Slide 42 text

Image that contains the sprites that appear on the homepage A component that appears on the homepage but is not included in the homepage sprites image Another problem is that the components that appear on the page are determined by the editor – so there might be a component not included on the predefined image for the page and the sprites in that component will have to be displayed directly without using the combined sprites image.

Slide 43

Slide 43 text

How to divide the big sprites image into smaller sprite groups? The challenge is how to define the sprite groups in the most efficient manner.

Slide 44

Slide 44 text

Automatically look at the components that are on the page and create a unique image for those components One possible solution is to create the sprite images dynamically according to the components that are currently on the page. In this way, each page will have a unique sprites image associated with it. There are a few problems with this solution.

Slide 45

Slide 45 text

The biggest problem with this approach is that it's hard to implement.

Slide 46

Slide 46 text

/homepage (html) contains a link to css file: /homepage.sprites.css (dynamically generated) ● collects all the sprites used on the homepage ● compiles an image with all those sprites ● returns css rules which point to that sprites image In that case you need the page's html to contain a link to an external css file that dynamically generates css rules for all the sprites used on that page, based on a dynamically generated sprites image.

Slide 47

Slide 47 text

Overlapping sprites There is also the problem of overlapping sprites – some components might be on 2 different pages. If each page has a unique sprites image associated with it the sprites for those components will be in 2 sprite images.

Slide 48

Slide 48 text

Waste bandwidth and increase page load times Overlapping sprites This solution ensures each page will have exactly 1 sprite image related to it – this drastically reduces the amount of http requests. But it also wastes a lot of bandwidth – the sprite images are not reused between different pages.

Slide 49

Slide 49 text

This solution might be suitable in some cases; it really depends on the site editors' usage patterns. For ynet we found it's not suitable.

Slide 50

Slide 50 text

Components most likely on the homepage Components most likely on articles Another solution is to create the sprite groups manually in advance according to client requirements – trying to anticipate the editor's usage patterns and which components will most likely be used on different pages.

Slide 51

Slide 51 text

Components most likely on the homepage Components most likely on articles Common sprites – for all pages Social Sharing icons You can add different groups in any logical way which suits the site's usage patterns.
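
A sketch of how such manually defined groups could be represented and used, assuming a simple mapping from group name to component names. Only header, search bar, top story, strip and social sharing appear on the slides; the other component names and the function name are made up for the example.

```python
# Hypothetical configuration: which components' sprites go into which image.
SPRITE_GROUPS = {
    "common": {"header", "search_bar", "social_sharing"},
    "homepage": {"top_story", "strip"},
    "article": {"article_body", "talkbacks"},
}


def sprite_images_for_page(components):
    """Return (groups the page must load, components whose images are served
    directly because they are not part of any predefined group)."""
    groups, ungrouped = [], []
    for component in components:
        match = next((name for name, members in SPRITE_GROUPS.items()
                      if component in members), None)
        if match is None:
            ungrouped.append(component)
        elif match not in groups:
            groups.append(match)
    return groups, ungrouped
```

Each component belongs to at most one group, so a sprite is never stored in two images, and a component the editor adds unexpectedly simply falls back to regular images.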

Slide 52

Slide 52 text

This solution will slightly increase the number of http requests – each page will need to fetch a few different sprites images.

Slide 53

Slide 53 text

Does not waste bandwidth So, this solution is not ideal regarding reduction of http requests, but of course it is a lot better than not using sprites at all. Also, the sprites images will be cached on the client-side and reused between different pages – unlike the previous solution.

Slide 54

Slide 54 text

Does not waste bandwidth No overlapping sprites It solves the problem of overlapping sprites – we divide the sprites manually and can make sure there is no overlap.

Slide 55

Slide 55 text

Easier to implement Does not waste bandwidth No overlapping sprites And, it's easier to implement.

Slide 56

Slide 56 text

The site's logo – changes on special occasions Links to sub-categories are images that are modified by the editor In addition to static sprites we also have some dynamic images that the editor uploads and can also benefit from spriting. For example on ynet – the site's logo can be changed by the editor and there are some other small images like that – that are used on many pages.

Slide 57

Slide 57 text

The logo must be 130x50 but the editor uploads 135x55 If we use an img tag it works: But it doesn't work when using sprites with css background-position It's dangerous to compile those sprites automatically because they are uploaded by the editor – we need to ensure they work well with sprites. A common problem we encountered is that the editor uploads an image which is not exactly the required size. It works well in an img tag which scales it accordingly but not in a sprite.

Slide 58

Slide 58 text

So we implemented a semi-automatic process that allows us to compile those dynamic sprites and test them – if there is a problem with one sprite we don't include it in the sprites image and it falls back to using a regular image.
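
The size check from the logo example could look roughly like this. It is only a sketch: the real process also involved manual testing, and the function name and the `sports_link` image are hypothetical.

```python
def split_valid_sprites(uploads, required_sizes):
    """uploads: {name: (width, height)} as uploaded by the editor.
    required_sizes: {name: (width, height)} expected by the layout.
    Returns (names that are safe to put in the sprites image, names that
    fall back to a regular img tag)."""
    spriteable, fallback = [], []
    for name, size in uploads.items():
        if size == required_sizes.get(name):
            spriteable.append(name)
        else:
            fallback.append(name)
    return spriteable, fallback
```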

Slide 59

Slide 59 text

All this sprite usage needs tight integration with the development and deployment process. If the sprites are dynamically allocated and compiled, the developers can't create them in advance during development. (http://www.deviantart.com/art/Puzzle-292285090)

Slide 60

Slide 60 text

Also developers shouldn't need to wait to create sprites – a big image with a lot of sprites can take a long time to compile. (http://xkcd.com/303/)

Slide 61

Slide 61 text

To solve these problems we developed a sprites api that has several responsibilities.

Slide 62

Slide 62 text

Sprites Api Image Path Css class name ".ynetlogo" During development the usage is very easy – just give the api an image and get a css class name which can be used to display that image.

Slide 63

Slide 63 text

This makes it fast and easy to use during development.

Slide 64

Slide 64 text

Sprites Api – Get css declarations Development: .ynetlogo { background: url('ynetlogo.png'); } Production: .ynetlogo { background: url('sprites.png'); background-position: -38px -511px; } The api keeps track of all the sprites that were used, and after all the components were rendered it returns the css declarations for all those sprites. During development, those css declarations refer to the image directly so developers don't wait for sprite compilation; for qa or production they refer to the relevant sprites image.
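
A minimal sketch of this behaviour, to show the idea rather than our actual api: the class name `SpritesApi`, its method names and the `offsets` mapping (which would come from the sprite compiler) are all assumptions.

```python
class SpritesApi:
    """Hands out css class names and collects the rules needed for them."""

    def __init__(self, production, sheet_url="sprites.png", offsets=None):
        self.production = production
        self.sheet_url = sheet_url
        self.offsets = offsets or {}   # {image_path: (x, y)} from the compiler
        self.used = {}                 # {css_class: image_path}, in use order

    def css_class(self, image_path):
        """Register an image and return the class name to put in the html."""
        name = image_path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        self.used[name] = image_path
        return name

    def css_declarations(self):
        """Call after all the components were rendered."""
        rules = []
        for name, path in self.used.items():
            if self.production and path in self.offsets:
                x, y = self.offsets[path]
                rules.append(f".{name} {{ background: url('{self.sheet_url}') "
                             f"no-repeat {-x}px {-y}px; }}")
            else:
                # Development, or a sprite missing from the compiled image.
                rules.append(f".{name} {{ background: url('{path}') no-repeat; }}")
        return "\n".join(rules)
```

The component code is identical in both environments; only the declarations returned at the end differ.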

Slide 65

Slide 65 text

Sprites Api Article sprites image Homepage sprites image Render sprites image The api also renders the sprite images according to a configuration that specifies how to distribute the sprites to the different groups.

Slide 66

Slide 66 text

Of course, the css declarations and sprite images need to match precisely and we do this using versioning.

Slide 67

Slide 67 text

homepage.sprites.v4.css homepage.sprites.v4.png Each sprite group has its related css rules and a related version number. Each time the sprite group is modified, it's compiled and the version is incremented.

Slide 68

Slide 68 text

sprite_versions
group     version  status
homepage  4        ready
homepage  5        disabled
homepage  6        compiling

Also, each version has a status which indicates whether the sprites for that version were compiled and are ready for usage on the site. This also allows for easy rollback if there is a problem with a sprite version.
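
To illustrate how the status column can drive which files are served, here is a sketch that uses the rows from the slide; the function names are assumptions. Rolling back is then just a matter of marking the bad version as disabled.

```python
SPRITE_VERSIONS = [
    {"group": "homepage", "version": 4, "status": "ready"},
    {"group": "homepage", "version": 5, "status": "disabled"},
    {"group": "homepage", "version": 6, "status": "compiling"},
]


def current_version(rows, group):
    """Highest version of the group that is ready, or None if there is none."""
    ready = [row["version"] for row in rows
             if row["group"] == group and row["status"] == "ready"]
    return max(ready, default=None)


def sprite_urls(rows, group):
    """Matching css and image file names for the version currently in use."""
    version = current_version(rows, group)
    if version is None:
        return None
    base = f"{group}.sprites.v{version}"
    return f"{base}.css", f"{base}.png"
```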

Slide 69

Slide 69 text

github.com/jakobwesthoff/web-sprite-generator We used web-sprite-generator to render the sprites images and generate the css rules – we wrapped it with our own code that knows where to get the individual sprite images from, where to store the output image, and how to update the backend so it knows the image was compiled and is ready for usage on the site. The developers have the option to use it directly to test the sprite generation and integration before pushing to qa.

Slide 70

Slide 70 text

So, developers don't need to do anything special to use sprites – they just upload an image and use the sprites api to get a css class name. The sprites api handles all the hard work of compiling sprites and providing matching css rules. The sprites are distributed to different groups in production according to the site editors' usage patterns.

Slide 71

Slide 71 text

css / js The site's content changes frequently but the css and javascript do not. So ideally we can combine all the separate css and javascript code files into one big file and minify it; that file can then be cached on the client-side and re-used between pages. (https://groups.google.com/forum/#!topic/nodejs/LquacmCOs-0)

Slide 72

Slide 72 text

Css/js for articles components Css/js for homepage components In a dynamic cms we have similar problems as with sprites – if we put all the css declarations in one file it will be too big. Also, there are a lot of different components and we don't know in advance which components will appear on the page.

Slide 73

Slide 73 text

Css/js for articles components Css/js for homepage components The solutions are also similar to the sprites solutions – manually or automatically distribute the css and js code into several external files.

Slide 74

Slide 74 text

Unique page css / js files, each contains code for all the components that currently appear on the page homepage.css homepage.js For css and javascript it might be more effective to automatically generate a unique css/js file for every page. Although there is overlap of code, the css and javascript files are usually smaller than the sprite files, so it does not waste too much bandwidth or harm the page load time as much. It depends on the editors' usage patterns.

Slide 75

Slide 75 text

The implementation is somewhat different than for sprites. (http://usersnap.com/blog/good-habits-in-web-development/)

Slide 76

Slide 76 text

The ynet cms is made up of a lot of components – each component can be placed inside a cell and the cells are arranged in a dynamic grid. The site editors can change the arrangement of cells or the grid in the layout and can place different components inside the cells.

Slide 77

Slide 77 text

Multi Articles Component MultiArticlesController display (request) : response All the parts are developed using mvc methodology, so each component is a controller that has a display method that returns the component's html.

Slide 78

Slide 78 text

MultiArticlesController display (request) : response getStaticJs (request) : string getStaticCss (request) : string Each component's controller also has extra methods that return its static css and javascript code. The generator knows that this static code can be served from an external file, and the developers make sure that the code returned from those methods is static and does not depend on any dynamic variables.

Slide 79

Slide 79 text

Generator display Iterate over all the components' controllers on the current page call the display method on each controller returns the combined html according to the page's layout The object that generates the page (combines the layout, calls the components, etc.) is called the generator. The generator's display method returns the combined html of all the page's components according to a layout.

Slide 80

Slide 80 text

Static Files API homepage.css homepage.js Both files are dynamically generated by the static files api The returned html contains links to the static javascript and css files that are dynamically generated by a static files api.

Slide 81

Slide 81 text

Static Files API GetStaticFile ( type , page ) "js" , "homepage" "css" , "homepage" "js" , "article" Iterate over all the components' controllers on the relevant page call the getStaticCss / getStaticJs method on each controller return the combined css / js The static files api gets as parameters the type of file (js or css) and the relevant page. It then iterates over all the components on the relevant page and collects all the static js / css code.
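
The flow above can be sketched as follows. The controller and method names mirror the slides (in Python naming style), but the css/js strings, the `SearchBarController` and the `PAGE_COMPONENTS` lookup are made up for the example; in the real cms the list of components comes from the page's layout.

```python
class MultiArticlesController:
    def get_static_css(self, request):
        return ".multi-articles { margin: 0; }"

    def get_static_js(self, request):
        return "function initMultiArticles() {}"


class SearchBarController:
    def get_static_css(self, request):
        return ".search-bar { float: right; }"

    def get_static_js(self, request):
        return ""  # this component has no static javascript


# Hypothetical lookup of the components currently placed on each page.
PAGE_COMPONENTS = {"homepage": [MultiArticlesController(), SearchBarController()]}


def get_static_file(file_type, page, request=None):
    """Combine the static css or js of every component on the page."""
    method = {"css": "get_static_css", "js": "get_static_js"}[file_type]
    parts = [getattr(component, method)(request)
             for component in PAGE_COMPONENTS[page]]
    return "\n".join(part for part in parts if part)
```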

Slide 82

Slide 82 text

It also handles versioning of those files so they can be cached indefinitely, and it can optionally minify the code, compile CoffeeScript / scss, etc. Some of the functionality is similar to the sprites api and they do share some of the code.
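
One common way to version such files, shown here only as an illustration, is to put a hash of the content in the file name instead of an incrementing number like the one we used for sprites. The function name and the 8-character digest length are arbitrary choices for the sketch.

```python
import hashlib


def versioned_name(page, file_type, content):
    """Build a file name that changes whenever the content changes, so the
    file itself can be cached without an expiry time."""
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()[:8]
    return f"{page}.{digest}.{file_type}"
```

When a component's css changes, the page's html links to a new file name and the old cached file is simply never requested again.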

Slide 83

Slide 83 text

So, css and js code is similar to sprites, only the implementation is slightly different. Like for sprites, we made sure it will integrate well into the development process. Each component is completely separate from the other components and the optimizations are done without requiring any specific changes to each component's code.

Slide 84

Slide 84 text

Caching The most basic optimization but also the hardest is caching. There are a lot of different levels of caching – I will focus on full-page cache in the http level.

Slide 85

Slide 85 text

The basic problem is that without caching all the requests hit your servers which increases the load and exposes you to ddos attacks. Also, if you have users from around the world – they all have to reach your server which might require many hops.

Slide 86

Slide 86 text

CACHING LAYER The solution is to use a cdn (content distribution network) and/or an internal caching layer that can handle high loads more effectively and can also be nearer to the clients' physical location.

Slide 87

Slide 87 text

“There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors.” Phil Karlton (excluding the off-by-one joke) A famous quote by Phil Karlton, one of the X11 developers, says that cache invalidation is one of the hardest things in computer science. He was referring to the X11 implementation – making sure the windows are refreshed while doing as few operations as possible. But it's also true for web caches.

Slide 88

Slide 88 text

For static assets like images/javascript/css files it's relatively easy to do cache invalidation – you can use versioning for those assets and when making a change increase the version so a new file will be served. The best option for those assets is to use an external cdn that stores your files on their servers, ideally with servers around the world that serve the files from the servers nearest to each user.

Slide 89

Slide 89 text

For dynamic content it's more difficult and I'll focus on that. There are 2 main options for cache invalidation:

Slide 90

Slide 90 text

using ttls (time to live) – each object is cached for a predefined amount of time. When the predefined time passes the cache is invalidated and a fresh copy will be served.

Slide 91

Slide 91 text

On-demand caching – this way the data is saved in the cache indefinitely, and when some data needs to change a request is made to invalidate the relevant data so that a fresh copy will be served.

Slide 92

Slide 92 text

Example: ● Default ttl for all pages – 5 minutes ● Homepage needs to be refreshed more often so set at 2 minutes ● Articles are updated less often so can be set at 10 minutes Ttl caching is relatively easy to implement – you specify a default ttl for all requests and possibly change that default ttl for specific requests. You need to determine the desired ttl for each resource: if it's too high, the site's content will be stale; if it's too low, your servers will still get a lot of hits. Usually we want the ttl to be as high as possible but the client wants it lower so the site is more up-to-date. On-demand cache invalidation is much harder to implement.
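
The example ttls above could be expressed like this. The url patterns are hypothetical (they are not ynet's real urls); the `Cache-Control` header with `max-age` in seconds is the standard way to tell a cdn or proxy how long it may keep a response.

```python
import re

DEFAULT_TTL = 5 * 60
# First matching pattern wins; the url patterns are made up for the example.
TTL_OVERRIDES = [
    (re.compile(r"^/$"), 2 * 60),            # homepage: refreshed more often
    (re.compile(r"^/articles/"), 10 * 60),   # articles: updated less often
]


def ttl_for(path):
    """Ttl in seconds for a request path."""
    for pattern, ttl in TTL_OVERRIDES:
        if pattern.search(path):
            return ttl
    return DEFAULT_TTL


def cache_headers(path):
    """Response header that tells shared caches how long to keep the page."""
    return {"Cache-Control": f"public, max-age={ttl_for(path)}"}
```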

Slide 93

Slide 93 text

Edit and save an article Delete the article's page cache With on-demand cache invalidation when an editor edits an article – you invalidate the relevant article page's cache.

Slide 94

Slide 94 text

Edit and save an article Delete the article's page cache The article's title is also used on the homepage Delete the homepage's cache But if the data from that article is used in other places – you also need to invalidate those locations.

Slide 95

Slide 95 text

Edit and save an article Delete the article's page cache The article's title is also used on the homepage Delete the homepage's cache Also used on: ● iPhone application ● RSS feeds ● Headlines page ● Headlines ticker ● ... Sometimes there are a lot of such locations and it's difficult to keep track. Also, you need to make sure you don't have any changes which happen frequently and cause too many cache invalidations – that will make the cache useless.
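
One way to keep track of those locations, sketched here under the assumption that every page can report which content items it displayed while rendering, is to maintain a reverse index from item to urls. The class name, the `article:42` id format and the example urls are all hypothetical.

```python
from collections import defaultdict


class InvalidationIndex:
    """Remembers which cached urls were rendered using which content items."""

    def __init__(self):
        self.urls_by_item = defaultdict(set)

    def record(self, url, item_ids):
        """Call when a page is rendered, with every item it displayed."""
        for item_id in item_ids:
            self.urls_by_item[item_id].add(url)

    def urls_to_invalidate(self, item_id):
        """Everything that must be purged when the item is edited."""
        return sorted(self.urls_by_item.get(item_id, ()))
```

When an editor saves an article, the list returned for that article is what gets sent to the caching layer as invalidation requests.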

Slide 96

Slide 96 text

For a news website the cache must be invalidated as quickly as possible – for example if an editor makes a mistake like publishing a story which was not approved or uses a photo which should be censored – those must be fixed immediately. For that reason, using a CDN is usually not an option – some cdns don't have the option to actively invalidate the cache, and even if they do have that option, the cdn might take a lot of time to invalidate all its caches because the cache is distributed across data-centers.

Slide 97

Slide 97 text

External cdn Ttl-based cache invalidation The solution that worked best for us is a combination of ttl and cache invalidation. We use a cdn for ttl-based cache invalidation with relatively low ttls.

Slide 98

Slide 98 text

External cdn Ttl-based cache invalidation Internal caching layer on-demand cache invalidation Invalidation requests And an internal caching layer for on-demand cache invalidation – using something like varnish, which handles cache invalidation very fast. This way most requests are served from the cdn but some still reach our servers and they are handled by the on-demand caching layer.

Slide 99

Slide 99 text

External cdn Ttl-based cache invalidation Internal caching layer on-demand cache invalidation There is still a problem that the first user that requests a resource will reach the application server – because the resource is not cached yet.

Slide 100

Slide 100 text

External cdn Ttl-based cache invalidation Internal caching layer on-demand cache invalidation Stored in the cache Stored in the cache When this user's request is returned it is stored in the caches and the next user will get it from the cache. If it takes a long time to generate a page you may want to optimize even further by implementing cache warming.

Slide 101

Slide 101 text

Slide 102

Slide 102 text

External cdn Ttl-based cache invalidation Internal caching layer on-demand cache invalidation Cache invalidation Request updated resource updated resource Stored in the cache A possible problem with this is that users might interfere in this process. If a user requests a page after the cache was invalidated but before the updated resource was cached – he will still reach our application server. This might be a problem if a certain page takes an extremely long time to generate.

Slide 103

Slide 103 text

External cdn Ttl-based cache invalidation disabled Internal caching layer on-demand cache invalidation active active active A solution to this problem is to do the cache warming one server at a time. When an invalidation is needed – you can disable one caching server so it won't serve requests and then the invalidation and regeneration is performed on that server.

Slide 104

Slide 104 text

External cdn Ttl-based cache invalidation disabled Internal caching layer on-demand cache invalidation active active active After the page was regenerated on that server it is re-enabled, and then another server is disabled and so on. After the first server the page doesn't need to regenerate because it can be copied from the first server.
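
The one-server-at-a-time process can be sketched like this. It is a simulation with in-memory objects, not code that talks to a real caching server; `CacheServer`, `rolling_refresh` and the `render` callback are assumptions for the illustration.

```python
class CacheServer:
    """Stand-in for one caching server behind the load balancer."""

    def __init__(self, name):
        self.name, self.enabled, self.pages = name, True, {}


def rolling_refresh(servers, url, render):
    """Refresh url on every caching server, one server at a time.
    render(url) is the expensive call to the application server."""
    fresh = None
    for server in servers:
        server.enabled = False           # take it out of rotation
        try:
            server.pages.pop(url, None)  # invalidate the stale copy
            if fresh is None:
                fresh = render(url)      # only the first server regenerates
            server.pages[url] = fresh    # later servers copy the result
        finally:
            server.enabled = True        # back in rotation with a warm cache
    return fresh
```

At any moment the enabled servers hold either the old page or the new one, so user requests never fall through to the application server during the refresh.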

Slide 105

Slide 105 text

So, the combination of ttl caching using a cdn with an internal on-demand cache invalidation layer worked well for us – only a very low percentage of requests reach our servers – most of them are handled by the cdn, and the few that do reach our servers are mostly handled by the internal caching. The cache-warming might be a bit overkill – but it's useful if you have pages that take a long time to generate.

Slide 106

Slide 106 text

THANK YOU FOR YOUR ATTENTION QUESTIONS?