Slide 1

Slide 1 text

My site is slow, what can I do? @BastianHofmann Profiling PHP applications

Slide 2

Slide 2 text

This talk is all about... ...

Slide 3

Slide 3 text

Speed. And with that I mean, not the drug, the movie or any game, but

Slide 4

Slide 4 text

Speed of your web application the ..

Slide 5

Slide 5 text

We'll talk about...

Slide 6

Slide 6 text

Why it matters and why you should care about it

Slide 7

Slide 7 text

How to measure it what actually is pagespeed and ...

Slide 8

Slide 8 text

How to find out where the problems are and ..

Slide 9

Slide 9 text

Before we start ...

Slide 10

Slide 10 text

A few words about me ...

Slide 11

Slide 11 text

I work at ResearchGate, the professional network for scientists and researchers

Slide 12

Slide 12 text

ResearchGate gives science back to the people who make it happen. We help researchers build reputation and accelerate scientific progress. On their terms. the goal is to give...

Slide 13

Slide 13 text

over 3 million users

Slide 14

Slide 14 text

here are some impressions of the page

Slide 15

Slide 15 text

you may have read about us in the news recently

Slide 16

Slide 16 text

http://gigaom.com/2013/06/05/heres-how-bill-gatess-researchgate-investment-might-change-the-world-for-the-better http://venturevillage.eu/researchgate

Slide 17

Slide 17 text

have this, and also work on some cool stuff

Slide 18

Slide 18 text

we are hiring

Slide 19

Slide 19 text

Questions? Ask by the way, if you have any questions throughout this talk, if you don't understand something, just raise your hand and ask.

Slide 20

Slide 20 text

http://speakerdeck.com/u/bastianhofmann the slides will be available on speakerdeck

Slide 21

Slide 21 text

Why should you care? so, pagespeed...

Slide 22

Slide 22 text

It is really important of course because ...

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

but seriously, in recent years multiple studies have looked at the importance of speed for a web application. how does it affect usage and conversion? how long will people wait for content? how does it affect sales? if people leave because the site is slow, do they come back?

Slide 25

Slide 25 text

Every ms counts in short, the result of every study was the same: ...

Slide 26

Slide 26 text

in detail

Slide 27

Slide 27 text

in detail

Slide 28

Slide 28 text

..

Slide 29

Slide 29 text

So what is pagespeed?

Slide 30

Slide 30 text

Server the first thing that contributes to pagespeed is what happens on the server. this is also the easiest part, because it is completely under our control

Slide 31

Slide 31 text

Your PHP application Request Response so what your server and your application are doing between incoming request and outgoing response

Slide 32

Slide 32 text

Your PHP application Request Response Load balancer though this does not mean only your application, but also the rest of your setup, like a load balancer

Slide 33

Slide 33 text

Your PHP application Request Response Load balancer and also your application is probably not a single small php script, but a big application with multiple components that can each affect speed differently. so getting a more detailed view on these components might be interesting as well. more on that later

Slide 34

Slide 34 text

web server db http service http service cache user request additionally most bigger applications have some kind of a service oriented architecture, same things apply here. knowing about the speed of the different services is important.

Slide 35

Slide 35 text

what is your web application like?

Slide 36

Slide 36 text

But there is more ... your application does not stop at your server. somehow it needs to get to your user

Slide 37

Slide 37 text

so internet connectivity is also a big part, contributing to pagespeed, that means everything from dns lookup, over ssl handshake to actually transporting the content over the wire

Slide 38

Slide 38 text

once your user has received the content, their browser needs to display it. and some browsers are way slower than others at doing that

Slide 39

Slide 39 text

so the dom needs to be rendered, css fetched and applied, images loaded

Slide 40

Slide 40 text

and of course nearly no web application comes without javascript. this needs to be loaded and executed as well

Slide 41

Slide 41 text

http://www.stevesouders.com/blog/2012/02/10/the-performance-golden-rule/

Slide 42

Slide 42 text

So what happens on your server is really important

Slide 43

Slide 43 text

But the rest as well my point is: what is important is the pagespeed your user perceives. this covers everything from the server to their browser. in the end it's your fault if the site is slow, even if the user's computer and browser are crappy.

Slide 44

Slide 44 text

Step 1 how to deal with this...

Slide 45

Slide 45 text

Measuring measuring your actual pagespeed

Slide 46

Slide 46 text

who is doing it?

Slide 47

Slide 47 text

if you start: it's going to hurt

Slide 48

Slide 48 text

because although everything seems to be fine on your fast, two-month-old machine, with lots of ram, cpu power, latest chrome, from your 50mbit vdsl connection with a very low ping to your data center ...

Slide 49

Slide 49 text

reality is: Old computers

Slide 50

Slide 50 text

Old feature phones, slower smart phones, mobile networks in general (edge anyone?)

Slide 51

Slide 51 text

old browsers

Slide 52

Slide 52 text

people in countries with high latency to your datacenter and/or slow internet connections (remember dial-up?)

Slide 53

Slide 53 text

So you need to measure at your users' side

Slide 54

Slide 54 text

Navigation Timing API https://developer.mozilla.org/en-US/docs/Navigation_timing there is actually a great javascript API that is supported by a lot of modern browsers

Slide 55

Slide 55 text

you get timestamps for all important events of a pageload
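
As a rough sketch of what you can derive from those timestamps: the field names below are the ones the Navigation Timing API exposes on window.performance.timing, but the helper name phase_durations and the sample numbers are made up for illustration. Python is used only to show the arithmetic; in the browser you would read the same fields in javascript.

```python
# Field names follow window.performance.timing (Navigation Timing API).
# The helper name and the sample values are hypothetical.

def phase_durations(timing):
    """Turn absolute millisecond timestamps into per-phase durations."""
    return {
        "dns": timing["domainLookupEnd"] - timing["domainLookupStart"],
        "connect": timing["connectEnd"] - timing["connectStart"],
        "ttfb": timing["responseStart"] - timing["requestStart"],
        "download": timing["responseEnd"] - timing["responseStart"],
        "dom_ready": timing["domContentLoadedEventStart"] - timing["navigationStart"],
        "page_load": timing["loadEventStart"] - timing["navigationStart"],
    }


# Invented sample page load, all values in milliseconds.
sample = {
    "navigationStart": 1000,
    "domainLookupStart": 1005,
    "domainLookupEnd": 1030,
    "connectStart": 1030,
    "connectEnd": 1080,
    "requestStart": 1082,
    "responseStart": 1200,
    "responseEnd": 1250,
    "domContentLoadedEventStart": 1600,
    "loadEventStart": 2100,
}

durations = phase_durations(sample)
```

Splitting the page load into phases like this is what later lets you tell a slow server (ttfb) apart from a slow network (dns, connect) or a slow browser (dom_ready, page_load).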

Slide 56

Slide 56 text

DEMO

Slide 57

Slide 57 text

For older browsers you have to do it yourself though ... e.g. by manually measuring the time with javascript and on the server. this is actually kind of hard (clock offsets, etc)

Slide 58

Slide 58 text

Or

Slide 59

Slide 59 text

https://github.com/lognormal/boomerang use something from people who have already done this for you

Slide 60

Slide 60 text

Getting it back to the server so now you have all the timestamps in your javascript, you need to ...

Slide 61

Slide 61 text

logstash http://logstash.net/ enter the next tool, logstash is a very powerful tool to handle log processing

Slide 62

Slide 62 text

input filter output basic workflow is that you have some input where logstash gets log messages in, on this input you can execute multiple filters that modify the message and then you can output the filtered message somewhere

Slide 63

Slide 63 text

Very rich plugin system to do this it offers a very large and rich plugin system for all kinds of inputs, filters and outputs, and you can also write your own

Slide 64

Slide 64 text

browser JS: boomerang logstash trackingServer access log requests tracking image with timing information as query parameters for our purpose we can just have boomerang in the browser collect the timestamps and then send a small tracking request (inserting an image) to a tracking server. the timestamps are added as query parameters to this request. the server only returns an empty image and logs the request to its access log, which logstash can parse.
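
To make the shape of that tracking request concrete, here is a minimal sketch. boomerang builds the real request itself in the browser; the tracking host, the image name and the helper tracking_url are assumptions for illustration. The parameter names page and connectTime are the ones that show up in the grok pattern later in the talk.

```python
from urllib.parse import urlencode


def tracking_url(base, page, timings):
    """Append the page name and the timing values as query parameters."""
    params = {"page": page}
    params.update(timings)
    return base + "?" + urlencode(params)


# Hypothetical tracking server and image name.
url = tracking_url(
    "http://tracking.example.com/t.gif",
    "home.loggedIn",
    {"connectTime": 50},
)
```

The tracking server does not need any application logic for this: answering with an empty image and writing the request line to the access log is enough.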

Slide 65

Slide 65 text

Graphing it we want to...

Slide 66

Slide 66 text

Graphite http://graphite.wikidot.com/ again there are many tools available to collect and display these metrics: one i want to highlight is graphite

Slide 67

Slide 67 text

graphite comes with a powerful interface where you can plot and aggregate this data into graphs and perform different mathematical functions on it to get it exactly the way you want to display your data

Slide 68

Slide 68 text

browser JS: boomerang logstash trackingServer access log requests tracking image with timing information as query parameters graphite statsd and statsd is a small aggregation daemon that sits in front of it. so this is your setup: logstash sends these timestamps to statsd, which aggregates the information and sends it to graphite
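
For reference, a timing sent to statsd is just one short line of text in a UDP packet, in the form name:value|ms. The sketch below shows that wire format; the helper names are hypothetical, and the exact metric name logstash builds from its namespace and sender settings may differ from this simplified version.

```python
import socket


def statsd_timing(namespace, name, millis):
    """Format one timing in the statsd line protocol: <bucket>:<value>|ms"""
    return "{}.{}:{}|ms".format(namespace, name, millis)


def send(line, host="localhost", port=8125):
    """Fire-and-forget UDP send; statsd never answers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


line = statsd_timing("pagespeed", "home.loggedIn.connect", 50)
```

Because it is UDP, sending a metric costs almost nothing and cannot block or break the sender if statsd is down, which is why this setup is safe to hang off production logs.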

Slide 69

Slide 69 text

input {
  file {
    type => "pagespeed-access"
    path => [ "/var/log/nginx/access_log/monitoring-access.log" ]
  }
}

in logstash this is how to get the data from the log into logstash

Slide 70

Slide 70 text

filter {
  grok {
    type => "pagespeed-access"
    pattern => "^.*\s\"[A-Z]+\s[^\?\s]+\?page=%{DATA:page}\&connectTime=%{NUMBER:connectTime}...)?\sHTTP\/\d\.\d\".*$"
  }
  grok {
    type => "pagespeed-access"
    match => ["page", "^(profile|home|...)\.logged(In|Out)$"]
    exclude_tags => ["_grokparsefailure"]
  }
}

you can apply filters to put it into a structured form and validate it
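
If grok patterns are new to you, the two filters above do roughly what this sketch does: pull the query parameters out of the request line, then reject anything whose page name is not on a whitelist. This is not how logstash is implemented; the sample log line is invented, and the whitelist only contains the two page names visible on the slide (the original list is elided).

```python
import re
from urllib.parse import urlsplit, parse_qs

# Request part of a common access log line: "METHOD /path?query HTTP/x.y"
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/\d\.\d"')
# Whitelist of page names; only "profile" and "home" are known from the slide.
PAGE_RE = re.compile(r"^(profile|home)\.logged(In|Out)$")


def parse_line(line):
    """Return structured timing data, or None if the line does not validate."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    query = parse_qs(urlsplit(match.group(1)).query)
    page = query.get("page", [""])[0]
    connect = query.get("connectTime", [""])[0]
    if not PAGE_RE.match(page) or not connect.isdigit():
        return None
    return {"page": page, "connectTime": int(connect)}


# Invented access log lines for illustration.
good = '127.0.0.1 - - [05/Jun/2013:10:00:00 +0200] "GET /t.gif?page=home.loggedIn&connectTime=50 HTTP/1.1" 200 43'
bad = '127.0.0.1 - - [05/Jun/2013:10:00:01 +0200] "GET /t.gif?page=admin&connectTime=50 HTTP/1.1" 200 43'

parsed = parse_line(good)
rejected = parse_line(bad)
```

The validation step matters because the tracking endpoint is public: anyone can request it with arbitrary parameters, and you do not want those to become metric names.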

Slide 71

Slide 71 text

output {
  statsd {
    type => "pagespeed-access"
    exclude_tags => ["_grokparsefailure"]
    host => "localhost"
    port => 8126
    namespace => "pagespeed"
    sender => ""
    timing => [ "%{page}.connect", "%{connectTime}", ... ]
  }
}

and then put the data somewhere else. here we are sending it to statsd. what's that?

Slide 72

Slide 72 text

and that's what such graphs may then look like

Slide 73

Slide 73 text

Can we measure more? I said earlier we may need information about services etc

Slide 74

Slide 74 text

Load balancer in an SOA we can do something similar with the access logs of our services, which also have timing information. and if we have a load balancer in between, we can get useful information from there as well

Slide 75

Slide 75 text

Example: HAProxy you can get the time of the request, time spent in haproxy queues etc.

Slide 76

Slide 76 text

input {
  file {
    type => "haproxy-http-log"
    path => [ "/var/log/haproxy-http.log*" ]
  }
}

example config

Slide 77

Slide 77 text

filter {
  grok {
    type => "haproxy-http-log"
    pattern => "%{HAPROXYHTTP}"
  }
  mutate {
    type => "haproxy-http-log"
    gsub => [
      "server_name", "\.", "_",
      "client_ip", "\.", "_"
    ]
  }
}

example config
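
The gsub in the mutate filter is there because graphite treats every dot in a metric name as a level in its tree, so a server name or an IP address with dots in it would be split into several nested nodes. A small sketch of the effect, with made-up backend, server and client values:

```python
def graphite_safe(value):
    """Replace dots so the value stays a single node in the graphite tree."""
    return value.replace(".", "_")


# Hypothetical values as they might come out of the haproxy log.
metric = "haproxy.{}.{}.{}.hits".format(
    "web",
    graphite_safe("app1.example.com"),
    graphite_safe("10.0.0.1"),
)
```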

Slide 78

Slide 78 text

output {
  statsd {
    type => "haproxy-http-log"
    exclude_tags => ["_grokparsefailure"]
    host => "localhost"
    port => 8125
    namespace => "lb"
    sender => ""
    increment => [
      "haproxy.%{backend_name}.%{server_name}.%{client_ip}.hits",
      "haproxy.%{backend_name}.%{server_name}.%{client_ip}.responses.%{http_status_code}"
    ]
    timing => [
      "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_request", "%{time_request}",
      "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_backend_connect", "%{time_backend_connect}",
      "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_backend_response", "%{time_backend_response}",
      "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_queue", "%{time_queue}",
      "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_duration", "%{time_duration}"
    ]
  }
}

example config

Slide 79

Slide 79 text

browser JS: boomerang logstash trackingServer access log requests tracking image with timing information as query parameters graphite statsd logstash load balancer access log logstash service access log logstash can analyse these logs and send them to statsd as well

Slide 80

Slide 80 text

From within your PHP app What is also useful is to measure certain things from within your php app, e.g. rendering time, the time database requests took, the time spent in certain business logic etc. you can either just log this to a file and use the same logstash mechanism, or if you just need it for debugging, do it differently. more on that later.
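
The idea is the same in any language: take a timestamp before and after the interesting block and record the difference under a label. In PHP you would typically do this with microtime(true); the sketch below shows the pattern in Python, and the names timed and timings are hypothetical.

```python
import time
from contextlib import contextmanager

# Collected timings in milliseconds, keyed by label.
timings = {}


@contextmanager
def timed(label):
    """Measure the wall-clock time of the enclosed block and add it to timings."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings[label] = timings.get(label, 0.0) + elapsed_ms


# Stand-in for real work such as rendering a template or running a query.
with timed("render"):
    sum(range(1000))
```

At the end of the request you can write the collected values to a log file that logstash picks up, or show them in a debug toolbar.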

Slide 81

Slide 81 text

Only overall?

Slide 82

Slide 82 text

By pages but you should not only measure all your requests together, you should differentiate by...

Slide 83

Slide 83 text

By browser

Slide 84

Slide 84 text

By country

Slide 85

Slide 85 text

Logged in / out

Slide 86

Slide 86 text

Heavy users .. or just everything that makes sense for you

Slide 87

Slide 87 text

Define goals and after measuring everything and seeing that you are slow somewhere, you should define goals for the performance you want to reach

Slide 88

Slide 88 text

Step 2 but before you can start fixing stuff, ...

Slide 89

Slide 89 text

How to find out where the problems are finding out ...

Slide 90

Slide 90 text

Profiling you can do this through ...

Slide 91

Slide 91 text

the first useful tool for this is ... xdebug has quite a few features, like the ability to set breakpoints in your code, nicer error displays and so on. but one of them is profiling your app

Slide 92

Slide 92 text

xdebug.profiler_enable_trigger = 1 http://url?XDEBUG_PROFILE you can either activate profiling for every request or selectively for all requests that have a GET, POST or COOKIE parameter called XDEBUG_PROFILE

Slide 93

Slide 93 text

Webgrind https://github.com/jokkedk/webgrind this writes so-called cachegrind files. in order to view them you can use tools like kcachegrind or the easiest one ...

Slide 94

Slide 94 text

you can see everything that happened in this request: every function that was invoked, how often it was called and how long it took.

Slide 95

Slide 95 text

DEMO

Slide 96

Slide 96 text

Use it locally on your dev machine one thing with xdebug, it slows down php, so .. but not in production

Slide 97

Slide 97 text

XHProf https://github.com/facebook/xhprof for production there is xhprof, developed by facebook

Slide 98

Slide 98 text

Use it in production for a subset of requests you can safely use it in production, it comes with a performance overhead but only when used, so you can activate it, but only use for a small percentage of requests or when manually activated (e.g. by a cookie).
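
The decision which requests to profile can be as simple as the sketch below. The cookie name _profile and the helper should_profile are made up; in a PHP app the same check would decide whether to wrap the request in xhprof_enable() and xhprof_disable().

```python
import random


def should_profile(cookies, sample_rate, rng=random.random):
    """Profile when explicitly requested by cookie, else for a random sample."""
    if cookies.get("_profile") == "1":  # hypothetical cookie name
        return True
    return rng() < sample_rate


# Manually activated by cookie: always profiled, even with a sample rate of 0.
forced = should_profile({"_profile": "1"}, 0.0)
# Otherwise roughly 1% of requests get profiled.
sampled = should_profile({}, 0.01)
```

Passing the random source in as a parameter is only there to make the function easy to test; with the default it samples real traffic.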

Slide 99

Slide 99 text

XHGUI https://github.com/preinheimer/xhgui to display the xhprof profiles, there is a nice tool called xhgui

Slide 100

Slide 100 text

in addition to the normal things like seeing

Slide 101

Slide 101 text

the whole callstack

Slide 102

Slide 102 text

you can also visualize this in a graph

Slide 103

Slide 103 text

and can do analysis over multiple requests and compare them to each other

Slide 104

Slide 104 text

DEMO

Slide 105

Slide 105 text

Symfony Debug Toolbar i said earlier that there is another good way to get information about your application's internals, especially if you only need it for debugging and not in a graph. this is with the..

Slide 106

Slide 106 text

it comes with standard symfony and you probably all have seen it already

Slide 107

Slide 107 text

you can click on it and it gives you nice detailed information about the request. stuff like doctrine queries, a nice timeline, exceptions, routing, events etc.

Slide 108

Slide 108 text

DEMO

Slide 109

Slide 109 text

Extend it http://symfony.com/doc/current/cookbook/profiler/data_collector.html but did you know that you can extend it? there are some good ready made extensions available, e.g. for caching, http calls, versioning etc. just check packagist, but you can also write your own easily.

Slide 110

Slide 110 text

here are some examples how we at researchgate extended it (disclaimer: we are not even using full symfony, but only some components).

Slide 111

Slide 111 text

No content

Slide 112

Slide 112 text

No content

Slide 113

Slide 113 text

No content

Slide 114

Slide 114 text

No content

Slide 115

Slide 115 text

No content

Slide 116

Slide 116 text

DEMO

Slide 117

Slide 117 text

Step 3 now that you have all this debugging information to pinpoint your bottlenecks, let's get to ...

Slide 118

Slide 118 text

Fix it

Slide 119

Slide 119 text

That's something you have to do unfortunately ... since it is very dependent on your application and your setup

Slide 120

Slide 120 text

No content

Slide 121

Slide 121 text

Remember but

Slide 122

Slide 122 text

Speed matters

Slide 123

Slide 123 text

http://twitter.com/BastianHofmann http://lanyrd.com/people/BastianHofmann http://speakerdeck.com/u/bastianhofmann [email protected] thanks, you can contact me on any of these platforms or via mail.