Profiling PHP applications

My site is slow, what can I do? @BastianHofmann Profiling
PHP applications

This talk is all about... ...

Speed. And with that I mean, not the drug, the
movie or any game, but

Speed of your web application the ..

We'll talk about...

Why it matters and why you should care about it

How to measure it what actually is pagespeed and ...

How to find out where the problems are and ..

Before we start ...

A few words about me ...

I work at ResearchGate, the professional network for scientists and
researchers

over 3 million users

here some impressions of the page

you may have read about us in the news recently

have this, and also work on some cool stuff

we are hiring

Questions? Ask by the way, if you have any questions
throughout this talk, if you don't understand something, just raise your hand and ask.

http://speakerdeck.com/u/bastianhofmann the slides will be available on speakerdeck

So speed…

Why should you care?

It is really important of course because ...

but seriously, in the last years multiple studies were made
on the importance of speed for a web application. how does it affect usage and conversion? how long are people waiting for content? how does it affect sales? if people left because the site is slow, are they coming back?

Every ms counts in short, the result of every study
was the same: ...

in detail

So what is pagespeed?

Server the ﬁrst thing the contributes to pagespeed is what
happens on the server. this is also the easiest part, because it is completely under our control

Your PHP application Request Response so what your server and
your application is doing between incoming request and outgoing response

Your PHP application Request Response Load balancer though this does
not mean only your application, but also the rest of your setup, like a loadbalancer

Your PHP application Request Response Load balancer and also your
application is probably not a single small php script, but a big application with multiple components that each can affect speed differently. so getting a more detailed view on these components might be interesting as well. more to that later

web server db http service http service cache user request
additionally most bigger applications have some kind of a service oriented architecture, same things apply here. knowing about the speed of the different services is important.

But there is more ... your application does not stop
at your server. somehow it needs to get to your user

so internet connectivity is also a big part, contributing to
pagespeed, that means everything from dns lookup, over ssl handshake to actually transporting the content over the wire

when your user received the content, he needs to display
it. and some browsers are way slower than others in doing it

so the dom needs to be rendered, css fetched and
applied, images loaded

and of course nearly no web application comes without javascript.
this needs to be loaded and executed as well

http://www.stevesouders.com/blog/2012/02/10/the-performance-golden-rule/

So what happens on your server is really important

But the rest is as well my point is: what
is important is the pagespeed your user perceives. this contains everything from server to his browser. in the end it's your fault if the site is slow, even if the user's computer and browser is crappy.

Step 1 how to deal with this...

Measuring measuring your actual pagespeed

who is doing it?

if you start: it's going to hurt

because although everything seems to be ﬁne on your fast,
2 month old machine, with lot's of ram, cpu power, latest chrome, from your 50mbit vdsl connection with a very low ping to your data center.

reality is: Old computers

Old feature phones, slower smart phones, mobile networks in general
(edge anyone?)

old browsers

people in countries with big latencies to your datacenter and/or
slow internet connections (rember dial up)

So you need to measure at your users side

Navigation Timing API https://developer.mozilla.org/en-US/docs/ Navigation_timing there is actually a great
javascript up that is supported by a lot of modern browsers

you get timestamps for all important events of a pageload

For older browsers you have to do it yourself though
... e.g. by manually measuring the time with javascript and on the server. this is actually kind of hard (clock offsets, etc)

https://github.com/lognormal/boomerang use something of people who already did this for
you

so we want to get a graph like this

Getting it back to the server so now you have
all the timestamps in your javascript, you need to ...

Tracking request

https://tracking.example.com/? page=proﬁle&backend=123&co mplete=890&domReady=580

logstash http://logstash.net/ enter the next tool, logstash is a very
powerful tool to handle log processing

input filter output basic workflow is that you have some
input where logstash gets log messages in, on this input you can execute multiple filters that modify the message and then you can output the filtered message somewhere

Very rich plugin system to do this it offers a
very large and rich plugin system for all kinds of inputs, ﬁlters and outputs, and you can also write your own

browser JS: boomerang logstash trackingServer access log requests tracking image
with timing information as query parameters for our purpose we can just have boomerang in the browser collect the timestamps and then send a small tracking request (inserting an image) to a tracking server. the timestamps are added as query parameters to this request. the server only returns an empty image and logs the request to his access log which logstash can parse.

Graphing it we want to...

Graphite http://graphite.wikidot.com/ again there are many tools available to collect
and display these metrics: one i want to highlight is graphite

graphite comes with a powerful interface where you can plot
and aggregate this data into graphs and perform different mathematical functions on it to get it exactly the way you want to display your data

with timing information as query parameters graphite statsd and statsd is a small load balancing daemon for it. so this is your setup logstash sends these timestamps to statsd who aggregates the information and sends them to graphite

input { file { type => "pagespeed-access" path => [
"/var/log/nginx/ access_log/monitoring-access.log" ] } } in logstash this is how to get the date from the log into logstash

filter{ grok { type => "pagespeed-access" pattern => "^.*\s\"[A-Z]+\s[^\?\s]+ \?page=%{DATA:page}\&connectTime=%
{NUMBER:connectTime}...)?\sHTTP\/\d\.\d\".* $" } grok { type => "pagespeed-access" match => ["page", "^(profile| home|...)\.logged(In|Out)$"] exclude_tags => ["_grokparsefailure"] } } you can apply ﬁlters to put it into a structured form and validate it

output { statsd { type => "pagespeed-access" exclude_tags => ["_grokparsefailure"]
host => "localhost" port => 8126 namespace => "pagespeed" sender => "" timing => [ "%{page}.connect", "% {connectTime}", ... ] } } and the put the data somehwere else. here we are sending it to statsd. what's that

and thats what such graphes then may look like

Can we measure more? I said earlier we may need
information about services etc

Load balancer in a soa architecture we can do something
similar with the access logs of our services, which also have timing information. or if we have a load balancer (like haproxy) in between as well. we can get useful information from there

input { file { type => "haproxy-http-log" path => [
"/var/log/ haproxy-http.log*" ] } } example conﬁg

filter { grok { type => "haproxy-http-log" pattern => "%{HAPROXYHTTP}"
} mutate { type => "haproxy-http-log" gsub => [ "server_name", "\.", "_", "client_ip", "\.", "_" ] } } example conﬁg

output { statsd { type => "haproxy-http-log" exclude_tags => ["_grokparsefailure"]
host => "localhost" port => 8125 namespace => "lb" sender => "" increment => [ ! ! "haproxy.%{backend_name}.%{server_name}.% {client_ip}.hits", ! ! "haproxy.%{backend_name}.%{server_name}.% {client_ip}.responses.%{http_status_code}" ! ! ] timing => [ "haproxy.%{backend_name}.%{server_name}.% {client_ip}.time_request", "%{time_request}", "haproxy.%{backend_name}.%{server_name}.% {client_ip}.time_backend_connect", "%{time_backend_connect}", "haproxy.%{backend_name}.%{server_name}.% {client_ip}.time_backend_response", "%{time_backend_response}", "haproxy.%{backend_name}.%{server_name}.% {client_ip}.time_queue", "%{time_queue}", "haproxy.%{backend_name}.%{server_name}.% {client_ip}.time_duration", "%{time_duration}" ] } } example conﬁg

with timing information as query parameters graphite statsd logstash load balancer access log logstash service access log logstash can analyse these logs and send them to statsd as well

From within your PHP app What is also useful is
to measure certain things from within your php app, e.g. rendering time. time database requests took. time spent of certain business logic etc. you can either just log this to a ﬁle and use the same logstash mechanism, or if you just need to have it for debugging, do it differently. more to that later.

More fine grained metrics

By pages but you should not only measure all your
request, you should differentiate by...

By browser

By country

Logged in / out

Heavy users .. or just everything that makes sense for
you

Define goals and after measuring everything and you see that
you are slow somewhere, you should deﬁne goals, what performance you want to reach

Step 2 but before you can start ﬁxing stuff, ...

How to find out where the problems are ﬁnding out
...

Profiling you can do this through ...

ﬁrst tool usefull for this is ... xdebug has quite
a few functionalities like offering the ability to make breakpoints in your code, nicer error displays and so on. but one is also proﬁling of your app

xdebug.profiler_enable_trigger = 1 http://url?XDEBUG_PROFILE you can either activate proﬁling for
every request or selectively for all requests that have a GET, POST or COOKIE parameter called XDEBUG_PROFILE

Webgrind https://github.com/jokkedk/webgrind this write so called cachegrind ﬁles. in order
to view this you can use tools like kcachegrind or the easiest one ...

you can see everything that happend in this request, every
function that was invoked, how often this was and how long it took.

DEMO data/callgrind.example.out.1087

Use it locally on your dev machine one thing with
xdebug, it slows down php, so .. but not in production

XHProf https://github.com/facebook/xhprof for production there is xhprof, developed by facebook

Use it in production for a subset of requests you
can safely use it in production, it comes with a performance overhead but only when used, so you can activate it, but only use for a small percentage of requests or when manually activated (e.g. by a cookie).

XHGUI https://github.com/perftools/xhgui to display the xhprof proﬁles, there is a
nice tool called xhgui

in addition to the normal thinkgs like seeing

the whole callstack

you can also visualize this in a graph

and can do analysis over multiple requests and compare them
to each other

DEMO http://localhost:8080/xhgui/webroot/

php-meminfo https://github.com/BitOne/php-meminfo

Symfony Debug Toolbar i said earlier, that there is another
good way to get information about your applications internals, especially if you only need it for debugging and not in a graph. this is with the..

it comes with standard symfony and you probably all have
seen it already

you can click on it and it gives you nice
detailed information about the request. stuff like doctrine queries, a nice timeline, exceptions, routing, events etc.

DEMO http://localhost:8080/proﬁling_talk/web/app_dev.php

Extend it http://symfony.com/doc/current/cookbook/proﬁler/ data_collector.html but did you know that you
can extend it? there are some good ready made extensions available, e.g. for caching, http calls, versioning etc. just check packagist, but you can also write your own easily.

here are some examples how we at researchgate extended it
(disclaimer: we are not even using full symfony, but only some components).

Step 3 now that you have all this debugging information
to pinpoint your bottlenecks, let's get to ...

Fix it

That's something you have to do unfortunately ... since it
is very dependent on your application and your setup

Keep up to date

http://www.perfplanet.com/

Remember but

Speed matters

Some hints on better performance

The obvious Of course you have the obvious things like

DB Indexes

Minify JS https://github.com/mishoo/UglifyJS

Minify CSS http://sass-lang.com/

Concatenate JS/ CSS/...

Correct caching headers

Opcache

Data Caching

memcached

PHP 5.5 https://blog.asmallorange.com/2013/08/php-roadmap- performance/

The not so well known

Minimize redirects

Check image compression https://github.com/gruntjs/grunt-contrib-imagemin

Compress HTML

Check YSlow, Pagespeed

DNS prefetch

Move logic to async workers

The "crazy" stuff

Varnish

Proﬁle Publications Publication Publication Publication AboutMe LeftColumn Image Menu Institution

Proﬁle Publications Publication Publication Publication AboutMe LeftColumn Image Menu <esi:include
src="..." /> Institution because every component has it's own url you can just render out a esi placeholder instead of the widget to tell varnish to fetch it separately and provided it has caching headers, get it out of the cache

Load content asynchronously

Proﬁle Publications Publication Publication Publication AboutMe LeftColumn Image Menu Institution

Proﬁle Publications Publication Publication Publication AboutMe LeftColumn Image Menu <div
id="placeholder"></div> <script>loadWidget('/aboutMe', function(w) { w.render({ replace : '#placeholder' }); })</script> Institution so instead of rendering the widget you render a placeholder dom element and a script tag that loads the widget with an ajax request and then renders it on the client side

Flush content early

Move logic to shutdown handlers

http://www.php.net/manual/en/function.register-shutdown- function.php

http://www.php.net/manual/en/function.fastcgi-ﬁnish- request.php

Promises / Futures https://github.com/facebook/libphutil

$future = new HTTPFuture( 'http://www.example.com/' ); list($status, $body,
$headers) = $future-‐>resolve();

pushState and if you are at that, when you switch
pages, you can also just load the differences between and use pushState to change the url (if supported) to make your app faster

Bigpipe

if you look at your widget tree you can mostly
identify larger parts, which are widgets itself

Proﬁle Menu Header LeftColumn RightColumn like this, so what you
can do to dramatically increase the perceived load time is prioritizing the rendering

so our http request looks like this, first you compute
and render the important parts of the page, like the top menu and the profile header as well as the rest of the layout, for the left column and right column which are expensive to compute you just render placeholders and then flush the content to the client so that the browser already renders this

still in the same http request you render out the
javascript needed to make the already rendered components work, so people can use the menu for example

still in the same http request you then compute the
data for the left column and render out some javascript that takes this data and renders it into the components template client side and then replaces the placeholder with the rendered template

still in the same request you then can do this
with the right column -> ﬂush content as early as possible, don't wait for the whole site to be computed

https://joind.in/10710

http://twitter.com/BastianHofmann http://lanyrd.com/people/BastianHofmann http://speakerdeck.com/u/bastianhofmann mail@bastianhofmann.de thanks, you can contact me on
any of these platforms or via mail.

Profiling PHP applications

Profiling PHP applications

More Decks by Bastian Hofmann

Other Decks in Programming

Featured

Transcript