Profiling PHP applications

It's nothing new that speed is important for the success of any web application. Only a few hundred milliseconds may lie between a user leaving your site and staying. Unfortunately, performance problems are often hard to fix and even harder to pinpoint.

In this talk I will show you how we at ResearchGate measure web application performance, which means not only timing how long the PHP backend took to deliver a page, but also tracking the speed the user actually perceives in the browser. After that you will see how you can track down and analyze any problems you found through measuring with the help of tools like Xdebug, XHProf and the Symfony Debug Toolbar. And if you still need to get faster after optimizing and fixing all these issues, I'll introduce you to some tricks, techniques and patterns to decrease load times even further.

Bastian Hofmann

November 07, 2013

Transcript

  1. My site is slow,
    what can I do?
    @BastianHofmann
    Profiling PHP applications

  2. This talk is all
    about...
    ...

  3. Speed. And with that I mean, not the drug, the movie or any game, but

  4. Speed of your web
    application
    the ..

  5. We'll talk about...

  6. Why it matters
    and why you should care about it

  7. How to measure it
    what actually is pagespeed and ...

  8. How to find out
    where the problems
    are
    and ..

  9. Before we start
    ...

  10. A few words
    about me
    ...

  11. I work at ResearchGate, the professional network for scientists and researchers

  12. ResearchGate gives
    science back to the
    people who make it
    happen.
    We help researchers
    build reputation and
    accelerate scientific
    progress.
    On their terms.

    the goal is to give...

  13. over 3 million users

  14. here are some impressions of the page

  15. you may have read about us in the news recently

  16. http://gigaom.com/2013/06/05/heres-how-bill-gatess-researchgate-investment-might-change-the-world-for-the-better
    http://venturevillage.eu/researchgate

  17. have this, and also work on some cool stuff

  18. we are hiring

  19. Questions? Ask
    by the way, if you have any questions throughout this talk, if you don't understand
    something, just raise your hand and ask.

  20. http://speakerdeck.com/u/bastianhofmann
    the slides will be available on speakerdeck

  21. Why should you
    care?
    so, pagespeed...

  22. It is really important
    of course because ...

  23.

  24. but seriously, in recent years multiple studies have looked at the importance of speed for a
    web application. how does it affect usage and conversion? how long do people wait for
    content? how does it affect sales? if people leave because the site is slow, do they come
    back?

  25. Every ms counts
    in short, the result of every study was the same: ...

  26. in detail

  27. in detail

  28. ..

  29. So what is
    pagespeed?

  30. Server
    the first thing that contributes to pagespeed is what happens on the server. this is also the
    easiest part, because it is completely under our control

  31. Your PHP application
    Request
    Response
    so that is what your server and your application are doing between the incoming request and the outgoing
    response

  32. Your PHP application
    Request
    Response
    Load balancer
    though this does not mean only your application, but also the rest of your setup, like a
    loadbalancer

  33. Your PHP application
    Request
    Response
    Load balancer
    and also your application is probably not a single small php script, but a big application with
    multiple components that each can affect speed differently. so getting a more detailed view
    on these components might be interesting as well. more on that later

  34. web server db
    http service
    http service
    cache
    user request
    additionally, most bigger applications have some kind of service-oriented architecture, and the same
    applies here: knowing about the speed of the different services is important.

  35. what is your web application like?

  36. But there is more
    ... your application does not stop at your server. somehow it needs to get to your user

  37. so internet connectivity is also a big part, contributing to pagespeed, that means everything
    from dns lookup, over ssl handshake to actually transporting the content over the wire

  38. once your user has received the content, the browser needs to display it. and some browsers are way
    slower than others at doing that

  39. so the dom needs to be rendered, css fetched and applied, images loaded

  40. and of course nearly no web application comes without javascript. this needs to be loaded
    and executed as well

  41. http://www.stevesouders.com/blog/2012/02/10/the-performance-golden-rule/

  42. So what happens
    on your server is
    really important

  43. But the rest as well
    my point is: what is important is the pagespeed your user perceives. this covers everything
    from the server to the browser. in the end it's your fault if the site is slow, even if the user's
    computer and browser are crappy.

  44. Step 1
    how to deal with this...

  45. Measuring
    measuring your actual pagespeed

  46. who is doing it?

  47. if you start: it's going to hurt

  48. because although everything seems to be fine on your fast, 2 month old machine, with lots
    of ram, cpu power, latest chrome, from your 50mbit vdsl connection with a very low ping to
    your data center.

  49. reality is: Old computers

  50. Old feature phones, slower smart phones, mobile networks in general (edge anyone?)

  51. old browsers

  52. people in countries with high latencies to your datacenter and/or slow internet connections
    (remember dial-up?)

  53. So you need to
    measure at your
    users' side

  54. Navigation Timing
    API
    https://developer.mozilla.org/en-US/docs/Navigation_timing
    there is actually a great javascript api that is supported by a lot of modern browsers

  55. you get timestamps for all important events of a pageload
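
The timestamps are absolute, so the interesting numbers are the differences between them. A minimal sketch of that arithmetic, written in Python on a plain dict rather than in the browser: the field names follow the `window.performance.timing` object of the Navigation Timing API, while the sample values and the `page_load_phases` helper are made up for illustration.

```python
def page_load_phases(t):
    """Turn absolute Navigation Timing timestamps (ms) into durations (ms)."""
    return {
        "dns": t["domainLookupEnd"] - t["domainLookupStart"],
        "connect": t["connectEnd"] - t["connectStart"],
        # time to first byte: request sent until the first response byte arrives
        "ttfb": t["responseStart"] - t["requestStart"],
        "download": t["responseEnd"] - t["responseStart"],
        "dom_interactive": t["domInteractive"] - t["navigationStart"],
        "total": t["loadEventEnd"] - t["navigationStart"],
    }


# Hypothetical sample values, in milliseconds.
sample = {
    "navigationStart": 1000,
    "domainLookupStart": 1005,
    "domainLookupEnd": 1030,
    "connectStart": 1030,
    "connectEnd": 1080,
    "requestStart": 1081,
    "responseStart": 1300,
    "responseEnd": 1350,
    "domInteractive": 1900,
    "loadEventEnd": 2600,
}

phases = page_load_phases(sample)
```

Only the last two entries describe what the user perceives; the others help to see whether time is lost in the network or on the server.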

  56. DEMO

  57. For older browsers
    you have to do it
    yourself
    though ...
    e.g. by manually measuring the time with javascript and on the server. this is actually kind of
    hard (clock offsets, etc)

  58. Or

  59. https://github.com/lognormal/boomerang
    use something from people who have already done this for you

  60. Getting it back to
    the server
    so now you have all the timestamps in your javascript, you need to ...

  61. logstash
    http://logstash.net/
    enter the next tool, logstash is a very powerful tool to handle log processing

  62. input filter output
    the basic workflow is that you have some input through which logstash receives log messages. on this
    input you can run multiple filters that modify the message, and then you can output the
    filtered message somewhere

  63. Very rich plugin
    system
    to do this it offers a very large and rich plugin system for all kinds of inputs, filters and
    outputs, and you can also write your own

  64. browser
    JS: boomerang
    logstash
    trackingServer
    access log
    requests
    tracking image
    with timing
    information as
    query
    parameters
    for our purpose we can just have boomerang in the browser collect the timestamps and then
    send a small tracking request (inserting an image) to a tracking server. the timestamps are
    added as query parameters to this request. the server only returns an empty image and logs
    the request to its access log, which logstash can parse.
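
To make the round trip concrete, here is a rough sketch in Python of what has to be pulled out of such an access log line. The parameter names `page` and `connectTime` appear in the grok pattern on slide 70; the path `/beacon.gif`, the `loadTime` parameter, the sample log line and the helper name are assumptions for illustration only.

```python
from urllib.parse import urlsplit, parse_qs


def timings_from_log_line(line):
    """Extract the page name and timing values from one access log line."""
    # In the common access log format the request is the first quoted field:
    # "GET /path?query HTTP/1.1"
    request = line.split('"')[1]
    method, target, protocol = request.split(" ")
    query = parse_qs(urlsplit(target).query)
    page = query["page"][0]
    timings = {name: int(values[0]) for name, values in query.items() if name != "page"}
    return page, timings


# Hypothetical log line written by the tracking server.
line = (
    '203.0.113.7 - - [07/Nov/2013:10:00:00 +0100] '
    '"GET /beacon.gif?page=profile.loggedIn&connectTime=50&loadTime=1600 HTTP/1.1" '
    '200 43 "-" "Mozilla/5.0"'
)

page, timings = timings_from_log_line(line)
```

In the setup described here logstash does this parsing with a grok filter, so the sketch only illustrates the data that travels in the query string.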

  65. Graphing it
    we want to...

  66. Graphite
    http://graphite.wikidot.com/
    again there are many tools available to collect and display these metrics: one i want to
    highlight is graphite

  67. graphite comes with a powerful interface where you can plot and aggregate this data into
    graphs and perform different mathematical functions on it to get it exactly the way you want
    to display your data

  68. browser
    JS: boomerang
    logstash
    trackingServer
    access log
    requests
    tracking image
    with timing
    information as
    query
    parameters
    graphite
    statsd
    and statsd is a small aggregation daemon that sits in front of it. so this is your setup: logstash sends these
    timestamps to statsd, which aggregates the information and sends it to graphite
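
For reference, statsd accepts plain-text metrics over UDP, and a timer has the form `bucket:value|ms`. A minimal sketch in Python of what arrives at statsd for one measurement: the helper names are hypothetical, the namespace `pagespeed` is taken from the config on slide 71, and 8125 is the usual statsd port (the slides use both 8125 and 8126).

```python
import socket


def statsd_timing(namespace, bucket, millis):
    """Format one timer sample in the statsd line protocol."""
    return f"{namespace}.{bucket}:{int(millis)}|ms".encode("ascii")


def send_timings(page, timings, host="localhost", port=8125, namespace="pagespeed"):
    """Send each timing as its own UDP datagram (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for name, millis in timings.items():
            sock.sendto(statsd_timing(namespace, f"{page}.{name}", millis), (host, port))
    finally:
        sock.close()


packet = statsd_timing("pagespeed", "profile.loggedIn.connect", 50)
```

Because the transport is UDP, a lost or unreachable statsd never slows down the sender, which is why it is safe to put in this path.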

  69. input {
        file {
          type => "pagespeed-access"
          path => [ "/var/log/nginx/access_log/monitoring-access.log" ]
        }
      }
    in logstash this is how to get the data from the log into logstash

  70. filter {
        grok {
          type => "pagespeed-access"
          pattern => "^.*\s\"[A-Z]+\s[^\?\s]+\?page=%{DATA:page}\&connectTime=%{NUMBER:connectTime}...)?\sHTTP\/\d\.\d\".*$"
        }
        grok {
          type => "pagespeed-access"
          match => ["page", "^(profile|home|...)\.logged(In|Out)$"]
          exclude_tags => ["_grokparsefailure"]
        }
      }
    you can apply filters to put it into a structured form and validate it

  71. output {
        statsd {
          type => "pagespeed-access"
          exclude_tags => ["_grokparsefailure"]
          host => "localhost"
          port => 8126
          namespace => "pagespeed"
          sender => ""
          timing => [
            "%{page}.connect", "%{connectTime}", ...
          ]
        }
      }
    and then put the data somewhere else. here we are sending it to statsd.
    what's that?

  72. and that's what such graphs may then look like

  73. Can we measure
    more?
    I said earlier we may need information about services etc

  74. Load balancer
    in a soa architecture we can do something similar with the access logs of our services, which
    also have timing information. and if we have a load balancer in between, we can get
    useful information from there as well

  75. Example: HAProxy
    you can get the time of the request, time spent in haproxy queues etc.

  76. input {
        file {
          type => "haproxy-http-log"
          path => [ "/var/log/haproxy-http.log*" ]
        }
      }
    example config

  77. filter {
        grok {
          type => "haproxy-http-log"
          pattern => "%{HAPROXYHTTP}"
        }
        mutate {
          type => "haproxy-http-log"
          gsub => [
            "server_name", "\.", "_",
            "client_ip", "\.", "_"
          ]
        }
      }
    example config

  78. output {
        statsd {
          type => "haproxy-http-log"
          exclude_tags => ["_grokparsefailure"]
          host => "localhost"
          port => 8125
          namespace => "lb"
          sender => ""
          increment => [
            "haproxy.%{backend_name}.%{server_name}.%{client_ip}.hits",
            "haproxy.%{backend_name}.%{server_name}.%{client_ip}.responses.%{http_status_code}"
          ]
          timing => [
            "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_request", "%{time_request}",
            "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_backend_connect", "%{time_backend_connect}",
            "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_backend_response", "%{time_backend_response}",
            "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_queue", "%{time_queue}",
            "haproxy.%{backend_name}.%{server_name}.%{client_ip}.time_duration", "%{time_duration}"
          ]
        }
      }
    example config

  79. browser
    JS: boomerang
    logstash
    trackingServer
    access log
    requests
    tracking image
    with timing
    information as
    query
    parameters
    graphite
    statsd
    logstash
    load balancer
    access log
    logstash
    service
    access log
    logstash can analyse these logs and send them to statsd as well

  80. From within your
    PHP app
    What is also useful is to measure certain things from within your php app, e.g. rendering
    time, the time database requests took, the time spent in certain business logic etc. you can either
    just log this to a file and use the same logstash mechanism, or if you just need to have it for
    debugging, do it differently. more on that later.
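
The idea is independent of the language, so here is a minimal sketch in Python rather than PHP: wrap the interesting sections in a timer and write the results as one log line that logstash could pick up. The `timed` helper, the metric names and the `key=value` log format are assumptions for illustration, not what is used at ResearchGate.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, sink, clock=time.perf_counter):
    """Measure the wall-clock time of the wrapped block in milliseconds."""
    start = clock()
    try:
        yield
    finally:
        # Record the measurement even if the block raised an exception.
        sink.append((name, round((clock() - start) * 1000, 3)))


measurements = []

with timed("db.load_profile", measurements):
    time.sleep(0.01)  # stand-in for a database request

with timed("render", measurements):
    sum(range(1000))  # stand-in for template rendering

# One line per request, e.g. "db.load_profile=10.2 render=0.03".
log_line = " ".join(f"{name}={millis}" for name, millis in measurements)
```

Written to a file, such a line can go through the same input, filter and output steps as the access logs above.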

  81. Only overall?

  82. By pages
    but you should not only measure across all your requests, you should differentiate by...

  83. By browser

  84. By country

  85. Logged in / out

  86. Heavy users
    .. or just everything that makes sense for you

  87. Define goals
    and after measuring everything, when you see that you are slow somewhere, you should define
    goals for the performance you want to reach

  88. Step 2
    but before you can start fixing stuff, ...

  89. How to find out
    where the problems
    are
    finding out ...

  90. Profiling
    you can do this through ...

  91. the first tool useful for this is ...
    xdebug has quite a few features, like the ability to set breakpoints in your
    code, nicer error displays and so on. but one of them is also profiling your app

  92. xdebug.profiler_enable_trigger = 1
    http://url?XDEBUG_PROFILE
    you can either activate profiling for every request or selectively for all requests that have a
    GET, POST or COOKIE parameter called XDEBUG_PROFILE

  93. Webgrind
    https://github.com/jokkedk/webgrind
    this writes so-called cachegrind files. in order to view them you can use tools like kcachegrind
    or the easiest one ...

  94. you can see everything that happened in this request: every function that was invoked, how
    often it was called and how long it took.

  95. DEMO

  96. Use it locally on
    your dev machine
    one thing with xdebug, it slows down php, so .. but not in production

  97. XHProf
    https://github.com/facebook/xhprof
    for production there is xhprof, developed by facebook

  98. Use it in production
    for a subset of
    requests
    you can safely use it in production. it comes with a performance overhead, but only while it is
    actually profiling, so you can install it and enable it only for a small percentage of requests or when manually
    activated (e.g. by a cookie).
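
The decision whether to profile a given request can be as small as the following sketch, written in Python for brevity. The cookie name `profile`, the 1% default and the function name are assumptions; in a PHP app the same check would guard the call to `xhprof_enable()`.

```python
import random


def should_profile(cookies, sample_rate=0.01, rng=random.random):
    """Profile when explicitly requested via cookie, otherwise for a random sample."""
    if cookies.get("profile") == "1":  # hypothetical cookie name
        return True
    # rng() returns a float in [0, 1), so sample_rate=0.01 picks roughly 1% of requests.
    return rng() < sample_rate


forced = should_profile({"profile": "1"}, sample_rate=0.0)
never = should_profile({}, sample_rate=0.0)
always = should_profile({}, sample_rate=1.0)
```

Passing the random source in as `rng` keeps the sampling testable; the cookie path makes it possible to profile exactly the request you are investigating.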

  99. XHGUI
    https://github.com/preinheimer/xhgui
    to display the xhprof profiles, there is a nice tool called xhgui

  100. in addition to the normal things like seeing

  101. the whole callstack

  102. you can also visualize this in a graph

  103. and can do analysis over multiple requests and compare them to each other

  104. DEMO

  105. Symfony Debug
    Toolbar
    i said earlier, that there is another good way to get information about your applications
    internals, especially if you only need it for debugging and not in a graph. this is with the..

  106. it comes with standard symfony and you probably all have seen it already

  107. you can click on it and it gives you nice detailed information about the request. stuff like
    doctrine queries, a nice timeline, exceptions, routing, events etc.

  108. DEMO

  109. Extend it
    http://symfony.com/doc/current/cookbook/profiler/data_collector.html
    but did you know that you can extend it? there are some good ready made extensions
    available, e.g. for caching, http calls, versioning etc. just check packagist, but you can also
    write your own easily.

  110. here are some examples how we at researchgate extended it (disclaimer: we are not even
    using full symfony, but only some components).

  111.

  112.

  113.

  114.

  115.

  116. DEMO

  117. Step 3
    now that you have all this debugging information to pinpoint your bottlenecks, let's get to ...

  118. Fix it

  119. That's something you
    have to do
    unfortunately ... since it is very dependent on your application and your setup

  120.

  121. Remember
    but

  122. Speed matters

  123. http://twitter.com/BastianHofmann
    http://lanyrd.com/people/BastianHofmann
    http://speakerdeck.com/u/bastianhofmann
    [email protected]
    thanks, you can contact me on any of these platforms or via mail.
