Predicting Performance Changes of Distributed Applications

wrzasa
March 19, 2017

This is my presentation from the wroc_love.rb 2017 conference (http://wrocloverb.com) and EuRuKo 2017 (http://euruko2017.org).

The software used in the presentation is now open source: https://github.com/wrzasa/rbsim/ Feel free to contact me if you need assistance.

Transcript

  1. PREDICTING PERFORMANCE CHANGES OF DISTRIBUTED APPLICATIONS
    Wojciech Rząsa
    @wrzasa
    Wojciech Rząsa @wrzasa Predicting Performance Changes of Distributed Applications

  2. ABOUT ME
    Passion for informatics
    PhD, but primarily an engineer
    Rzeszow University of Technology
    Research: distributed systems
    Teaching: Ruby, Rails, ...
    Rzeszow Ruby User Group (http://rrug.pl)

  3. THE GRID

  4. (c) e-ScienceCity, http://www.e-sciencecity.org/
    Creative Commons Attribution-ShareAlike 3.0 Unported License

  5. GRID MONITORING — OCM-G
    Debugging
    Interactive applications
    Shared infrastructure
    Distributed
    No central management
    Standard interface for tools
    Tight security requirements

  6. SECURITY VS. PERFORMANCE

  7. EMPOWER DEVELOPERS TO ASSESS PERFORMANCE!

  8. AGENDA
    How to predict performance changes
    Basic example
    Two case studies

  9. Rząsa W.: Timed colored Petri net based estimation of efficiency of the grid applications. Supervisor: E. Nawarecki, AGH-UST, Kraków, 2011.
    Baliś B., Bubak M., Rząsa W., Szepieniec T., Wismüller R.: Security in the OCM-G Grid Application Monitoring System. PPAM 2003, LNCS 3019,
    pp. 779-787, 2004, Eds. R. Wyrzykowski et al.
    Baliś B., Bubak M., Rząsa W., Szepieniec T., Wismüller R.: Two Aspects of Security Solution for Distributed Systems in the Grid on the Example
    of the OCM-G. In proc. of CGW'03, pp.197-206, Kraków 2004 ISBN 83-915141-3-7.
    Baliś B., Bubak M., Rząsa W., Szepieniec T.: Efficiency of the GSI Secured Network Transmission. ICCS 2004, LNCS 3036, p. 107-115, 2004, Eds.
    M. Bubak et al.
    Rząsa W., Bubak M., Baliś B., Szepieniec T.: Simulation Method for Estimation of Security Overhead of Grid Applications. In proc. of CGW'05,
    pp. 300-307, Kraków 2006 ISBN 83-915141-5-3, EAN 9788391514153.
    Rząsa W., Bubak M., Baliś B., Szepieniec T.: Overhead Verification for Cryptographically Secured Transmission in the Grid. Computing and
    Informatics, Vol. 26, 2007, 89-101.
    Rząsa W., Bubak M.: Application of Petri Nets to Evaluation of Grid Applications Efficiency. In proc. of CGW'08, pp. 261-269, Kraków 2009,
    ISBN 978-83-61433-00-2.
    Rząsa W.: Combining Timed Colored Petri Nets and Real TCP Implementation to Reliably Simulate Distributed Applications. CN 2009, CCIS 39,
    pp. 79-86, 2009, Eds. A. Kwiecień, P. Gaj, and P. Stera.
    Dec G, Jędrzejec B, Rząsa W.: Kolorowana sieć Petriego jako model systemu podejmowania decyzji kredytowej. STUDIA INFORMATICA 2010,
    Volume 31, Number 2A (89).
    Rząsa W., Bubak M.: Simulation Method Supporting Development of Parallel Applications for Grids. In proc. of CGW'10, pp. 194-201, Kraków
    2011, ISBN 978-83-61433-03-3.
    Dec G., Rząsa W.: Modelowanie wielowarstwowej rozproszonej aplikacji www z zastosowaniem TCPN. Praca zbiorowa pod red. L. Trybusa i
    S. Samoleja: Projektowanie, analiza i implementacja systemów czasu rzeczywistego, ISBN 878-83-206-1822-8, Wyd. Komunikacji i Łączności,
    Warszawa 2011, pp. 137-148.
    Rząsa W., Rzońca D., Stec A., Trybus B.: Analysis of Challenge-Response Authentication in a Networked Control System, in: Kwiecien A., Gaj
    P., and Stera P. (Eds.): Computer Networks 2012, Communications in Computer and Information Science 291, Springer-Verlag Berlin
    Heidelberg 2012, pp. 271-279.
    Rząsa W., Bubak M., Nawarecki E.: High-Level Model for Performance Evaluation of Distributed Applications, in: Balicki J., Krawczyk H.,
    Nawarecki E. (Eds.): Grid and Volunteer Computing, Gdansk University of Technology Faculty of Elektronics, Telecomunication and
    Informatics Press, Gdańsk 2012, pp. 7-23.
    Rząsa W.: Synchronization Algorithm for Timed Colored Petri Nets and Ns-2 Simulators, in: Kwiecień A., Gaj P., and Stera P. (Eds): CN2013,
    CCIS 370, pp. 1-10, Springer-Verlag Berlin Heidelberg, 2013, ISSN 1865-0929, ISBN 978-3-642-38864-4.
    Kowalski M., Rząsa W.: Object-oriented approach to Timed Colored Petri Net simulation. Computer Science and Information Systems
    (FedCSIS), 2013 Federated Conference on, pp. 1401-1404, 8-11 Sept. 2013, ISBN 978-1-4673-4471-5 (Web), 978-83-60810-53-8 (USB), IEEE Catalog
    Number: CFP1385N-ART (Web), CFP1385N-USB (USB).
    Jamro M., Rzońca D., Rząsa W.: Testing communication tasks in distributed control
    systems with SysML and Timed Colored Petri Nets model. Computers in Industry, Vol. 71, August 2015, pp. 77-87.
    Rząsa W.: Simulation-Based Analysis of a Platform as a Service Infrastructure Performance from a User Perspective, in: P. Gaj et al. (Eds.): CN
    2015, CCIS 522, pp. 182-192, 2015, ISBN 978-3-319-19418-9.
    Rząsa W., Rzońca D.: Event-Driven Approach to Modeling and Performance
    Estimation of a Distributed Control System, in: Gaj P., Kwiecień A., and Stera P. (Eds.): Computer Networks 2016, Communications in
    Computer and Information Science 608, Springer International Publishing 2016, pp. 168-179.

  10. SOLUTION BASICS
    Simulation
    Model described in Ruby-based DSL
    Simulator based on a formalism
    Stats available via Ruby iterators

  11. BASIC EXAMPLE
    Web server
    Web clients
    Resources

  12. WEB SERVER
    program :apache do
      on_event :data_received do |data|
        stats_start server: :apache, name: process.name
        cpu do |cpu|
          (100 * data.size.in_bytes / cpu.performance).miliseconds
        end
        send_data to: data.src, size: data.size * 10,
                  type: :response, content: data.content
        stats_stop server: :apache, name: process.name
      end
    end

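The `cpu` block in the `:apache` program returns how long a request occupies the CPU: 100 times the request size in bytes, divided by the node's `cpu.performance`, in milliseconds. The plain-Ruby sketch below only works through that arithmetic and does not use RBSim. `service_time_ms` is a hypothetical helper, and floating-point division is an assumption made for readability (the slide does not show how the DSL handles numeric types). The performance values 100 and 1400 are the ones given to the `:desktop` and `:gandalf` nodes on the RESOURCES slide.

```ruby
# Hypothetical helper mirroring the formula used in the :apache program:
#   (100 * data.size.in_bytes / cpu.performance).miliseconds
def service_time_ms(request_bytes, cpu_performance)
  100.0 * request_bytes / cpu_performance
end

request_bytes = 1024 # size of one request sent by the :wget client

# The same request handled on nodes with different `cpu` performance values.
{ desktop: 100, gandalf: 1400 }.each do |node, performance|
  ms = service_time_ms(request_bytes, performance)
  puts format('%-8s %8.2f ms', node, ms)
end
```

A node with 14 times the performance value needs 1/14 of the CPU time for the same request, which is the kind of "what if" the model makes cheap to explore.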

  13. WEB CLIENT
    program :wget do |opts|
      sent = 0
      on_event :send do
        # . . .
      end
      on_event :data_received do |data|
        # . . .
      end
      register_event :send
    end

  14. WEB CLIENT — SEND
    program :wget do |opts|
      sent = 0
      on_event :send do
        cpu { |cpu| (150 / cpu.performance).miliseconds }
        send_data to: opts[:target], size: 1024.bytes,
                  type: :request, content: sent
        sent += 1
        if sent < opts[:count]
          register_event :send, delay: 5.miliseconds
        end
      end
      on_event :data_received do |data|
        # . . .
      end
      register_event :send
    end

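The client drives itself with events: the `:send` handler does its work and, while requests remain, schedules another `:send` 5 ms later. The plain-Ruby sketch below illustrates only that self-rescheduling pattern. `TinyEventLoop` and `simulate_sends` are hypothetical names written for this example; this is not how RBSim is implemented, and the sketch ignores the CPU time that the `cpu` block would add between sends.

```ruby
# A deliberately tiny discrete-event loop: events are ordered by time,
# and a handler may register further events while it runs.
class TinyEventLoop
  attr_reader :now

  def initialize
    @now = 0
    @queue = []    # pending events as [time, sequence_number, name]
    @handlers = {}
    @seq = 0
  end

  def on_event(name, &handler)
    @handlers[name] = handler
  end

  def register_event(name, delay: 0)
    # The sequence number keeps events with equal times in insertion order.
    @queue << [@now + delay, @seq += 1, name]
  end

  def run
    until @queue.empty?
      @queue.sort!
      @now, _seq, name = @queue.shift
      @handlers.fetch(name).call
    end
  end
end

# Returns the simulated times (in ms) at which requests were sent.
def simulate_sends(count, delay_ms)
  sim = TinyEventLoop.new
  sent = 0
  send_times = []
  sim.on_event(:send) do
    send_times << sim.now
    sent += 1
    sim.register_event(:send, delay: delay_ms) if sent < count
  end
  sim.register_event(:send)
  sim.run
  send_times
end

p simulate_sends(3, 5) # => [0, 5, 10]
```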

  15. WEB CLIENT — RECEIVE
    program :wget do |opts|
      sent = 0
      on_event :send do
        # . . .
      end
      on_event :data_received do |data|
        log "Got data #{data} in process #{process.name}"
        stats event: :request_served, client: process.name
      end
      register_event :send
    end

  16. RESOURCES
    node :desktop do
      cpu 100
    end
    node :gandalf do
      cpu 1400
    end
    net :net01, bw: 1024.bps
    net :net02, bw: 510.bps
    route from: :desktop, to: :gandalf,
          via: [ :net01, :net02 ], twoway: true

  17. PROCESSES ON NODES
    new_process :client1, program: :wget,
                args: { target: :server, count: 10 }
    new_process :client2, program: :wget,
                args: { target: :server, count: 10 }
    new_process :server, program: :apache
    put :server, on: :gandalf
    put :client1, on: :desktop
    put :client2, on: :desktop

  18. SAVE YOUR MODEL
    e.g. in model.rb

  19. RUN SIMULATION
    class Experiment < RBSim::Experiment
    end
    params = { }
    sim = Experiment.new
    sim.run './model.rb', params
    sim.save_stats 'simulation.stats'

  20. PROCESS SIMULATION STATS
    class Experiment < RBSim::Experiment
      def print_req_times_for(s)
        app_stats.durations(server: s) do |tags, start, stop|
          puts "Req. time #{(stop - start).in_miliseconds} ms."
        end
      end
    end
    all_stats = Experiment.read_stats 'simulation.stats'
    first_experiment = all_stats.first
    first_experiment.print_req_times_for(:apache)

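Because the durations arrive through an ordinary Ruby block, they can be collected into an array and summarized with plain Ruby. The sketch below assumes the request times have already been gathered as numbers in milliseconds; `percentile` and `summarize` are hypothetical helpers, and the sample values are made up for illustration.

```ruby
# Nearest-rank percentile of a list of numbers (p between 0 and 100).
def percentile(values, p)
  sorted = values.sort
  rank = (p / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Basic summary of request durations.
def summarize(durations_ms)
  {
    count: durations_ms.size,
    mean: durations_ms.sum.to_f / durations_ms.size,
    p95: percentile(durations_ms, 95),
    max: durations_ms.max
  }
end

# Made-up request times in milliseconds, for illustration only.
durations_ms = [12, 15, 11, 90, 14, 13, 16, 12, 250, 13]
p summarize(durations_ms)
```

The mean alone hides the two slow requests in this sample, which is why a percentile or the maximum is worth reporting alongside it.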

  21. BASIC EXAMPLE SUMMARY
    Model in DSL
    No boilerplate
    Web server and client (~30 LoC)
    Resources (~12 LoC)
    Mapping processes to resources (~6 LoC)
    Running simulation (~8 LoC)
    Loading saved stats (~3 LoC)

  22. CASE STUDY #1
    RAPGENIUS VS. HEROKU
    FEBRUARY 2013
    https://genius.com/James-somers-herokus-ugly-secret-annotated

  23. HEROKU HTTP "ROUTING"
    "Intelligent"
    Random

  24. FOR HEROKU
    Perfect scalability
    Don't have to detect idle/busy dynos

  25. FOR A CLIENT
    Is "random routing" worse?
    How much?

  26. https://genius.com/James-somers-herokus-ugly-secret-annotated

  27. In the good old days physicists
    repeated each other's experiments,
    just to be sure. Today they stick to
    FORTRAN, so that they can share each
    other's programs, bugs included.
    -- E. Dijkstra

  28. ITEMS TO MODEL
    Random HTTP router
    "Intelligent" HTTP router

  29. RANDOM HTTP ROUTER
    program :random_router do |servers|
      on_event :data_received do |data|
        if data.type == :request
          server = servers.sample
          send_data to: server, size: data.size, type: :request,
                    content: { from: data.src, content: data.content }
        elsif data.type == :response
          send_data to: data.content[:from], size: data.size,
                    type: :response,
                    content: data.content[:content]
        else
          raise "Unknown data type #{data.type} received " +
                "by #{process.name}"
        end
      end
    end

  30. INTELLIGENT HTTP ROUTER
    program :router do |servers|
      request_queue = []
      on_event :data_received do |data|
        # . . .
      end
      on_event :process_request do
        # . . .
      end
    end

  31. INTELLIGENT HTTP ROUTER
    on_event :data_received do |data|
      if data.type == :request
        request_queue << data
        register_event :process_request
      elsif data.type == :response
        servers << data.src
        send_data to: data.content[:from], size: data.size,
                  type: :response, content: data.content[:content]
        register_event :process_request
      else
        raise "Unknown data type #{data.type} received " +
              "by #{process.name}"
      end
    end

  32. INTELLIGENT HTTP ROUTER
    on_event :process_request do
      unless servers.empty? or request_queue.empty?
        data = request_queue.shift
        server = servers.shift
        send_data to: server, size: data.size, type: :request,
                  content: { from: data.src, content: data.content }
        unless request_queue.empty?
          register_event :process_request
        end
      end
    end

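The two routers differ in one decision: the random router hands a request to any server, busy or not, while the "intelligent" router keeps requests in one queue and dispatches only to a server that has reported back. The plain-Ruby sketch below compares the two policies on response time without using RBSim. It is a simplification under stated assumptions: network and router CPU time are ignored, requests are given already sorted by arrival time, and `Request`, `random_routing`, `queue_routing`, and the workload numbers are all made up for this example.

```ruby
# A request arrives at `arrival` and needs `service` time units on a server.
Request = Struct.new(:arrival, :service)

# Random routing: each request goes to a randomly chosen server and waits
# in that server's own FIFO queue, even if other servers are idle.
def random_routing(requests, server_count, rng: Random.new(1))
  free_at = Array.new(server_count, 0.0) # when each server becomes idle
  requests.map do |req|
    s = rng.rand(server_count)
    start = [req.arrival, free_at[s]].max
    free_at[s] = start + req.service
    free_at[s] - req.arrival # response time seen by the client
  end
end

# Queue-based routing: one shared FIFO queue; the next request goes to
# whichever server becomes idle first.
def queue_routing(requests, server_count)
  free_at = Array.new(server_count, 0.0)
  requests.map do |req|
    s = free_at.each_with_index.min.last # index of the earliest-free server
    start = [req.arrival, free_at[s]].max
    free_at[s] = start + req.service
    free_at[s] - req.arrival
  end
end

def mean(values)
  values.sum / values.size
end

# Made-up workload: mostly fast requests with occasional slow ones.
rng = Random.new(42)
clock = 0.0
requests = Array.new(200) do
  clock += rng.rand * 40.0 # gap to the next arrival, in ms
  Request.new(clock, rng.rand < 0.1 ? 300.0 : 30.0)
end

puts "random routing: mean response #{mean(random_routing(requests, 4)).round(1)} ms"
puts "queue routing:  mean response #{mean(queue_routing(requests, 4)).round(1)} ms"
```

With random routing a fast request can end up queued behind a slow one while another server sits idle; the shared queue avoids that case.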

  33. RESOURCES (EXAMPLE)
    servers.each do |s|
      node s do
        cpu 1
      end
      new_process s, program: :webserver,
                  args: { request_times: params[:request_times] }
      put s, on: s
    end

  34. RESULTS

  38. WANT APDEX?

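Apdex condenses a set of response times into one score between 0 and 1. Given a target threshold T, responses up to T count as satisfied, responses up to 4T count as tolerating (weighted by one half), and slower ones count as frustrated. A minimal sketch of the standard formula follows; the response times and the 500 ms threshold are made-up inputs, not values from the talk.

```ruby
# Apdex score for a list of response times and a target threshold t
# (both in the same time unit):
#   (satisfied + tolerating / 2) / total
def apdex(response_times, t)
  satisfied  = response_times.count { |x| x <= t }
  tolerating = response_times.count { |x| x > t && x <= 4 * t }
  (satisfied + tolerating / 2.0) / response_times.size
end

times_ms = [120, 300, 450, 800, 1500, 2500, 90, 200] # made-up sample
puts apdex(times_ms, 500) # => 0.75
```

Five of the eight samples are satisfied, two are tolerating, and one is frustrated, giving (5 + 2/2) / 8 = 0.75.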

  40. WHAT IF?
    we had more independent "intelligent" routers?
    good scalability + better performance for users?

  47. CASE STUDY #1 SUMMARY
    Reusability (model items)
    Flexibility (arbitrary algorithms in routers)
    Different levels of details for results
    histograms
    apdex
    What ifs

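A histogram is the more detailed of the two result views listed above. As a rough sketch, response times can be grouped into fixed-width buckets with plain Ruby; `histogram` and `print_histogram` are hypothetical helpers, and the sample values are made up.

```ruby
# Count values per bucket; each key is the lower bound of its bucket.
def histogram(values, bucket_width)
  values.group_by { |v| (v / bucket_width).floor * bucket_width }
        .transform_values(&:size)
        .sort
        .to_h
end

# Print one text bar per non-empty bucket.
def print_histogram(values, bucket_width)
  histogram(values, bucket_width).each do |start, count|
    puts format('%5d-%-5d %s', start, start + bucket_width, '#' * count)
  end
end

response_times_ms = [5, 12, 17, 31, 38, 39] # made-up sample
print_histogram(response_times_ms, 10)
```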

  48. CASE STUDY #2
    TO SCALE HEROKU APPLICATION ...
    ...OR NOT TO SCALE?

  49. HEROKU DYNOS
    Name         RAM     CPU share  Compute  Price per dyno-month
    standard-1x  512MB   1x         1x-4x    $25
    standard-2x  1024MB  2x         4x-8x    $50

  50. API BACKEND
    Rails
    Unicorn
    CPU intensive
    6 standard-1x dynos
    scale to 3 standard-2x dynos?

  51. UNICORN
    Master process
    Balancing load of worker processes
    Like Heroku's old "intelligent router"!

  52. APPLICATION SCALING
    2x faster dynos
    2x fewer dynos
    More Unicorn workers per dyno
    Same number of Unicorn workers per application
    Same price
    More RAM for peaks
    Better load balancing?

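The "same price, same number of workers" part of this comparison is simple arithmetic, sketched below. The dyno counts and prices come from the slides (6 standard-1x at $25 versus 3 standard-2x at $50). The number of Unicorn workers per dyno is not given in the talk, so the values 2 and 4 are assumptions chosen only to illustrate doubling the workers on the larger dyno.

```ruby
Dyno = Struct.new(:name, :price_per_month, :workers)

# Totals for an application running `count` dynos of one type.
def totals(dyno, count)
  {
    dynos: count,
    workers: count * dyno.workers,
    price: count * dyno.price_per_month
  }
end

# Prices are from the slides; workers per dyno are assumed for illustration.
standard_1x = Dyno.new('standard-1x', 25, 2)
standard_2x = Dyno.new('standard-2x', 50, 4)

current  = totals(standard_1x, 6)
proposed = totals(standard_2x, 3)

puts "current:  #{current}"
puts "proposed: #{proposed}"
```

Both setups cost the same and run the same total number of workers; what changes is how many workers each Unicorn master balances, which is what the simulation is meant to evaluate.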

  53. APPLICATION PARAMETERS
    Load (req/min)
    Response times (distribution)

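These two parameters are enough to generate a synthetic workload. The sketch below shows one way to do it in plain Ruby, under assumptions the talk does not state: arrivals are modeled as a Poisson process (exponentially distributed gaps), and service times are drawn at random from a list of measured response times. The helper names, the load of 600 req/min, and the measured values are all hypothetical.

```ruby
# Arrival times in seconds for a Poisson process with the given load.
def arrival_times(requests_per_minute, count, rng: Random.new(1))
  rate = requests_per_minute / 60.0 # requests per second
  clock = 0.0
  Array.new(count) do
    # Exponentially distributed gap; 1.0 - rand is in (0, 1], so log is defined.
    clock += -Math.log(1.0 - rng.rand) / rate
  end
end

# Draw service times from measured response times (empirical distribution).
def sample_service_times(measured_ms, count, rng: Random.new(2))
  Array.new(count) { measured_ms.sample(random: rng) }
end

measured_ms = [40, 45, 50, 55, 60, 400] # made-up measurements
arrivals = arrival_times(600, 5)
services = sample_service_times(measured_ms, 5)

arrivals.zip(services).each do |at, ms|
  puts format('arrives at %7.3f s, needs %4d ms', at, ms)
end
```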

  54. ITEMS TO MODEL

  55. REUSED ITEMS
    HTTP client
    HTTP server
    Random HTTP router
    Unicorn master process (Intelligent Heroku router)

  56. MODELING DYNOS
    OR
    node :standard1x do
      cpu 1
    end

    node :standard2x do
      cpu 2
    end
    OR
    node :standard2x do
      cpu 1
      cpu 1
    end

  57. HEROKU DYNOS
    Name         RAM     CPU share  Compute  Price per dyno-month
    standard-1x  512MB   1x         1x-4x    $25
    standard-2x  1024MB  2x         4x-8x    $50

  58. DYNO SCALING
    HORIZONTAL OR VERTICAL?
    DOES IT MATTER!?

  59. SIMULATE
    node :standard2x do
      cpu 2
    end
    AND
    node :standard2x do
      cpu 1
      cpu 1
    end

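The difference between the two `:standard2x` models can be seen even without the full simulator. The plain-Ruby sketch below is a simplification, not RBSim: all tasks are assumed to be ready at time 0, each task runs on one CPU at a time, and the next task always goes to the CPU that becomes free first. `completion_times` is a hypothetical helper, and the work amounts are arbitrary units.

```ruby
# Completion time of each task on a node whose CPUs have the given
# relative speeds, e.g. [2.0] (one fast CPU) or [1.0, 1.0] (two CPUs).
def completion_times(task_work, cpu_speeds)
  free_at = Array.new(cpu_speeds.size, 0.0)
  task_work.map do |work|
    i = free_at.each_with_index.min.last # CPU that becomes free first
    free_at[i] += work.to_f / cpu_speeds[i]
  end
end

vertical   = [2.0]      # like `cpu 2`
horizontal = [1.0, 1.0] # like `cpu 1` twice

one_task = [10.0]
batch    = Array.new(16, 10.0)

puts "single task: vertical #{completion_times(one_task, vertical).max}, " \
     "horizontal #{completion_times(one_task, horizontal).max}"
puts "16 tasks:    vertical #{completion_times(batch, vertical).max}, " \
     "horizontal #{completion_times(batch, horizontal).max}"
```

In this toy model a single task finishes in half the time on the faster CPU, while a batch of 16 equal tasks finishes at the same moment on both nodes. Timing a single task therefore distinguishes the two kinds of scaling, and timing a saturating batch does not.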

  60. HORIZONTAL DYNO SCALING

  61. VERTICAL DYNO SCALING

  62. IT DOES MATTER HOW DYNOS ARE SCALED!
    HOW TO FIND OUT?
    documentation does not help...
    cat /proc/cpuinfo does not help...

  63. EXPERIMENT
    (ON HEROKU DYNOS)

  64. SINGLE CPU INTENSIVE TASK
    Comparable time on both dyno types
    def cpu_intensive_task(n)
      start = Time.now
      (1..n).reduce(:*)
      Time.now - start
    end

  65. 16 CPU INTENSIVE TASKS AT ONCE
    on standard-1x (2 CPUs)
      real 1m8.690s
      user 2m13.360s
      sys 0m3.871s
    on standard-2x (4 CPUs)
      real 0m29.182s
      user 2m17.570s
      sys 0m4.053s
    Wojciech Rząsa @wrzasa Predicting Performance Changes of Distributed Applications

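The slide shows only the `time` output, not the script that produced it. One way such an experiment might be scripted is sketched below, under assumptions: the tasks run in separate processes via `fork` (so it needs a Unix-like system, and MRI threads would not run CPU-bound work in parallel), and `run_concurrently` and the workload size 5_000 are made up for this example. The task body is the factorial from the previous slide, without its internal timing.

```ruby
# CPU-bound work: the factorial computation from the single-task slide.
def cpu_intensive_task(n)
  (1..n).reduce(:*)
end

# Run `task_count` copies of the task in separate processes and
# return the elapsed wall-clock time in seconds.
def run_concurrently(task_count, n)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  pids = Array.new(task_count) { fork { cpu_intensive_task(n) } }
  pids.each { |pid| Process.wait(pid) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

single = run_concurrently(1, 5_000)
batch  = run_concurrently(16, 5_000)
puts format('1 task: %.3f s, 16 tasks at once: %.3f s', single, batch)
```

If the single-task time is similar on both dyno types while the 16-task time is roughly halved on the larger one, the extra capacity comes from more CPUs rather than faster ones, which matches the conclusion drawn on the next slide.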

  66. CONCLUSION
    Dynos are scaled horizontally (more CPUs)
    We shouldn't change dyno config

  67. PAY THE BILL!
    $0.09

  68. CASE STUDY #2 SUMMARY
    Modeling with reusable components
    Simulation-tested alternative configurations
    Simple, cheap experiments to verify crucial factors

  69. SUMMARY
    Easier, cheaper, faster
    DSL, no boilerplate code
    What ifs
    No magic — just software science
    Simulation as a Service ;-)
    Rubber duck