
[DevOpsCon] Developer Targeted Performance Analytics


Slides from the session on "Developer Targeted Performance Analytics" at DevOpsCon'17 in Berlin, Germany

Jürgen Cito

June 14, 2017



Transcript

  1. Developer Targeted Performance Analytics
    Supporting Software Development Decisions with Runtime Information
    Jürgen Cito

    DevOpsCon’17, Berlin, Germany
    Source: https://flic.kr/p/bXf4vw
    @citostyle
    Source: https://flic.kr/p/ebdWmN by David


  2. Monitoring & Analysis of Complex (Cloud) Systems
    Distributed Architectures
    Multiple Technologies
    Infrastructure at Scale
    Scattered Information
    Figure taken from Chen et al., "A provider-side view of web search response time," 2013.


  3. Code Artifacts
    Deployment
    .......
    [26/06/2015:21205.0], responseTime, "CustomerService", 204
    [26/06/2015:21215.0], responseTime, "CustomerService", 169
    [26/06/2015:21216.0], cpuUtilization, "CustomerServiceVM2", 0.73
    [26/06/2015:21216.0], cpuUtilization, "CustomerServiceVM1", 0.69
    [26/06/2015:21216.1], vmBilled, "CustomerServiceVM1", 0.35
    [26/06/2015:21219.4], ids, "ids", [1,16,32,189,216]
    ........
    Operations Data
    observe
    Context
    (Figure: cloud infrastructure with VM1 and VM2 hosting the Supplier Service, User Interface, and other services)
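
    A minimal sketch (in Java) of how such operations-data lines could be parsed into structured records; the OpsRecord type and the field handling are illustrative assumptions, not part of the tooling shown in the talk.

    import java.util.List;

    // Hypothetical record type for one operations-data line
    record OpsRecord(String timestamp, String metric, String entity, String value) {

        // Parse e.g. [26/06/2015:21205.0], responseTime, "CustomerService", 204
        static OpsRecord parse(String line) {
            String[] f = line.split(",\\s*", 4);     // limit 4 keeps list values like [1,16,32] intact
            return new OpsRecord(
                    f[0].replaceAll("[\\[\\]]", ""), // strip brackets around the timestamp
                    f[1],
                    f[2].replaceAll("\"", ""),       // strip quotes around the entity
                    f[3]);
        }

        public static void main(String[] args) {
            List.of("[26/06/2015:21205.0], responseTime, \"CustomerService\", 204",
                    "[26/06/2015:21219.4], ids, \"ids\", [1,16,32,189,216]")
                .forEach(l -> System.out.println(parse(l)));
        }
    }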


  4. Scenario

    (Overview)
    Software Engineer
    VoIP/Collaboration App

    (Industrial Use Case)
    Adapted from https://xkcd.com/228/


  5. Scenario

    (Change Request)


  6. Scenario

    (Code Change)


  7. Scenario

    (Check-List)
    Change…
    …passes functional tests
    …passes non-functional (performance) tests
    …goes through Continuous Integration
    …gets deployed in production
    …leads to severe performance degradation


  8. Scenario

    (What was the Problem?)
    > Login method for 1 user/team included a service call to a 3rd-party provider to load profile pictures


    > Login was not able to scale gracefully to multiple teams
    > Tests failed to simulate the situation


  9. Problem & Conjecture
    The data and configuration combinatorics of production environments differ from what profilers and tests can simulate, either locally or in staging environments.
    Especially in the cloud, scalable infrastructure calls for approaches that leverage information gathered at production runtime and provide feedback to software engineers during development.


  10. Developer Targeted
    Performance Analytics
    What tools and data are being leveraged for decision making in software development for the cloud?
    How can we use runtime information to support data-driven decision making for engineers during software development?


  11. What tools and data are being leveraged
    for decision making in software
    development for the cloud?


  12. Study on Software Development for the Cloud
    “The Making of Cloud Applications” - FSE’15
    Adapted from https://xkcd.com/1423/
    Interviews with 25 software developers who deploy in the cloud
    Survey with 294 responses
    Developer
    Me


  13. Study on Software Development for the Cloud
    “The Making of Cloud Applications” - FSE’15


  14. Study on Software Development for the Cloud
    “The Making of Cloud Applications” - FSE’15
    62% say more metrics are available in the cloud
    &
    84% say they look at performance metrics
    on a regular basis


  15. Study on Software Development for the Cloud
    “The Making of Cloud Applications” - FSE’15


  16. Study on Software Development for the Cloud
    “The Making of Cloud Applications” - FSE’15
    Adapted from https://xkcd.com/1423/
    Topic: Solving problems that have been detected in production
    Do you look at any metrics?
    Nah, I'd rather go by intuition.


  17. Log Overload


  18. Log Overload Diarrhea


  19. State-of-the-Art
    SaaS tools
    Open Source


  20. How can we use runtime information to
    support data-driven decision making for
    engineers during software development?


  21. Developer Targeted
    Performance Analytics
    > integrating metrics into daily developer workflows
    > preventing bad things from happening
    > bringing metrics into context


  22. Conceptual Overview

    “Runtime Metric Meets Developer” - SPLASH Onward’15
    Paper: https://peerj.com/preprints/985/

    Blog Post on #themorningpaper

    https://blog.acolyer.org/2015/11/10/runtime-metric-meets-developer-building-better-cloud-applications-using-feedback/


  23. Code Artifacts
    Deployment
    .......
    [26/06/2015:21205.0], responseTime, "CustomerService", 204
    [26/06/2015:21215.0], responseTime, "CustomerService", 169
    [26/06/2015:21216.0], cpuUtilization, "CustomerServiceVM2", 0.73
    [26/06/2015:21216.0], cpuUtilization, "CustomerServiceVM1", 0.69
    [26/06/2015:21216.1], vmBilled, "CustomerServiceVM1", 0.35
    [26/06/2015:21219.4], ids, "ids", [1,16,32,189,216]
    ........
    Operations Data
    observe
    readConnection
    getConnections
    ids
    connectionPool
    Runtime Metric Annotated AST
    Conceptual Overview

    (Figure: cloud infrastructure with VM1 and VM2 hosting the Supplier Service, User Interface, and other services)
    “Runtime Metric Meets Developer” - SPLASH Onward’15


  24. Conceptual Overview

    “Runtime Metric Meets Developer” - SPLASH Onward’15
    Abstract Syntax Tree (AST)
    IDE
    Distributed Runtime Traces
    Feedback Mapping
    Runtime Metric Annotated AST
    Performance Augmented Source Code
    Prediction through Impact Analysis
    .......
    [26/06/2015:21205.0], responseTime, "showConnections", 204
    [26/06/2015:21215.0], responseTime, "setConnectionImage", 169
    [26/06/2015:21216.0], responseTime, "PaymentService", 79
    [26/06/2015:21216.0], cpuUtilization, "ConnectionsVM1", 0.69
    [26/06/2015:21216.1], vmBilled, "CustomerServiceVM1", 0.35
    [26/06/2015:21219.4], ids, "ids", [1,16,32,189,216]
    ........
    .......
    [26/06/2015:21216.0], cpuUtilization, "ConnectionsVM2", 0.73
    [26/06/2015:21216.0], cpuUtilization, "ConnectionsVM1", 0.69
    [26/06/2015:21216.1], vmBilled, "PaymentServiceVM", 0.35
    [26/06/2015:21219.4], ids, "connectionIDs", [1,16,32,189,216]
    ........
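
    As a rough illustration of the Feedback Mapping step above: aggregate responseTime samples per operation name, then attach the average to matching method names extracted from the AST. All names and numbers below are illustrative assumptions, not PerformanceHat internals.

    import java.util.*;

    class FeedbackMapping {
        public static void main(String[] args) {
            // Operation name -> observed responseTime samples (ms), as in the traces above
            Map<String, List<Double>> samples = Map.of(
                    "showConnections", List.of(204.0, 198.0, 221.0),
                    "setConnectionImage", List.of(169.0, 175.0));

            // Method names extracted from the AST of the file being edited
            List<String> astMethods = List.of("showConnections", "setConnectionImage", "helperMethod");

            // Annotate each AST method that has runtime data with its mean latency
            for (String method : astMethods) {
                List<Double> obs = samples.get(method);
                if (obs != null) {
                    double mean = obs.stream().mapToDouble(Double::doubleValue).average().orElse(0);
                    System.out.printf("%s -> avg responseTime %.1f ms%n", method, mean);
                }
            }
        }
    }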


  25. readConnections
    connections
    getConnections
    getImage
    setConnectionImage
    setConnectionStatus
    .......
    [26/06/2015:21205.0], responseTime, "showConnections", 204
    [26/06/2015:21215.0], responseTime, "setConnectionImage", 169
    [26/06/2015:21216.0], responseTime, "PaymentService", 79
    [26/06/2015:21216.0], cpuUtilization, "ConnectionsVM1", 0.69
    [26/06/2015:21216.1], vmBilled, "CustomerServiceVM1", 0.35
    [26/06/2015:21219.4], ids, "ids", [1,16,32,189,216]
    ........
    .......
    [26/06/2015:21216.0], cpuUtilization, "ConnectionsVM2", 0.73
    [26/06/2015:21216.0], cpuUtilization, "ConnectionsVM1", 0.69
    [26/06/2015:21216.1], vmBilled, "PaymentServiceVM", 0.35
    [26/06/2015:21219.4], ids, "connectionIDs", [1,16,32,189,216]
    ........
    Feedback Specification


  26. Industrial Case Study: SAP HANA WebIDE

    Performance Spotter


  27. Industrial Case Study: SAP HANA WebIDE

    Performance Spotter


  28. (Figure: a code change adds new code to overallRating; the annotated call tree contains readConnection, getSuppliers, a loop over suppliers whose iteration count comes from the size of the suppliers collection, and getPurchaseRating; the performance of the new code is still unknown)


  29. (Figure: statistical inference estimates the unknown new code, and feedback propagation rolls the estimate up the call tree of overallRating, through readConnection, getSuppliers, the loop over suppliers, and getPurchaseRating, to yield the predicted entity)
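
    The prediction in this figure boils down to simple arithmetic: the expected cost of the new loop is the expected size of the suppliers collection times the average observed latency of its body, and the estimate is then propagated up to the enclosing method. A sketch with made-up numbers:

    import java.util.List;

    class LoopPrediction {
        static double mean(List<Double> xs) {
            return xs.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        }

        public static void main(String[] args) {
            List<Double> supplierSizes = List.of(12.0, 18.0, 15.0);  // observed sizes of 'suppliers'
            List<Double> bodyLatencies = List.of(41.0, 39.0, 44.0);  // observed getPurchaseRating latencies (ms)
            double readConnection = 23.0;                            // observed sibling call latency (ms)

            // [Statistical Inference] loop cost = expected iterations x average body cost
            double predictedLoop = mean(supplierSizes) * mean(bodyLatencies);
            // [Feedback Propagation] roll the estimate up to overallRating
            double predictedOverallRating = readConnection + predictedLoop;

            System.out.printf("loop: %.0f ms, overallRating: %.0f ms%n",
                    predictedLoop, predictedOverallRating);
        }
    }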


  30. Live Performance Prediction
    (PerformanceHat)


  31. PoC Implementation of the approach as an Eclipse plugin
    > Performance Awareness
    > Contextualization
    PerformanceHat

    (Industrial Case Study: Cloudmore)


  32. PerformanceHat

    (Industrial Case Study: Cloudmore)
    > Instant performance feedback prediction on code changes


  33. Does it scale?


  34. Scaling Runtime Performance Matching
    Matching information to source code is part of the build process in the IDE
    Matching information to source code is part of the daily developer workflow
    It cannot be slow!


  35. Scaling Runtime Performance Matching
    Component Analysis: Which of the components could impede the development workflow?
    Every build needs to:
    > Identify and extract relevant AST nodes
    > Retrieve information for each relevant node
    > Execute inference for new nodes with unknown properties
    > Propagate the predicted entity
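
    A minimal sketch of these four per-build steps, assuming a hypothetical FeedbackStore interface for metric lookup; this is not the actual plugin code.

    import java.util.*;

    class BuildHook {
        // Hypothetical store of aggregated runtime metrics, keyed by method name
        interface FeedbackStore {
            OptionalDouble avgLatencyMs(String methodName);
        }

        void onBuild(List<String> changedMethods, FeedbackStore store) {
            Map<String, Double> annotated = new HashMap<>();
            for (String method : changedMethods) {                    // 1. identify relevant AST nodes
                OptionalDouble latency = store.avgLatencyMs(method);  // 2. retrieve info per node
                double value = latency.isPresent()
                        ? latency.getAsDouble()
                        : infer(method, annotated);                   // 3. inference for unknown nodes
                annotated.put(method, value);
            }
            double total = annotated.values().stream().mapToDouble(d -> d).sum();
            System.out.println("predicted enclosing cost: " + total + " ms"); // 4. propagate
        }

        // Placeholder inference: fall back to the mean of already-known neighbors
        double infer(String method, Map<String, Double> known) {
            return known.values().stream().mapToDouble(d -> d).average().orElse(0);
        }
    }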


  36. Scaling Runtime Performance Matching
    Component Analysis: Which of the components could impede the development workflow?
    Analysis needs to take into account different scopes:
    > Single File Builds
    > Incremental Builds
    > Full Project Builds


  37. Scaling Runtime Performance Matching
    Component Analysis: Which of the components could impede the development workflow?
    Potentially problematic


  38. Scaling Runtime Performance Matching
    Revamped architecture to enable scalability
    (Figure: the local workstation runs the IDE, a Local Feedback Handler, a local IDE cache, and a datastore; each of infrastructures 1..n runs a deployed system, a transform step, and a Deployed Feedback Handler with a specification function, an inference function, and registered filters; components communicate over HTTP and a stream backed by an Apache Kafka broker)

  39. Scaling Runtime Performance Matching
    Component Analysis: Measurements
    (Figure: share of build time per component (Prediction, Marking, Fetching, Attaching, Other) for four subjects (Agilefant Full, EU Project Full, Story Single File, Controller Single File) under four scenarios (Cache/Cold, Cache/Warm, No Cache/Cold, No Cache/Warm); x-axis: Build Time [%], from 0.00 to 1.00)


  40. Does it help developers?


  41. Controlled User Study 

    Overview
    "Classical" software engineering user study:
    > Between-subjects study
    > 20 software engineers (min. 1 year of experience)
    > Relevant study subject: Agilefant
    > 4 tasks (both related and not related to performance bugs)
    Control group (Kibana) vs. treatment group (PerformanceHat)


  42. Controlled User Study 

    Hypotheses & Metrics
    H01: Given a maintenance task that would introduce a performance bug, software engineers using PerformanceHat are faster in detecting the performance bug [Metric: First Encounter (FE)]

    H02: Given a maintenance task that would introduce a performance bug, software engineers using PerformanceHat are faster in finding the root cause of the performance bug [Metric: Root-Cause Analysis (RCA)]

    H03: Given a maintenance task that is not relevant to performance, software engineers using PerformanceHat are not slower than the control group in solving the task [Metric: Development Time]


  43. Controlled User Study 

    Results
    (Figure: task completion times in seconds, roughly 100 to 500, for control vs. treatment across T1 (Total), T2 (Total, FE, RCA), T3 (Total), and T4 (Total, FE, RCA))


  44. @citostyle
    Jürgen Cito
    Developer Targeted Performance Analytics
    leverages runtime data by matching performance
    metrics to source code artefacts to support
    decision-making during software development
    For which (quality) attributes does it make sense to correlate with source code artefacts?
    How much feedback is too much feedback?
    http://sealuzh.github.io/PerformanceHat/
