Observability for serverless apps: What should you look at?

ServerlessDays, Melbourne, August 29th, 2019

How deeply can you understand what is happening inside your application, from a technical and business point of view? Serverless apps use a distributed architecture. It’s critical to have end-to-end observability of each component and the communications between them in order to quickly identify and debug issues.

In this session, we show how to add the necessary instrumentation and how to use the data you collect to get a better grasp of your production environment. We’ll see how to collect monitoring and operational data in the form of logs, metrics, and events, giving you a unified view of resources, applications, and services. This lets you identify and troubleshoot the root cause of performance issues and errors with an end-to-end view of requests as they travel through your application. Examples are based on AWS.

Danilo Poccia

August 29, 2019

Transcript

  1. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Danilo Poccia
    Principal Evangelist, Serverless
    @danilop
    Observability for Serverless apps:
    What should you look at

  2. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Monolithic
    Application
    Services Microservices

  3. © 2019, Amazon Web Services, Inc. or its Affiliates.
    “Complexity arises when
    the dependencies among
    the elements become
    important.”
    Scott E. Page, John H. Miller
    Complex Adaptive Systems

  4. © 2019, Amazon Web Services, Inc. or its Affiliates.
    How Amazon SQS works
    Front End
    Back End
    Metadata
    Amazon
    DynamoDB
    Load
    Manager

  5. © 2019, Amazon Web Services, Inc. or its Affiliates.
    “A complex system that
    works is invariably found
    to have evolved from a
    simple system that
    worked.”
    Gall’s Law

  6. © 2019, Amazon Web Services, Inc. or its Affiliates.
    “A complex system
    designed from scratch
    never works and cannot
    be patched up to make it
    work. You have to start
    over with a working
    simple system.”

  7. © 2019, Amazon Web Services, Inc. or its Affiliates.
    “Amazon S3 is intentionally
    built with a minimal feature set.
    The focus is on simplicity and
    robustness.”
    – Amazon S3 Press Release,
    March 14, 2006

  8. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Amazon S3
    8 → more than 200
    microservices
    Mai-Lan Tomsen Bukovec
    VP and GM, Amazon S3

  9. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Monolith

  10. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Service
    Service
    Service
    Service
    Service
    Service
    Service
    Service
    Service
    Service
    Service
    Service

  11. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Rust
    Database
    DB
    Database
    Rust
    Go
    Node.js
    Java
    Node.js
    Node.js

  12. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Containers
    Database
    DB
    Database
    Containers
    λ
    Containers
    VMs
    Managed
    Service

  13. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Don’t build a network of connected “black boxes”
    Observability is a developer responsibility

  14. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Observability in Control Theory
    On the General Theory of Control Systems
    R. E. KALMAN
    Introduction
    In no small measure, the great technological progress in
    automatic control and communication systems during the past
    two decades has depended on advances and refinements in the
    mathematical study of such systems. Conversely, the growth
    of technology brought forth many new problems (such as those
    related to using digital computers in control, etc.) to challenge
    the ingenuity and competence of research workers concerned
    with theoretical questions.
    Despite the appearance and effective resolution of many new
    problems, our understanding of fundamental aspects of control
    has remained superficial. The only basic advance so far appears
    to be the theory of information created by Shannon 1. The chief
    significance of his work in our present interpretation is the
    discovery of general 'laws' underlying the process of information
    transmission, which are quite independent of the particular
    models being considered or even the methods used for the
    description and analysis of these models. These results could be
    compared with the 'laws' of physics, with the crucial difference
    that the 'laws' governing man-made objects cannot be discovered
    by straightforward experimentation but only by a purely abstract
    analysis guided by intuition gained in observing present-day
    examples of technology and economic organization. We may
    thus classify Shannon's result as belonging to the pure theory
    of communication and control, while everything else can be
    labelled as the applied theory; this terminology reflects the
    well-known distinctions between pure and applied physics or
    mathematics. For reasons pointed out above, in its methodology
    the pure theory of communication and control closely
    resembles mathematics, rather than physics; however, it is not
    a branch of mathematics because at present we cannot (yet?)
    disregard questions of physical realizability in the study of
    mathematical models.
    This paper initiates study of the pure theory of control
    imitating the spirit of Shannon's investigations but
    using entirely different techniques. Our ultimate objective is
    to answer questions of the following type: What kind and how
    much information is needed to achieve a desired type of control?
    What intrinsic properties characterize a given unalterable plant
    as far as control is concerned?
    At present only superficial answers are available to these
    questions, and even then only in special cases.
    Initial results presented in this Note are far from the degree
    of generality of Shannon's work. By contrast, however, only
    constructive methods are employed here, giving some hope of
    being able to avoid the well-known difficulty of Shannon's
    theory: methods of proof which are impractical for actually
    constructing practical solutions. In fact, this paper arose
    from the need for a better understanding of some recently
    discovered computation methods of control-system
    synthesis 2-s. Another by-product of the paper is a new
    computation method for the solution of the classical Wiener
    filtering problem 7.
    The organization of the paper is as follows:
    In Section 3 we introduce the models for which a fairly
    complete theory is available: dynamic systems with a finite
    dimensional state space and linear transition functions (i.e.
    systems obeying linear differential or difference equations).
    The class of random processes considered consists of such
    dynamic systems excited by an uncorrelated gaussian random
    process. Other assumptions, such as stationarity, discretiza-
    tion, single input/single output, etc., are made only to facilitate
    the presentation and will be absent in detailed future accounts
    of the theory.
    In Section 4 we define the concept of controllability and show
    that this is the 'natural' generalization of the so-called
    'dead-beat' control scheme discovered by Oldenbourg and Sartorius 21
    and later rederived independently by Tsypkin 22 and the author 17.
    We then show in Section 5 that the general problem of optimal
    regulation is solvable if and only if the plant is completely
    controllable.
    In Section 6 we introduce the concept of observability and
    solve the problem of reconstructing unmeasurable state variables
    from the measurable ones in the minimum possible length of
    time.
    We formalize the similarities between controllability and
    observability in Section 7 by means of the Principle of Duality
    and show that the Wiener filtering problem is the natural dual
    of the problem of optimal regulation.
    Section 8 is a brief discussion of possible generalizations and
    currently unsolved problems of the pure theory of control.
    Notation and Terminology
    The reader is assumed to be familiar with elements of linear
    algebra, as discussed, for instance, by Halmos 8.
    Consider an n-dimensional real vector space X. A basis in
    X is a set of vectors $a_1, \ldots, a_n$ in X such that any vector x in X
    can be written uniquely as
        $x = x_1 a_1 + \cdots + x_n a_n$    (1)
    the $x_i$ being real numbers, the components or coordinates of x.
    Vectors will be denoted throughout by small bold-face letters.
    The set X* of all real-valued linear functions x* (= covectors)
    on X, with the 'natural' definition of addition and scalar
    multiplication, is an n-dimensional vector space. The value of
    a covector y* at any vector x is denoted by [y*, x]. We call
    this the inner product of y* by x. The vector space X* has a
    natural basis $a^*_1, \ldots, a^*_n$ associated with a given basis in X;
    it is defined by the requirement that
        $[a^*_i, a_j] = \delta_{ij}$    (2)
    Using the 'orthogonality relation' (2), we may write (1) in the
    form
        $x = \sum_{i=1}^{n} [a^*_i, x]\, a_i$    (3)
    which will be used frequently.
    For purposes of numerical computation, a vector may be
    considered a matrix with one column and a covector a matrix
    J.S.I.A.M. CONTROL
    Ser. A, Vol. 1, No.
    Printed in U.S.A., 1963
    MATHEMATICAL DESCRIPTION OF LINEAR
    DYNAMICAL SYSTEMS*
    R. E. KALMAN
    Abstract. There are two different ways of describing dynamical systems: (i) by
    means of state variables and (ii) by input/output relations. The first method may be
    regarded as an axiomatization of Newton’s laws of mechanics and is taken to be the
    basic definition of a system.
    It is then shown (in the linear case) that the input/output relations determine
    only one part of a system, that which is completely observable and completely
    controllable. Using the theory of controllability and observability, methods are given
    for calculating irreducible realizations of a given impulse-response matrix. In
    particular, an explicit procedure is given to determine the minimal number of state
    variables necessary to realize a given transfer-function matrix. Difficulties arising
    from the use of reducible realizations are discussed briefly.
    1. Introduction and summary. Recent developments in optimal control
    system theory are based on vector differential equations as models of
    physical systems. In the older literature on control theory, however, the
    same systems are modeled by transfer functions (i.e., by the Laplace
    transforms of the differential equations relating the inputs to the outputs). Two
    different languages have arisen, both of which purport to talk about the
    same problem. In the new approach, we talk about state variables,
    transition equations, etc., and make constant use of abstract linear algebra.
    In the old approach, the key words are frequency response, pole-zero
    patterns, etc., and the main mathematical tool is complex function theory.
    Is there really a difference between the new and the old? Precisely what
    are the relations between (linear) vector differential equations and
    transfer-functions? In the literature, this question is surrounded by confusion [1].
    This is bad. Communication between research workers and engineers is
    impeded. Important results of the "old theory" are not yet fully integrated
    into the new theory.
    In the writer’s view--which will be argued at length in this paper--the
    difficulty is due to insufficient appreciation of the concept of a dynamical
    system. Control theory is supposed to deal with physical systems, and not
    merely with mathematical objects such as a differential equation or a
    transfer function. We must therefore pay careful attention to the relationship
    between physical systems and their representation via differential equations,
    transfer functions, etc.
    * Received by the editors July 7, 1962 and in revised form December 9, 1962.
    Presented at the Symposium on Multivariable System Theory, SIAM, November 1,
    1962 at Cambridge, Massachusetts.
    This research was supported in part under U. S. Air Force Contracts AF 49 (638)-382
    and AF 33(616)-6952 as well as NASA Contract NASr-103.
    Research Institute for Advanced Studies (RIAS), Baltimore 12, Maryland.
    1961-62

  15. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Control Theory
    PV SP
    Controlled
    Process Variable
    Reference
    or Set Point
    Actual Value Desired Value
    SP-PV error

  16. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Observability
    In control theory, observability is a measure of
    how well internal states of a system
    can be inferred from knowledge
    of its external outputs.
    https://en.wikipedia.org/wiki/Observability
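
For the linear, time-invariant systems treated in Kalman's papers, this definition has a standard algebraic test. The sketch below uses the conventional symbols A (state matrix), C (output matrix), x (state), y (output), and n (state dimension); none of these symbols appear on the slide.

```latex
\dot{x}(t) = A\,x(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n}

\mathcal{O} =
\begin{bmatrix}
C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1}
\end{bmatrix},
\qquad
\text{the system is observable} \iff \operatorname{rank}\,\mathcal{O} = n
```

When the rank is n, the initial state can be reconstructed from the outputs alone; a lower rank means some internal states leave no trace in the outputs.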

  17. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Levels of Observability
    Machine (HW, OS)
    Application
    Network

  18. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Machine (HW, OS)
    Application
    Network
    The Three Pillars of Observability
    Distributed Systems Observability by Cindy Sridharan

  19. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Machine (HW, OS)
    Application
    Network
    The Three Pillars of Observability
    Logs Metrics Tracing
    Distributed Systems Observability by Cindy Sridharan

  20. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Metric Filters & Correlation IDs
    Logs Tracing
    Metric
    Filter
    Correlation
    ID
    Metrics
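
The two links on this slide can be sketched in Python: a metric filter turns matching log lines into a number, and a correlation ID lets log entries from different components be joined into one request flow. This is a minimal illustration, not how CloudWatch implements either feature; the field names (`component`, `level`, `correlation_id`) and all helper functions are hypothetical.

```python
import json
import uuid
from collections import defaultdict


def log_line(component, level, message, correlation_id):
    """Render one structured (JSON) log line; the field names are illustrative."""
    return json.dumps({
        "component": component,
        "level": level,
        "message": message,
        "correlation_id": correlation_id,
    })


def metric_filter(lines, field, value):
    """Count log lines whose `field` equals `value` (logs -> metric)."""
    return sum(1 for line in lines if json.loads(line).get(field) == value)


def group_by_correlation_id(lines):
    """List the components each correlation ID passed through (logs -> trace-like view)."""
    groups = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        groups[record["correlation_id"]].append(record["component"])
    return dict(groups)


# Two hypothetical requests; the first crosses two components under one ID.
request_id = str(uuid.uuid4())
other_id = str(uuid.uuid4())
logs = [
    log_line("api", "INFO", "request received", request_id),
    log_line("orders", "ERROR", "payment declined", request_id),
    log_line("api", "INFO", "request received", other_id),
]

error_count = metric_filter(logs, "level", "ERROR")
flows = group_by_correlation_id(logs)
```

Here `error_count` is a metric derived purely from logs, and `flows[request_id]` shows the path of a single request across components.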

  21. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Using Observability
    Logs Tracing
    Log aggregation
    & analytics
    Visualizations
    Alerting
    Metric
    Filter
    Correlation
    ID
    Metrics

  22. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Using Observability on AWS
    CloudWatch
    Logs
    AWS
    X-Ray
    Traces
    CloudWatch
    Insights
    CloudWatch
    Dashboard
    CloudWatch
    Alarms
    AWS X-Ray
    ServiceGraph
    Metric
    Filter
    CloudWatch
    Metrics

  23. © 2019, Amazon Web Services, Inc. or its Affiliates.
    CloudWatch Anomaly Detection
    Open Preview

  24. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Dive Deep with Tracing

  25. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Understand performance…
    Systems Performance by Brendan Gregg

  26. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Understand performance… and latency…
    Systems Performance by Brendan Gregg

  27. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Understand performance… and latency… and percentiles!
    P100
    P99
    P90
    P50
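
To make the percentile labels concrete, here is a small Python sketch that computes them from a list of latency samples. It uses the nearest-rank method, which is only one of several percentile definitions, and the latency values are invented for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 950]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
p100 = percentile(latencies_ms, 100)
mean = sum(latencies_ms) / len(latencies_ms)
```

With these samples the single outlier pulls the mean far above P50 and P90, while P99 and P100 expose it directly; this is why the high percentiles say more about the slowest requests than an average does.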

  28. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Proactive operations help mitigate issues
    Degraded state
    Outage
    Latency
    Time (ms)

  29. © 2019, Amazon Web Services, Inc. or its Affiliates.

  30. © 2019, Amazon Web Services, Inc. or its Affiliates.
    What is Serverless?
    No infrastructure to manage Automatic scaling
    Pay for value Highly available and secure

  31. © 2019, Amazon Web Services, Inc. or its Affiliates.
    How does Serverless work?
    Storage
    Databases
    Analytics
    Machine Learning
    . . .
    Your
    unique
    business
    logic
    User uploads a picture
    Customer data updated
    Anomaly detected
    API call
    . . .
    Fully-managed
    services
    Events
    Functions

  32. © 2019, Amazon Web Services, Inc. or its Affiliates.
    What is an “event” ?
    “something that happens”
    Events tell us a fact
    Immutable time series
    Time What
    2019 06 21 08 07 06 CustomerCreated
    2019 06 21 08 07 09 OrderCreated
    2019 06 21 08 07 13 PaymentSuccessful
    2019 06 21 08 07 17 CustomerUpdated
    . . . . . .
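
The idea of events as an immutable time series can be sketched in Python as an append-only log of frozen records. The `Event` and `EventLog` names are hypothetical, and reading the slide's timestamps as year, month, day, hour, minute, second is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """An immutable fact: what happened and when."""
    time: datetime
    what: str


class EventLog:
    """Append-only log: events are added but never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, since=None):
        """Return events in time order, optionally only those at or after `since`."""
        ordered = sorted(self._events, key=lambda e: e.time)
        return [e for e in ordered if since is None or e.time >= since]


log = EventLog()
# The four rows shown on the slide (timestamp format assumed, see above).
log.append(Event(datetime(2019, 6, 21, 8, 7, 6), "CustomerCreated"))
log.append(Event(datetime(2019, 6, 21, 8, 7, 9), "OrderCreated"))
log.append(Event(datetime(2019, 6, 21, 8, 7, 13), "PaymentSuccessful"))
log.append(Event(datetime(2019, 6, 21, 8, 7, 17), "CustomerUpdated"))

recent = [e.what for e in log.replay(since=datetime(2019, 6, 21, 8, 7, 10))]
```

Because each `Event` is frozen, a later change such as `CustomerUpdated` is recorded as a new fact rather than by rewriting an earlier one, and any consumer can rebuild state by replaying the log from a chosen point in time.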

  33. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Time is important
    “Modelling events forces you to have a temporal
    focus on what’s going on in the system.
    Time becomes a crucial factor of the system.”
    – Greg Young, A Decade of DDD, CQRS, Event Sourcing, 2016

  34. © 2019, Amazon Web Services, Inc. or its Affiliates.
    How to simplify event management?
    Photo by Adam Jang on Unsplash

  35. © 2019, Amazon Web Services, Inc. or its Affiliates.
    TweetSource:
      Type: AWS::Serverless::Application
      Properties:
        Location:
          ApplicationId: arn:aws:serverlessrepo:...
          SemanticVersion: 2.0.0
        Parameters:
          TweetProcessorFunctionName: !Ref MyFunction
          SearchText: '#serverless -filter:nativeretweets'
    Nested apps to simplify solving recurring problems
    Standard
    Component
    Custom
    Business
    Logic
    aws-serverless-twitter-event-source app
    Polling schedule
    (CloudWatch
    Events rule)
    trigger
    TwitterProcessor
    SearchCheckpoint
    TwitterSearchPoller
    Twitter
    Search API

  36. © 2019, Amazon Web Services, Inc. or its Affiliates.
    AWS Event Fork Pipelines
    https://github.com/aws-samples/aws-serverless-event-fork-pipelines
    Amazon SNS
    topic
    Event storage & backup pipeline
    Event search & analytics pipeline
    Event replay pipeline
    Your event processing pipeline
    filtered
    events
    events to
    replay
    all
    events Standard
    Components
    Custom
    Business
    Logic

  37. © 2019, Amazon Web Services, Inc. or its Affiliates.
    AWS Event Fork Pipelines – Event Storage & Backup Pipeline
    sns-fork-storage-backup app
    Amazon S3
    backup bucket
    fan out
    filtered
    events
    Amazon SNS
    topic
    Amazon SQS
    queue
    AWS Lambda
    function

  38. © 2019, Amazon Web Services, Inc. or its Affiliates.
    AWS Event Fork Pipelines – Event Search & Analytics Pipeline
    sns-fork-search-analytics app
    Amazon S3
    dead-letter bucket
    fan out
    filtered
    events
    Amazon SNS
    topic
    Amazon SQS
    queue
    AWS Lambda
    function
    Kibana
    dashboard
    Store
    dead-letter
    events

  39. © 2019, Amazon Web Services, Inc. or its Affiliates.
    AWS Event Fork Pipelines – Event Replay Pipeline
    sns-fork-message-replay app
    fan out
    filtered
    events
    Amazon SNS
    topic
    Amazon SQS
    replay queue
    AWS Lambda
    replay function
    Your regular event processing pipeline
    Amazon SQS
    processing queue
    enqueue
    events to
    replay
    Your operators
    enable/disable replay
    reprocess events…

  40. © 2019, Amazon Web Services, Inc. or its Affiliates.
    AWS Event Fork Pipelines – E-Commerce Example

  41. © 2019, Amazon Web Services, Inc. or its Affiliates.
    AWS Event Fork Pipelines in the Serverless Application Repository

  42. Photo by J W on Unsplash
    Can we help more?

  43. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Amazon
    EventBridge
    A serverless event bus service for
    SaaS and AWS services
    • Fully managed, pay-as-you-go
    • Native integration with
    SaaS providers
    • 15 target services
    • Easily build event-driven
    architectures
    New

  44. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Amazon EventBridge
    Event source
    SaaS event
    bus
    Custom event
    bus
    Default event
    bus
    Rules
    AWS Lambda
    Amazon Kinesis
    AWS Step Functions
    Additional targets

  45. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Amazon EventBridge
    AWS services
    Custom events
    SaaS apps Event source
    SaaS event
    bus
    Custom event
    bus
    Default event
    bus
    Rules
    AWS Lambda
    Amazon Kinesis
    AWS Step Functions
    Additional targets
    "detail-type":
    "source": "aws.partner/example.com/123",
    "detail":
    "ticketId":
    "department":
    "creator":

  46. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Amazon EventBridge
    AWS services
    Custom events
    SaaS apps Event source
    SaaS event
    bus
    Custom event
    bus
    Default event
    bus
    Rules
    AWS Lambda
    Amazon Kinesis
    AWS Step Functions
    Additional targets
    "detail-type":
    "source": "aws.partner/example.com/123"
    "detail":
    "ticketId":
    "department":
    "creator":
    "source":

  47. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Amazon EventBridge
    AWS services
    Custom events
    SaaS apps Event source
    SaaS event
    bus
    Custom event
    bus
    Default event
    bus
    Rules
    AWS Lambda
    Amazon Kinesis
    AWS Step Functions
    Additional targets
    "detail-type":
    "source": "aws.partner/example.com/123",
    "detail":
    "ticketId":
    "department": "billing"
    "creator":
    "detail":
    "department": ["billing", "fulfillment"]

  48. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Amazon EventBridge
    AWS services
    Custom events
    SaaS apps Event source
    SaaS event
    bus
    Custom event
    bus
    Default event
    bus
    Rules
    AWS Lambda
    Amazon Kinesis
    AWS Step Functions
    Additional targets
    "detail-type": "Ticket Created"
    "source": "aws.partner/example.com/123",
    "detail":
    "ticketId":
    "department": "billing",
    "creator":
    "detail-type": ["Ticket Resolved"]

    View Slide
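
The rule matching shown on the preceding EventBridge slides can be sketched in Python. This is a simplified model that covers only exact-value matching, where a list in the pattern means "any one of these values"; it is not the EventBridge implementation and ignores the service's other matching features. The `matches` function is hypothetical, and the `ticketId` and `creator` values are placeholders because the slides leave them blank.

```python
def matches(pattern, event):
    """Return True if `event` satisfies `pattern` (exact-value matching only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the corresponding nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf patterns are lists: the event value must equal one of the entries.
            return False
    return True


# Event shaped like the one on the slides; placeholder values are marked.
event = {
    "detail-type": "Ticket Created",
    "source": "aws.partner/example.com/123",
    "detail": {
        "ticketId": "placeholder-ticket-id",   # hypothetical value
        "department": "billing",
        "creator": "placeholder-creator",      # hypothetical value
    },
}

by_department = {"detail": {"department": ["billing", "fulfillment"]}}
by_type = {"detail-type": ["Ticket Resolved"]}

department_match = matches(by_department, event)
type_match = matches(by_type, event)
```

The department rule matches because `billing` is one of the listed values, while the `Ticket Resolved` rule does not match an event whose `detail-type` is `Ticket Created`, so only targets attached to the first rule would receive this event.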

  49. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Amazon EventBridge integration partners

  50. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Common use cases

  51. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Common use cases

  52. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Takeaways
    1. Build the instrumentation you need to understand what is happening inside
    your distributed application
    2. Mix technical and business metrics together to get better insights
    3. Use correlation IDs in log and tracing frameworks to understand the actual
    flow of data
    4. Leverage anomaly detection to understand when you are not in a normal state
    5. Store, analyze, and replay events; they can be the source of truth to understand
    the behavior (and not just the structure) of your application

  53. © 2019, Amazon Web Services, Inc. or its Affiliates.
    AWS Lambda monitoring partners

  54. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Thank you!
    @danilop
