API Monitoring with OpenAPI and Ecosystem using Schema

LINE Developers

March 11, 2021

Transcript

  1. Engineering
    API Monitoring with OpenAPI and Ecosystem using Schema
    Wataru Manji, Verda -- LINE Corp.

  2. ABOUT ME
    The role and experience
    name: Wataru Manji
    role: Software Engineer
    team: Verda Reliability Engineering team
    activities:
    - Development of the monitoring system
    - Direction of incident handling
    - Implementation of the on-call system
    - User support and training
    - and more
    manji0
    manji0#9999

  3. Agenda
    • What is Verda?
    • Motivation
    • Basic Idea
    • Deep Dive into API Monitoring
    • Practical Operation
    • Future Plans
    • Conclusion

  4. (image-only slide)

  5. Verda is the Infra Platform
    Of the LINER, by the LINER, for the LINER
    Verda Web UI
    Verda REST APIs
    Server (VM/Baremetal), LoadBalancer (L4/L7), Storage (Object/Block),
    Datastore (MySQL, Redis), Kubernetes, Elasticsearch, ...

  6. The Scale is LARGE
    We manage many resources
    Baremetal & HV: 22,000+ ※1
    VM: 65,000+ ※1
    K8s cluster: 800+ ※1
    ※1: Count as of Dec. 2020

  7. Motivation
    Background to the introduction of API monitoring

  8. Server Metrics is Not Enough
    Nothing was clear
    ● Server monitoring was running on a metrics basis.
    ● API monitoring for the services was only log-based.
    ● Logs did not summarize which part of the microservices was failing a request.
    ● Server resource usage spiked periodically, but we couldn't figure out why.
    ● Some products had their own service monitoring, but other teams did not know
      about it.
    ● We needed a unified method to measure each API's availability, throughput, and latency.

  9. We Already Have the Solution
    That is a sidecar proxy
    ● We can collect API metrics by implementing schemas for the APIs.
      ● The same can be done for other services by introducing the proxy and a schema.
    ● The Verda k8s team developed "verda-common-proxy", a simple HTTP-proxy sidecar
      that supports exporting metrics defined by an OpenAPI schema.
      ● Some components already use the proxy to collect access logs.
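Exporting metrics "defined by an OpenAPI schema" implies labeling each request with the schema's path template (such as `/v1/servers/{id}`) instead of the raw URL, which keeps resource IDs out of the `path` label. The following is a minimal Python sketch of that matching step, assuming simple segment-by-segment template matching; it is not VCP's actual code, and the example paths and the `UNMATCHED` label value are hypothetical.

```python
from urllib.parse import urlsplit


def _segments(path):
    return path.strip("/").split("/")


def matches(template, path):
    """True if a concrete path fits an OpenAPI path template, segment by segment."""
    t_parts, p_parts = _segments(template), _segments(path)
    if len(t_parts) != len(p_parts):
        return False
    for t, p in zip(t_parts, p_parts):
        is_param = t.startswith("{") and t.endswith("}")
        if not (t == p or (is_param and p != "")):
            return False
    return True


def path_label(templates, raw_url):
    """Choose the template to use as the metric's `path` label."""
    path = urlsplit(raw_url).path  # drop the query string
    # Templates with fewer {params} are tried first, so literal paths win.
    for template in sorted(templates, key=lambda t: t.count("{")):
        if matches(template, path):
            return template
    return "UNMATCHED"  # hypothetical label value for paths outside the spec
```

With the templates `/v1/servers`, `/v1/servers/{id}` and `/v1/servers/search`, the URL `/v1/servers/abc123?verbose=1` is labeled `/v1/servers/{id}`. Trying templates with fewer parameters first is a rough way to follow the OpenAPI rule that concrete paths are matched before templated ones.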

  10. Basic Idea
    Summary of API monitoring implementation

  11. Overview
    K8s native design

  12. Schema Management
    Split management is the basic principle
    ● k8s manifest (e.g. Deployment): fixes the version of each image
      ● Application image (built from its own repo)
      ● Nginx + schema files image (built from its own repo)
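The split-management principle can be illustrated with a manifest in which every image carries its own fixed tag. The sketch below builds such a Deployment as a Python dict and refuses untagged or `:latest` images. The image names, the registry, and running the nginx schema image as a container next to the app and the proxy sidecar in the same Pod are assumptions for illustration, not details from the talk.

```python
import json


def deployment(name, app_image, schema_image, proxy_image):
    """Build a Deployment whose images are each pinned to an explicit version."""
    for image in (app_image, schema_image, proxy_image):
        # Split management: every image must carry its own fixed version.
        last = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
        if ":" not in last or image.endswith(":latest"):
            raise ValueError(f"image must be pinned to a fixed tag: {image}")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": "app", "image": app_image},
                        {"name": "schema", "image": schema_image},  # nginx serving schema files
                        {"name": "proxy", "image": proxy_image},    # sidecar reading the schema
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(deployment(
        "network-api",
        "registry.example.com/network-api:1.4.2",
        "registry.example.com/network-api-schema:1.4.2",
        "registry.example.com/verda-common-proxy:0.9.0",
    ), indent=2))
```

Bumping the application and its schema then becomes two explicit tag changes in one manifest, which keeps the pair reviewable and easy to roll back together.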

  13. Support for Modern Web Frameworks
    The proxy can get a schema from the API server
    ● VCP can get a schema from the target service
      → Works well with frameworks that take a schema-first approach
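A schema-first framework usually serves its own OpenAPI document, so a proxy can load the operations to monitor straight from the service. This Python sketch assumes the schema is served as JSON at `/openapi.json` (the default in some frameworks, e.g. FastAPI; VCP's actual endpoint is not stated) and uses a throwaway local server as a stand-in target. All names here are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations(schema):
    """List (METHOD, path) pairs declared in an OpenAPI document."""
    ops = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:  # skip "parameters", "summary", etc.
                ops.append((key.upper(), path))
    return sorted(ops)


def fetch_operations(base_url, schema_path="/openapi.json"):
    """Fetch the target service's schema over HTTP and list its operations."""
    # Talk to the target directly, ignoring any *_proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url + schema_path, timeout=5) as resp:
        return operations(json.load(resp))


# Stand-in "target service" that serves its own schema.
DEMO_SCHEMA = {
    "openapi": "3.0.3",
    "info": {"title": "demo", "version": "1.0.0"},
    "paths": {
        "/v1/servers": {"get": {}, "post": {}},
        "/v1/servers/{id}": {"parameters": [], "get": {}, "delete": {}},
    },
}


class SchemaHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/openapi.json":
            self.send_error(404)
            return
        body = json.dumps(DEMO_SCHEMA).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), SchemaHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(fetch_operations(f"http://127.0.0.1:{server.server_address[1]}"))
    server.shutdown()
```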

  14. Queries
    What we have been able to observe
    ● request_count_total {deployment, pod, path, method, status_code, error}
    ● request_latency_second_total {deployment, pod, path, method}
    ● request_inflight {deployment, pod, path, method}
    ● request_size_bucket {deployment, pod, path, method}
    ● response_size_bucket {deployment, pod, path, method}

    ● What percentage of requests resulted in a 5xx?
    ● What percentage of requests violated the API spec?
    ● Which paths & methods have lower throughput?
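In Prometheus, the first question might be written as `sum by (path, method) (rate(request_count_total{status_code=~"5.."}[5m])) / sum by (path, method) (rate(request_count_total[5m]))`. To make the grouping explicit, the Python sketch below does the same arithmetic over a static set of hypothetical samples, each standing for the number of requests with those labels in one time window. Treating a non-empty `error` label as a spec violation is an assumption about what that label means.

```python
from collections import defaultdict


def ratio_by_operation(samples, predicate):
    """Fraction of requests per (path, method) whose labels satisfy predicate."""
    hits = defaultdict(float)
    totals = defaultdict(float)
    for labels, count in samples:
        key = (labels["path"], labels["method"])
        totals[key] += count
        if predicate(labels):
            hits[key] += count
    return {key: hits[key] / total for key, total in totals.items() if total > 0}


def is_5xx(labels):
    return labels["status_code"].startswith("5")


def violates_spec(labels):
    # Assumption: a non-empty "error" label marks a request that failed schema validation.
    return labels.get("error", "") != ""


# Hypothetical request counts over one window, labeled as on the slide.
SAMPLES = [
    ({"path": "/v1/servers", "method": "POST", "status_code": "202", "error": ""}, 90),
    ({"path": "/v1/servers", "method": "POST", "status_code": "500", "error": ""}, 10),
    ({"path": "/v1/servers", "method": "GET", "status_code": "200", "error": ""}, 95),
    ({"path": "/v1/servers", "method": "GET", "status_code": "400", "error": "invalid_query"}, 5),
]

if __name__ == "__main__":
    print(ratio_by_operation(SAMPLES, is_5xx))
    print(ratio_by_operation(SAMPLES, violates_spec))
```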

  15. Visualization
    To observe information in a time series
    ● The dashboard is implemented on Grafana.
    ● Dashboards are added automatically by identifying the region and service from metric labels.
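Generating dashboards from labels boils down to grouping the known series by the dashboard key and emitting one panel per operation. The sketch assumes `region` and `service` labels exist on the series (for example, added as Prometheus external labels); neither appears in the label lists on the Queries slide. The returned dicts borrow a few field names from Grafana's dashboard JSON (`title`, `panels`, `targets`, `expr`) but are not a complete dashboard model.

```python
from collections import defaultdict


def build_dashboards(series):
    """One dashboard per (region, service), one panel per (method, path).

    `series` is an iterable of label dicts. The "region" and "service"
    label names are assumptions made for this sketch.
    """
    operations = defaultdict(set)
    for labels in series:
        key = (labels["region"], labels["service"])
        operations[key].add((labels["method"], labels["path"]))

    dashboards = []
    for (region, service), ops in sorted(operations.items()):
        panels = []
        for method, path in sorted(ops):
            selector = (
                f'region="{region}",service="{service}",'
                f'method="{method}",path="{path}"'
            )
            panels.append({
                "title": f"{method} {path}",
                "type": "timeseries",
                "targets": [{"expr": f"sum(rate(request_count_total{{{selector}}}[5m]))"}],
            })
        dashboards.append({"title": f"{service} ({region})", "panels": panels})
    return dashboards
```

Duplicate series collapse into one panel because operations are collected into sets, so rerunning the generator on fresh label data is idempotent.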

  16. Deep Dive into API Monitoring
    The keyword is OpenAPI

  17. ● VCP is the proxy that process AuthN, validation, and recording access log.
    VCP: verda-common-proxy
    Engineering
    17
    ● More detail… https://engineering.linecorp.com/ja/blog/verda-common-proxy/
    VCP APP
    ɾValidate request with schema


    ɾRecord some metrics


    ɾValidate token and add the result as headers
    ɾRecord access log


    ɾRecord some metrics
    Request to the pod
    Response from the pod

    View Slide
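The request path through VCP can be approximated as validate, forward, record. The Python sketch below validates a request against a toy schema, rejects violations before they reach the app, and updates counters shaped like the metrics on the Queries slide. The schema format, error strings, and function names are hypothetical, and token validation and access logging are left out.

```python
import time
from collections import Counter, defaultdict

# Hypothetical, heavily simplified schema: allowed methods per path template and
# the query parameters each operation requires. Real OpenAPI validation also
# checks bodies, types, headers, and more.
SCHEMA = {
    "/v1/servers": {"GET": set(), "POST": {"flavor"}},
}


class Metrics:
    """In-memory stand-ins for a request counter, latency sum, and in-flight gauge."""

    def __init__(self):
        self.request_count = Counter()             # (path, method, status, error) -> count
        self.latency_seconds = defaultdict(float)  # (path, method) -> total seconds
        self.inflight = defaultdict(int)           # (path, method) -> requests in progress


def validate(path, method, query):
    """Return (status, error) if the request violates the schema, else None."""
    operations = SCHEMA.get(path)
    if operations is None:
        return 404, "unknown_path"
    if method not in operations:
        return 405, "method_not_allowed"
    missing = operations[method] - set(query)
    if missing:
        return 400, "missing_query:" + ",".join(sorted(missing))
    return None


def handle(metrics, upstream, path, method, query):
    """Validate, forward to the app if valid, and record metrics either way."""
    key = (path, method)
    metrics.inflight[key] += 1
    start = time.perf_counter()
    try:
        violation = validate(path, method, query)
        if violation is not None:
            status, error = violation  # rejected by the proxy; the app never sees it
        else:
            status, error = upstream(path, method, query), ""
        metrics.request_count[(path, method, status, error)] += 1
        return status
    finally:
        metrics.latency_seconds[key] += time.perf_counter() - start
        metrics.inflight[key] -= 1
```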

  18. Data Flow of the Metrics
    ● Separate data-source availability from data-store availability with remote-write
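The point of remote-write here is that collecting metrics keeps working while the long-term store is down. The class below is a simplified Python illustration of that decoupling: samples are buffered locally and shipped in batches when the store accepts them. `RemoteWriter` and the `send` callable are hypothetical; Prometheus's real remote-write uses a WAL-backed queue with its own retry and sharding logic.

```python
from collections import deque
from itertools import islice


class RemoteWriter:
    """Buffer samples locally and push them to a remote store when it is reachable."""

    def __init__(self, send, max_buffered=10_000):
        self.send = send  # hypothetical callable; raises ConnectionError when the store is down
        self.buffer = deque(maxlen=max_buffered)  # oldest samples drop first when full

    def append(self, sample):
        # Collection never waits on the data store.
        self.buffer.append(sample)

    def flush(self, batch_size=500):
        """Ship buffered samples in batches; keep them if the store is unavailable."""
        sent = 0
        while self.buffer:
            batch = list(islice(self.buffer, batch_size))
            try:
                self.send(batch)
            except ConnectionError:
                break  # store unavailable: retry on the next flush
            for _ in batch:
                self.buffer.popleft()
            sent += len(batch)
        return sent
```

The bounded buffer is the trade-off: a long store outage loses the oldest samples instead of exhausting memory on the collector.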

  19. Working with Audit Logs
    For investigation
    ● Metrics alone do not provide information on specific requests.
      → Save access logs separately and use them for analysis
      ● We use fluentd for the log transfer.
    ● NOTE: This is a mechanism for analyzing platform-side requests, not the requests
      of apps built on the platform.
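An access log that fluentd can ship is easiest to handle as one JSON object per line, which fluentd's `in_tail` input can parse with its JSON parser. The Python sketch below renders one such record; the field names and the per-request ID are assumptions, not the format Verda uses.

```python
import json
import time
import uuid


def access_log_line(method, path, status, latency_s, request_id=None, now=None):
    """Render one access-log record as a single JSON line (hypothetical field names)."""
    record = {
        "time": now if now is not None else time.time(),
        # A per-request ID lets a log entry be tied back to one specific request.
        "request_id": request_id or str(uuid.uuid4()),
        "method": method,
        "path": path,
        "status": status,
        "latency_ms": round(latency_s * 1000, 3),
    }
    # Compact separators and sorted keys keep lines short and stable.
    return json.dumps(record, separators=(",", ":"), sort_keys=True)
```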

  20. Practical Operation
    It helped us in these cases

  21. CASE
    ● The failure rate of the networking management API increased dramatically, greatly
      affecting the creation of VMs and other services.
    ● It turned out that the last deployment had switched the API endpoint referenced by the HVs,
      and the load on the new endpoint's resources skyrocketed.
    ● Solved by calculating the throughput from the API monitoring values and scaling
      the pods to an appropriate count.
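The scaling fix in this case is simple arithmetic: divide the observed peak throughput by what one pod can safely handle. The sketch below encodes that as replicas = ceil(peak_rps / (per_pod_rps × target_utilization)); the parameter names, the 0.7 utilization target, and the minimum of two replicas are illustrative assumptions, not figures from the incident.

```python
import math


def required_replicas(peak_rps, per_pod_rps, target_utilization=0.7, min_replicas=2):
    """Replicas needed so each pod runs at or below target_utilization of its capacity.

    peak_rps would come from API monitoring (e.g. the rate of request_count_total);
    per_pod_rps is the throughput one pod sustains before errors rise.
    """
    if per_pod_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("per_pod_rps must be > 0 and 0 < target_utilization <= 1")
    needed = math.ceil(peak_rps / (per_pod_rps * target_utilization))
    return max(min_replicas, needed)
```

With a hypothetical peak of 1,200 req/s and pods that each sustain about 100 req/s, this gives ceil(1200 / 70) = 18 replicas.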

  22. CASE

  23. Future Plans
    Provide value to the users of Verda

  24. UX Issues
    ● Verda's API spec is not exposed at a high enough level.
    ● There is no portal that lists the status of Verda's services.
      → Such information is essential for users to trust and use Verda.
    ● We would like to develop an interface for users to learn about the functions and
      their reliability.
      ● The API monitoring mechanism should serve as a foundation for publishing that information.

  25. API Documents Generated by Schema
    Implement more manageable user documentation
    ● The schema is an API spec that can be exposed to users.
      ● We can provide users with API documents that automatically follow changes in
        the application.
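Because the schema already lists paths, methods, summaries, and parameters, a reference page can be regenerated whenever the schema changes. The Python sketch below renders a minimal Markdown page from an OpenAPI 3 document; it reads only `info`, `paths`, `summary`, and `parameters`, and is meant to show the idea rather than replace a real documentation generator.

```python
def render_markdown(schema):
    """Render a minimal Markdown API reference from an OpenAPI 3 document (dict)."""
    methods = ("get", "put", "post", "delete", "patch")
    info = schema.get("info", {})
    lines = [f"# {info.get('title', 'API')} {info.get('version', '')}".rstrip(), ""]
    for path, item in sorted(schema.get("paths", {}).items()):
        for method in methods:
            op = item.get(method)
            if op is None:
                continue
            lines.append(f"## {method.upper()} {path}")
            if "summary" in op:
                lines.append(op["summary"])
            for param in op.get("parameters", []):
                required = " (required)" if param.get("required") else ""
                lines.append(f"- `{param['name']}` in {param['in']}{required}")
            lines.append("")
    return "\n".join(lines)
```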

  26. Summary of Service Status
    A mechanism to expose the health of the API to users
    ● From the metrics collected by API monitoring, we can calculate service status and
      service level.
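One common way to turn request counts into a service level is availability = (total - errors) / total over a window, compared against an SLO, with the error budget being (1 - SLO) × total requests. The Python sketch below computes those numbers; counting only 5xx responses as errors and the 99.9% default SLO are assumptions.

```python
def service_level(total, errors, slo=0.999):
    """Availability over one window and its relation to an SLO.

    total/errors would be request counts for the same window, e.g. all
    requests vs. 5xx responses from request_count_total (an assumption).
    """
    if total <= 0:
        raise ValueError("the window must contain at least one request")
    availability = (total - errors) / total
    allowed_errors = (1 - slo) * total  # the error budget, in requests
    budget_used = errors / allowed_errors if allowed_errors > 0 else float("inf")
    return {
        "availability": availability,
        "gap_to_slo": availability - slo,  # negative: the SLO is being missed
        "error_budget_used": budget_used,  # 1.0 means the budget is exhausted
    }
```

For example, 400 errors out of 1,000,000 requests gives 99.96% availability and uses about 40% of a 99.9% error budget.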

  27. Resource Monitoring
    ● In some cases, the return code of an API is not enough to tell the user whether the
      function has done its job correctly.
      → Mainly operations that return 202 Accepted, e.g. Create VM
    ● To measure the reliability of those processes, we will implement some
      tracking and verification measures.
      ● OpenStack request-id based tracing
      ● Simple testing of the created resources
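For operations that answer 202 Accepted, the real outcome is the state the resource eventually reaches. The Python sketch below classifies accepted operations by the last observed state of the resource, treating anything still pending after a timeout as a failure. The `AsyncOp` record, the 600-second timeout, and the use of OpenStack Nova server statuses (`ACTIVE`, `BUILD`, `ERROR`) as done/failed markers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AsyncOp:
    request_id: str     # ID returned with the 202 Accepted response
    accepted_at: float  # epoch seconds when the 202 was returned
    state: str          # last observed resource state, e.g. "ACTIVE", "ERROR", "BUILD"


def classify(ops, now, timeout_s=600, done=("ACTIVE",), failed=("ERROR",)):
    """Count outcomes of accepted async operations by final resource state.

    An operation still pending after timeout_s seconds counts as a failure.
    """
    result = {"succeeded": 0, "failed": 0, "pending": 0}
    for op in ops:
        if op.state in done:
            result["succeeded"] += 1
        elif op.state in failed or now - op.accepted_at > timeout_s:
            result["failed"] += 1
        else:
            result["pending"] += 1
    return result
```

Success over settled operations (succeeded / (succeeded + failed)) is then a service level that reflects whether a VM was actually created, not merely whether the request was accepted.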

  28. Design of the Ecosystem

  29. Others
    ● Implement an interface that allows developers to easily define monitoring items:
      ● Metrics scraping targets and labeling
      ● Alert definitions and routing
      ● Logging targets, parsing rules, and routing
      → Currently under design and development
    ● An interface for users to easily submit requests for documentation and service
      specifications.
      → We will work on a design that is easy to use for both users and developers, in
        conjunction with GitHub features.

  30. Conclusion
    What we have achieved and the overall future vision

  31. What We've Accomplished
    ● We can now measure health and demand for all API paths and methods.
      ● Request count
      ● Latency
      ● Availability
    ● We provide specific benefits and frameworks for schema-first development.
      ● Automatic API monitoring
      ● Labor-saving documentation
      ● Clarification of development items and procedures

  32. Future Plan 1: Improving Verda's UX
    ● Implement an ecosystem of user documentation, including the information needed
      to trust and use the service.
      ● API spec that includes information about the permissions required for execution
      ● Difference between the SLO and the current service level
    ● Automatic generation of API documents with API spec and permissions information
    ● More rigorous and measurable service-level definitions

  33. Future Plan 2: Perfect Monitoring
    ● State tracking of resources manipulated by asynchronous APIs
      ● Track processing end to end with an ID per request.
    ● Perfect service-level measurement
      ● Easy to understand the cause and extent of trouble

  34. WE ARE HIRING!!!
    Are you interested in Verda?
    List of open positions: https://lin.ee/YwYLuGa
    About Verda SRE team: https://lin.ee/4mgu0nn

  35. EVENT NOTICE

  36. THANK YOU