API Monitoring with OpenAPI and Ecosystem using Schema

LINE Developers

March 11, 2021

Transcript

  1. Engineering API Monitoring with OpenAPI and Ecosystem using Schema

    Wataru Manji, Verda -- LINE Corp.
  2. ABOUT ME: The role and experience

    name: Wataru Manji
    role: Software Engineer
    team: Verda Reliability Engineering team
    activities:
    - Development of the monitoring system
    - Direction of incident handling
    - Implementation of the on-call system
    - User support and training
    - and more
    manji0 manji0#9999
  3. Agenda

    • What is Verda?
    • Motivation
    • Basic Idea
    • Deep-dive to API Monitoring
    • Practical Operation
    • Future Plans
    • Conclusion
  5. Verda is the Infra Platform: Of the LINER, by the LINER, for the LINER

    Verda Web UI / Verda REST APIs
    • Server (VM/Baremetal)
    • LoadBalancer (L4/L7)
    • Storage (Object/Block)
    • Datastore (MySQL, Redis)
    • Kubernetes
    • Elasticsearch
    • ...
  6. The Scale is LARGE: We manage many resources

    • Baremetal & HV: 22,000+ EA※1
    • VM: 65,000+ EA※1
    • K8s cluster: 800+ EA※1
    ※1: Count of Dec. 2020
  7. Motivation: Background to the introduction of API monitoring

  8. Server Metrics is Not Enough: Everything was so unclear

    • Server monitoring was metrics-based.
    • API monitoring for the services was log-based only.
    • The logs did not summarize which part of the microservices was failing a request.
    • Server resource usage spiked periodically, but we could not figure out why.
    • Some products had their own service monitoring, but other teams did not know about it.
    • We needed a unified method to measure each API’s availability, throughput, and latency.
  9. We Already Have the Solution: That is a sidecar proxy

    • We can collect API metrics by implementing schemas for the APIs.
    • The same can be done for other services by introducing the proxy and a schema.
    • The Verda k8s team developed “verda-common-proxy”, a simple HTTP proxy sidecar that supports exporting metrics defined by an OpenAPI schema.
    • Some components already use the proxy to collect access logs.
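
A minimal sketch of how a proxy could derive a `path` metric label from an OpenAPI schema, assuming (this is not stated in the talk) that each concrete request path is mapped back to the matching path template so that `/v1/servers/abc` and `/v1/servers/xyz` share one time series. The templates and the `path_label` helper are hypothetical and do not describe verda-common-proxy's actual implementation.

```python
import re

# Hypothetical path templates, as they would appear under "paths" in an OpenAPI document.
PATH_TEMPLATES = [
    "/v1/servers",
    "/v1/servers/{server_id}",
    "/v1/servers/{server_id}/volumes",
]


def compile_template(template: str) -> "re.Pattern[str]":
    """Turn an OpenAPI path template into a regex that matches one concrete path."""
    # Split on "{name}" placeholders, keeping them in the result list.
    parts = re.split(r"(\{[^/{}]+\})", template)
    pattern = "".join(
        "[^/]+" if part.startswith("{") else re.escape(part) for part in parts
    )
    return re.compile(f"^{pattern}$")


COMPILED = [(template, compile_template(template)) for template in PATH_TEMPLATES]


def path_label(request_path: str) -> str:
    """Return the matching template, or "unmatched" if the schema has no such path."""
    for template, regex in COMPILED:
        if regex.match(request_path):
            return template
    return "unmatched"


if __name__ == "__main__":
    for path in ["/v1/servers", "/v1/servers/abc-123", "/v2/unknown"]:
        print(path, "->", path_label(path))
```

Labeling by template rather than by raw path keeps the number of label values bounded by the schema, which is one reason a schema is useful for metrics in the first place.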
  10. Basic Idea: Summary of API monitoring implementation

  11. Overview: K8s native design

  12. Schema Management: Split management is the basic principle

    [Diagram labels: k8s manifest (e.g. Deployment); Application Image (Repo); Nginx + Schema files Image (Repo); Fix each version]
  13. Support for Modern Web-Framework: The proxy can get a schema from api-server

    • VCP can get a schema from the target service
      → Works well with frameworks that take a schema-first approach
  14. Queries: What have we been able to observe

    • request_count_total {deployment, pod, path, method, status_code, error}
    • request_latency_second_total {deployment, pod, path, method}
    • request_inflight {deployment, pod, path, method}
    • request_size_bucket {deployment, pod, path, method}
    • response_size_bucket {deployment, pod, path, method}

    • What percentage of requests resulted in a 5xx?
    • What percentage of requests violate the API spec?
    • Which path & method have lower throughput?
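
The three questions on this slide can be answered from `request_count_total` alone. The sketch below works on a hand-written snapshot of that counter; the sample values and the `error` label value `schema_violation` are assumptions made for illustration, since the talk does not show the real label values. In practice these would be computed as rates over a time window in the metrics store rather than from raw totals.

```python
# Hypothetical snapshot of request_count_total: (labels, counter value).
SAMPLES = [
    ({"path": "/v1/servers", "method": "GET", "status_code": "200", "error": ""}, 940),
    ({"path": "/v1/servers", "method": "GET", "status_code": "500", "error": ""}, 10),
    ({"path": "/v1/servers", "method": "POST", "status_code": "400",
      "error": "schema_violation"}, 50),
]


def percentage(samples, predicate) -> float:
    """Share (in percent) of all counted requests whose labels satisfy the predicate."""
    total = sum(value for _, value in samples)
    if total == 0:
        return 0.0
    matched = sum(value for labels, value in samples if predicate(labels))
    return 100.0 * matched / total


def totals_by_endpoint(samples) -> dict:
    """Sum request counts per (path, method) pair."""
    totals = {}
    for labels, value in samples:
        key = (labels["path"], labels["method"])
        totals[key] = totals.get(key, 0) + value
    return totals


pct_5xx = percentage(SAMPLES, lambda labels: labels["status_code"].startswith("5"))
pct_violation = percentage(SAMPLES, lambda labels: labels["error"] == "schema_violation")
endpoint_totals = totals_by_endpoint(SAMPLES)
lowest_throughput = min(endpoint_totals, key=endpoint_totals.get)

if __name__ == "__main__":
    print(f"5xx: {pct_5xx}%  spec violations: {pct_violation}%")
    print("lowest throughput endpoint:", lowest_throughput)
```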
  15. Visualization: To observe information in a time series

    • The dashboard is implemented on Grafana.
    • Dashboards are added automatically by identifying the region and service via the metric’s labels.
  16. Deep-dive to API Monitoring: The keyword is OpenAPI

  17. VCP: verda-common-proxy

    • VCP is the proxy that processes AuthN, validation, and access log recording.
    • More detail… https://engineering.linecorp.com/ja/blog/verda-common-proxy/

    [Diagram: VCP in front of APP, with arrows "Request to the pod" and "Response from the pod"]
    • Validate request with schema
    • Record some metrics
    • Validate token and add the result as headers
    • Record access log
    • Record some metrics
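
To make "validate request with schema" concrete, here is a deliberately small sketch that checks query parameters against the `parameters` list of one OpenAPI operation. It covers only required-ness and integer typing; the `OPERATION` fragment and `validate_query` are hypothetical and are not the validator VCP uses.

```python
# Hypothetical fragment of an OpenAPI operation; only "parameters" is modelled here.
OPERATION = {
    "parameters": [
        {"name": "limit", "in": "query", "required": False,
         "schema": {"type": "integer"}},
        {"name": "region", "in": "query", "required": True,
         "schema": {"type": "string"}},
    ]
}


def validate_query(operation: dict, query: dict) -> list:
    """Return a list of violation messages; an empty list means the query is valid."""
    errors = []
    for param in operation.get("parameters", []):
        if param.get("in") != "query":
            continue  # headers, path and cookie parameters are out of scope here
        name = param["name"]
        if name not in query:
            if param.get("required", False):
                errors.append(f"missing required parameter: {name}")
            continue
        # Query values arrive as strings, so an integer check means "parses as int".
        if param.get("schema", {}).get("type") == "integer":
            try:
                int(query[name])
            except ValueError:
                errors.append(f"parameter {name} must be an integer")
    return errors


if __name__ == "__main__":
    print(validate_query(OPERATION, {"region": "jp", "limit": "10"}))
    print(validate_query(OPERATION, {"limit": "abc"}))
```

A proxy that already runs this kind of check on every request can count the failures, which is what makes a "requests that violate the API spec" metric possible without touching the application.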
  18. Data-Flow of the Metrics

    • Separate data-source availability from data-store availability with remote-write
  19. Working with Audit-log: For investigation

    • Metrics alone do not provide information on specific requests.
      → Save the access log separately and use it for analysis
    • We are using fluentd for the log transfer.
    • NOTE: This is a mechanism for analyzing platform-side requests, not the requests of apps built on the platform.
  20. Practical Operation: It helped us in these cases

  21. CASE

    • It turned out that the last deployment had switched the API endpoint referenced by HV, and the load on the new endpoint's resources skyrocketed.
    • The failure rate of the networking management API increased dramatically, greatly affecting the creation of VMs and other services.
    • Solved by calculating the throughput from the API Monitoring values and scaling the pods to an appropriate number.
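
The talk does not give the sizing formula used in this case. As one plausible sketch, the replica count can be derived from the observed peak request rate, an assumed per-pod capacity, and a safety margin; `required_replicas`, the 30% headroom, and the example numbers are all assumptions.

```python
import math


def required_replicas(peak_rps: float, per_pod_rps: float, headroom: float = 0.3) -> int:
    """Pods needed to serve peak_rps with spare capacity.

    peak_rps:    peak requests per second observed via API monitoring
    per_pod_rps: requests per second one pod can sustain (assumed to be measured)
    headroom:    extra capacity as a fraction of the peak (0.3 = 30%)
    """
    if per_pod_rps <= 0:
        raise ValueError("per_pod_rps must be positive")
    needed = peak_rps * (1.0 + headroom) / per_pod_rps
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # Example: a 900 req/s peak and 100 req/s per pod.
    print(required_replicas(900, 100))
```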
  22. CASE

  23. Future Plans: Provide value to the users of Verda
  24. UX Issues

    • Verda's API Spec is not exposed at a high enough level.
    • There is no portal that lists the status of Verda's services.
      → Such information is essential for users to trust and use Verda
    • We would like to develop an interface for users to learn about the functions and their reliability.
    • The API Monitoring mechanism should serve as a foundation for publishing this information.
  25. API Documents generated by Schema: Implement more manageable user documentation

    • Schema is an API Spec that can be exposed to users
    • We can provide users with an API Document that automatically follows changes in the application.
  26. Summary of Service Status: A mechanism to expose the health of the API to users

    • From the metrics collected by API Monitoring, we can calculate service status and service level.
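
A minimal sketch of that calculation, assuming availability is defined as the share of requests that did not end in a server error and that status is derived by comparing it with a target. The 99.9% target, the one-point "degraded" band, and the status names are illustrative assumptions, not Verda's definitions.

```python
def availability(total_requests: int, server_errors: int) -> float:
    """Fraction of requests that did not fail with a server error."""
    if total_requests == 0:
        return 1.0  # no traffic: treat the service as available
    return 1.0 - server_errors / total_requests


def service_status(avail: float, target: float = 0.999) -> str:
    """Map an availability value to a coarse, user-facing status."""
    if avail >= target:
        return "operational"
    if avail >= target - 0.01:
        return "degraded"
    return "outage"


if __name__ == "__main__":
    for total, errors in [(1000, 0), (1000, 5), (1000, 100)]:
        avail = availability(total, errors)
        print(total, errors, round(avail, 4), service_status(avail))
```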
  27. Resource Monitoring

    • In some cases, the return code of an API is not enough to tell the user whether the function has done its job correctly.
      → Mainly operations that return 202 Accepted, e.g. Create VM
    • To measure the reliability of those processes, we will implement some tracking and verification measures.
      • openstack request-id based tracing
      • simple testing of the created resources
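
One simple way to verify an operation that returned 202 Accepted is to poll the created resource until it reaches a terminal state. The sketch below takes the status lookup as a callable so that it runs without any real API; the status strings `BUILD`, `ACTIVE`, and `ERROR` and the `wait_for_resource` helper are assumptions, and the talk describes this area as a future plan rather than an existing feature.

```python
import time
from typing import Callable


def wait_for_resource(
    get_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll get_status() until the resource is ACTIVE (True) or fails/times out (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "ACTIVE":
            return True
        if status == "ERROR":
            return False
        if time.monotonic() >= deadline:
            return False  # still in progress when the time budget ran out
        time.sleep(interval_s)


if __name__ == "__main__":
    # Simulated resource that finishes building on the third poll.
    states = iter(["BUILD", "BUILD", "ACTIVE"])
    print(wait_for_resource(lambda: next(states), interval_s=0.0))
```

Recording the outcome of each such check as a metric would let asynchronous operations be counted as successes or failures in the same way as synchronous requests.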
  28. Design of the Eco-System

  29. Others

    • Implement an interface that allows developers to easily define monitoring items.
      • Metrics scraping target and labeling
      • Alert definitions and routing
      • Logging target, parsing rules, and routing
      → Currently under design and development
    • An interface for users to easily submit requests for documentation and service specifications.
      → Will work on a design that is easy to use for both users and developers in conjunction with GitHub features.
  30. Conclusion: What we have achieved and overall future vision
  31. What We've Accomplished

    • We can now measure health and demand for all API paths and methods.
      • Request count
      • Latency
      • Availability
    • We provide specific benefits and frameworks for schema-first development.
      • Enable automatic API monitoring
      • Labor-saving documentation
      • Clarification of development items and procedures
  32. Future Plan1: Improving Verda's UX

    • Automatic generation of API documents with API Spec and permissions information
    • More rigorous and measurable service-level definitions
    • Implement an ecosystem of user documentation, including the information needed to trust and use the service.
      • API Spec that includes information about the permissions required for execution
      • Difference between SLO and current service level
  33. Future Plan2: Perfect Monitoring

    • State tracking of resources manipulated by asynchronous APIs.
    • Track processing end to end with a per-request ID.
    • Perfect service level measurement
    • Easy to understand the cause and extent of trouble
  34. Are you interested in Verda? WE ARE HIRING!!!

    List of open positions: https://lin.ee/YwYLuGa
    About Verda SRE team: https://lin.ee/4mgu0nn
  35. EVENT NOTICE

  36. THANK YOU