API Monitoring with OpenAPI and Ecosystem using Schema

Wataru Manji, Verda -- LINE Corp.

ABOUT ME: The role and experience
name: Wataru Manji
role: Software Engineer
team: Verda Reliability Engineering team
activities:
- Development of the monitoring system
- Direction of incident handling
- Implementation of the on-call system
- User support and training
- and more
manji0 / manji0#9999

Agenda
• What is Verda?
• Motivation
• Basic Idea
• Deep-dive into API Monitoring
• Practical Operation
• Future Plans
• Conclusion

What is Verda?

Verda is the Infra Platform: Of the LINER, by the LINER, for the LINER
• Verda Web UI
• Verda REST APIs
• Server (VM/Baremetal)
• LoadBalancer (L4/L7)
• Storage (Object/Block)
• Datastore (MySQL, Redis)
• Kubernetes
• Elasticsearch
• ...

The Scale is LARGE: We manage many resources
• Baremetal & HV: 22,000+ EA ※1
• VM: 65,000+ EA ※1
• K8s clusters: 800+ EA ※1
※1: Count as of Dec. 2020

Motivation: Background to the introduction of API monitoring

Server Metrics Are Not Enough: Nothing was clear
• Server monitoring was metrics-based.
• API monitoring for the services was only log-based.
• It did not show which part of the microservices was failing the requests.
• Server resource usage spiked periodically, but we couldn't figure out why.
• Some products had their own service monitoring, but other teams did not know about it.
• We needed a unified method to measure APIs' availability, throughput, and latency.

We Already Have the Solution: It is a sidecar proxy
• We can collect API metrics by implementing schemas for the APIs.
• The same can be done for other services by introducing the proxy and a schema.
• The Verda k8s team developed "verda-common-proxy", a simple HTTP proxy sidecar that can export metrics defined by an OpenAPI schema.
• Some components already use the proxy to collect access logs.
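The slides do not show how the proxy turns a concrete request into the `path` label. Below is a minimal Python sketch. It assumes the label is the OpenAPI path template, so /vms/42 and /vms/43 share one series. It also assumes the schema's `paths` object is available as a dict. `template_to_regex` and `match_operation` are hypothetical helpers, not verda-common-proxy code.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> "re.Pattern":
    """Compile an OpenAPI path template such as /vms/{id} into a regex.

    Each {param} placeholder matches exactly one non-empty path segment.
    """
    parts = re.split(r"(\{[^/{}]+\})", template)  # keeps the placeholders
    pattern = "".join(
        "[^/]+" if part.startswith("{") and part.endswith("}") else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{pattern}$")


def match_operation(paths: dict, method: str, path: str) -> Optional[str]:
    """Return the schema path template to use as the `path` label, or None.

    Templates with fewer placeholders are tried first, so a literal path
    like /vms/search wins over /vms/{id}.
    """
    method = method.lower()
    for template in sorted(paths, key=lambda t: t.count("{")):
        if method in paths[template] and template_to_regex(template).match(path):
            return template
    return None
```

Labeling by template keeps label cardinality bounded. A request that matches no template could be counted as a spec violation. A real proxy would compile the patterns once, when the schema is loaded, rather than on every request.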

Basic Idea: Summary of API monitoring implementation

Overview: K8s native design

Schema Management: Split management is the basic principle
• The k8s manifest (e.g. a Deployment) references two images: the application image and an Nginx + schema files image.
• Each image is built from its own repo.
• The manifest fixes the version of each image.
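Under the split-management principle, the manifest is where the application version and the schema version meet. Below is a hypothetical Python check. It assumes both images run as containers in the same Deployment. It also reads "fix each version" as meaning every image carries an explicit tag or digest. The image names are illustrative.

```python
def unpinned_images(deployment: dict) -> list:
    """Return the container images in a Deployment that are not pinned to a version.

    An image counts as pinned if it has a digest (@sha256:...) or an explicit
    tag other than "latest".
    """
    containers = deployment["spec"]["template"]["spec"]["containers"]
    unpinned = []
    for image in (c["image"] for c in containers):
        if "@sha256:" in image:
            continue  # a digest identifies exactly one image build
        # Look for the tag only in the last path segment, so a registry port
        # such as registry.example.com:5000/app is not mistaken for a tag.
        tag = image.rsplit("/", 1)[-1].partition(":")[2]
        if not tag or tag == "latest":
            unpinned.append(image)
    return unpinned
```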

Support for Modern Web Frameworks: The proxy can get a schema from the API server
• VCP (verda-common-proxy) can fetch the schema from the target service.
→ Works well with frameworks that take a schema-first approach
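With a schema-first framework, the service already serves its own OpenAPI document. The proxy can therefore read that document instead of shipping a separate copy. Below is a minimal sketch. It assumes the document is JSON at a URL such as a hypothetical `http://localhost:8080/openapi.json` on the target service (FastAPI, for example, serves `/openapi.json` by default). `fetch_schema` and `list_operations` are illustrative names.

```python
import json
import urllib.request

# The operation keys allowed in an OpenAPI 3 Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(url: str, timeout: float = 5.0) -> dict:
    """Download and parse an OpenAPI document served as JSON by the target service."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_operations(schema: dict) -> list:
    """Return sorted (METHOD, path) pairs declared in the schema's `paths` object."""
    ops = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                ops.append((key.upper(), path))
    return sorted(ops)
```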

What We Have Been Able to Observe: Queries
Metrics:
• request_count_total {deployment, pod, path, method, status_code, error}
• request_latency_second_total {deployment, pod, path, method}
• request_inflight {deployment, pod, path, method}
• request_size_bucket {deployment, pod, path, method}
• response_size_bucket {deployment, pod, path, method}
Questions they answer:
• What percentage of requests resulted in a 5xx?
• What percentage of requests violated the API spec?
• Which paths & methods have lower throughput?
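The first two questions reduce to ratios over `request_count_total`. In Prometheus, the 5xx ratio would be a query along the lines of `sum by (path, method) (rate(request_count_total{status_code=~"5.."}[5m])) / sum by (path, method) (rate(request_count_total[5m]))`. The Python sketch below shows the same arithmetic on already-aggregated counter increases. It assumes (our assumption) that a non-empty `error` label marks a request that failed schema validation.

```python
from collections import defaultdict


def ratios_by_operation(samples):
    """Compute 5xx and spec-violation percentages per (path, method).

    `samples` is a list of (labels, increase) pairs, for example the increase
    of request_count_total over a time window for each label set.
    """
    totals = defaultdict(float)
    server_errors = defaultdict(float)
    violations = defaultdict(float)
    for labels, increase in samples:
        key = (labels["path"], labels["method"])
        totals[key] += increase
        if labels["status_code"].startswith("5"):
            server_errors[key] += increase
        # Assumption: a non-empty `error` label means the request violated the schema.
        if labels.get("error"):
            violations[key] += increase
    return {
        key: {
            "5xx_pct": 100 * server_errors[key] / total,
            "violation_pct": 100 * violations[key] / total,
        }
        for key, total in totals.items()
        if total > 0
    }
```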

Visualization: To observe information as a time series
• The dashboard is implemented on Grafana.
• Dashboards are added automatically by identifying the region and service from the metrics' labels.
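Automatic dashboard creation amounts to discovering the distinct (region, service) pairs in the label sets. Below is a minimal sketch. It assumes hypothetical `region` and `service` labels on every series; the metric list above shows only deployment and pod. Pushing the result to Grafana is left out.

```python
def dashboards_from_series(series_labels):
    """Return one dashboard spec per (region, service) found in the metric labels.

    Assumption: series carry `region` and `service` labels. Series missing
    either label are skipped rather than put on a catch-all dashboard.
    """
    dashboards = {}
    for labels in series_labels:
        region, service = labels.get("region"), labels.get("service")
        if not region or not service:
            continue
        dash = dashboards.setdefault(
            (region, service),
            {"title": f"API Monitoring / {region} / {service}", "operations": set()},
        )
        # Each (method, path) pair could become one panel row on the dashboard.
        dash["operations"].add((labels.get("method", ""), labels.get("path", "")))
    return dashboards
```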

Deep-dive into API Monitoring: The keyword is OpenAPI

VCP: verda-common-proxy
• VCP is the proxy that handles AuthN, request validation, and access-log recording.
• More details: https://engineering.linecorp.com/ja/blog/verda-common-proxy/
• On the request to the pod, VCP validates the request with the schema, records some metrics, validates the token, adds the result as headers, and then passes the request to the APP.
• On the response from the pod, VCP records the access log and some metrics.
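The request path through VCP can be modeled in a few lines: validate against the schema, forward only valid requests, and count every outcome with labels similar to those of `request_count_total`. This is a toy Python model, not VCP's implementation. It simplifies in four ways: paths are assumed to be matched to templates already, validation covers only required query parameters, there is no token check, and the error values are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


class ValidatingProxySketch:
    """Toy model of a sidecar that validates requests and records metrics.

    Hypothetical simplification: `operations` maps (METHOD, path template) to
    the names of required query parameters. Real OpenAPI validation also checks
    bodies, headers and types, and token validation is not modeled here.
    """

    def __init__(self, operations: Dict[Tuple[str, str], List[str]],
                 upstream: Callable[[str, str, dict], int]):
        self.operations = operations
        self.upstream = upstream  # the application container; returns a status code
        # A subset of the request_count_total labels: (path, method, status_code, error)
        self.request_count: Counter = Counter()

    def handle(self, method: str, path: str, query: dict) -> int:
        required = self.operations.get((method, path))
        if required is None:
            status, error = 404, "undefined_operation"
        elif any(name not in query for name in required):
            status, error = 400, "missing_parameter"
        else:
            # Only valid requests reach the application container.
            status, error = self.upstream(method, path, query), ""
        self.request_count[(path, method, str(status), error)] += 1
        return status
```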

Data-Flow of the Metrics
• Separate data-source availability from data-store availability with remote-write.
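One way to picture the decoupling that remote-write provides: the collector keeps accepting samples while the data store is down and forwards them once the store is back. Below is a toy Python model with a bounded buffer that drops the oldest samples when full. It illustrates the idea only and is not how the Prometheus remote-write queue is actually implemented.

```python
from collections import deque
from itertools import islice


class RemoteWriteBuffer:
    """Buffer samples locally and flush them when the remote store accepts writes."""

    def __init__(self, send, capacity=1000):
        self.send = send  # callable(list_of_samples) -> bool, True if the store accepted it
        self.pending = deque(maxlen=capacity)  # the oldest samples are dropped when full

    def append(self, sample):
        self.pending.append(sample)

    def flush(self, batch_size=100):
        """Send pending samples in batches, stopping at the first rejected batch.

        Returns the number of samples delivered.
        """
        sent = 0
        while self.pending:
            batch = list(islice(self.pending, batch_size))
            if not self.send(batch):
                break  # store unavailable: keep the samples and retry on the next flush
            for _ in range(len(batch)):
                self.pending.popleft()
            sent += len(batch)
        return sent
```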

Working with Audit-log: For investigation
• Metrics alone do not provide information on specific requests.
→ Save access logs separately and use them for analysis.
• We use fluentd for log transfer.
• NOTE: This mechanism is for analyzing platform-side requests, not the requests of apps built on the platform.
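An access log complements metrics because each entry identifies a single request. Below is a sketch of one JSON log line of the kind a log shipper such as fluentd could tail and forward. The field names and the `access_log_line` helper are hypothetical.

```python
import json
import time
import uuid


def access_log_line(method, path, status, latency_s, request_id=None, user=None):
    """Build one single-line JSON access-log record.

    Field names are illustrative. `request_id` ties a log entry back to one
    specific request, which aggregated metrics cannot do.
    """
    record = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "request_id": request_id or str(uuid.uuid4()),  # generate one if the caller has none
        "user": user,
        "method": method,
        "path": path,
        "status": status,
        "latency_ms": round(latency_s * 1000, 1),
    }
    # Compact separators keep each record on one line for line-based log tailing.
    return json.dumps(record, separators=(",", ":"))
```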

Practical Operation: It helped us in these cases

CASE
• The failure rate of the networking management API increased dramatically, greatly affecting the creation of VMs and other services.
• It turned out that the last deployment had switched the API endpoint referenced by the HVs, and the load on the new endpoint's resources skyrocketed.
• We solved it by calculating the throughput from the API monitoring values and scaling the pods to an appropriate count.
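The fix in this case was arithmetic on observed throughput. Below is a sketch of that calculation. It assumes a per-pod capacity measured separately (for example, by a load test) and a utilization target that leaves headroom. The numbers in the usage note are illustrative, not taken from the incident.

```python
import math


def required_replicas(peak_rps, per_pod_rps, target_utilization=0.7, min_replicas=2):
    """Estimate a replica count from observed throughput.

    peak_rps: peak request rate taken from the API monitoring metrics.
    per_pod_rps: throughput one pod can sustain (e.g. measured by a load test).
    target_utilization: fraction of per-pod capacity each pod is allowed to use.
    """
    if per_pod_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("per_pod_rps must be > 0 and 0 < target_utilization <= 1")
    needed = math.ceil(peak_rps / (per_pod_rps * target_utilization))
    return max(min_replicas, needed)
```

For example, `required_replicas(1200, 100)` gives 18 pods: 1200 / (100 × 0.7) ≈ 17.1, rounded up.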

CASE

Future Plans: Provide value to the users of Verda

UX Issues
• Verda's API spec is not exposed at a high enough level.
• There is no portal that lists the status of Verda's services.
→ Such information is essential for users to trust and use Verda.
• We would like to develop an interface through which users can learn about the functions and their reliability.
• The API monitoring mechanism should serve as a foundation for publishing this information.

API Documents Generated by Schema: Implement more manageable user documentation
• The schema is an API spec that can be exposed to users.
• We can provide users with API documentation that automatically follows changes in the application.
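Because the schema is served alongside the application, documentation can be regenerated from it on every release. Below is a minimal Python sketch that renders a Markdown reference from an OpenAPI document. The output layout is our own choice and does not come from the talk.

```python
# The operation keys allowed in an OpenAPI 3 Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def render_api_doc(schema: dict) -> str:
    """Render a minimal Markdown reference from an OpenAPI document."""
    info = schema.get("info", {})
    lines = [f"# {info.get('title', 'API')} {info.get('version', '')}".rstrip(), ""]
    for path, item in sorted(schema.get("paths", {}).items()):
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            lines.append(f"## {method.upper()} {path}")
            if op.get("summary"):
                lines.append(op["summary"])
            for param in op.get("parameters", []):
                need = "required" if param.get("required") else "optional"
                lines.append(f"- `{param['name']}` ({param.get('in', '?')}, {need})")
            lines.append("")
    return "\n".join(lines)
```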

Summary of Service Status: A mechanism to expose the health of the API to users
• From the metrics collected by API monitoring, we can calculate the service status and service level.
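Service level follows from the same counters. Below is a sketch that computes availability over a window and how much of the error budget is spent. It assumes (our assumption) that failures are counted as 5xx responses, and it uses a hypothetical SLO of 99.9%.

```python
def service_level(total, failed, slo=0.999):
    """Availability over a window and the fraction of the error budget used.

    total: requests in the window; failed: requests counted as failures.
    budget_used is 1.0 when failures exactly exhaust the budget allowed by the SLO.
    """
    if total == 0:
        return {"availability": 1.0, "budget_used": 0.0, "meets_slo": True}
    availability = 1 - failed / total
    allowed_failures = (1 - slo) * total
    if allowed_failures > 0:
        budget_used = failed / allowed_failures
    else:
        # A 100% SLO allows no failures at all.
        budget_used = 0.0 if failed == 0 else float("inf")
    return {
        "availability": availability,
        "budget_used": budget_used,
        "meets_slo": availability >= slo,
    }
```

With 1,000,000 requests and 500 failures, availability is 99.95% and about half of the 99.9% budget is used.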

Resource Monitoring
• In some cases, the return code of an API is not enough to tell the user whether the function has done its job correctly.
→ Mainly operations that return 202 Accepted, e.g. Create VM.
• To measure the reliability of those processes, we will implement tracking and verification measures:
  - OpenStack request-id based tracing
  - Simple testing of the created resources
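For 202 Accepted operations, one verification step is to follow the created resource until it reaches a terminal state. Below is a sketch of such a poller. The `ACTIVE`/`ERROR` defaults follow OpenStack Nova server statuses. The timeout, the interval, and the injected `get_status` callable are assumptions. The `clock` and `sleep` parameters exist so the loop can be tested without real waiting.

```python
import time


def wait_for_resource(get_status, timeout_s=600.0, interval_s=5.0,
                      done=("ACTIVE",), failed=("ERROR",),
                      clock=time.monotonic, sleep=time.sleep):
    """Poll a resource created by an asynchronous (202 Accepted) call until it settles.

    Returns (outcome, last_status, elapsed_seconds), where outcome is one of
    "succeeded", "failed" or "timeout".
    """
    start = clock()
    while True:
        status = get_status()
        elapsed = clock() - start
        if status in done:
            return "succeeded", status, elapsed
        if status in failed:
            return "failed", status, elapsed
        if elapsed >= timeout_s:
            return "timeout", status, elapsed
        sleep(interval_s)
```

The outcome and elapsed time could then be recorded as metrics, together with the request-id, to measure how often asynchronous creation really succeeds.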

Design of the Ecosystem

Others
• Implement an interface that allows developers to easily define monitoring items:
  - Metrics scraping targets and labeling
  - Alert definitions and routing
  - Logging targets, parsing rules, and routing
  → Currently under design and development
• An interface for users to easily submit requests for documentation and service specifications
  → We will work on a design that is easy to use for both users and developers, in conjunction with GitHub features.
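Since this interface is still being designed, the following is only a hypothetical shape for a developer-written definition. It covers the scrape target, labels, alerts, and routing. Its validation catches, among other things, an alert whose severity has no route.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlertRule:
    name: str
    expr: str      # alert condition, e.g. a PromQL expression
    severity: str  # used to pick a notification route


@dataclass
class MonitoringSpec:
    """Hypothetical per-service monitoring definition written by a developer."""
    service: str
    scrape_target: str                                    # host:port of the metrics endpoint
    labels: Dict[str, str] = field(default_factory=dict)  # extra labels for scraped series
    alerts: List[AlertRule] = field(default_factory=list)
    routes: Dict[str, str] = field(default_factory=dict)  # severity -> notification channel

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the spec is usable."""
        problems = []
        if not self.service:
            problems.append("service must not be empty")
        host, _, port = self.scrape_target.rpartition(":")
        if not host or not port.isdigit():
            problems.append(f"scrape_target {self.scrape_target!r} is not host:port")
        for alert in self.alerts:
            if alert.severity not in self.routes:
                problems.append(
                    f"alert {alert.name!r} has severity {alert.severity!r} with no route"
                )
        return problems
```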

Conclusion: What we have achieved and the overall future vision

What We've Accomplished
• We can now measure health and demand for all API paths and methods:
  - Request count
  - Latency
  - Availability
• We provide specific benefits and frameworks for schema-first development:
  - Automatic API monitoring
  - Labor-saving documentation
  - Clear development items and procedures

Future Plan 1: Improving Verda's UX
• Automatic generation of API documents from the API spec and permissions information
• More rigorous and measurable service-level definitions
• Implement an ecosystem of user documentation that includes the information needed to trust and use the service:
  - An API spec that includes information about the permissions required for execution
  - The difference between the SLO and the current service level

Future Plan 2: Perfect Monitoring
• State tracking of resources manipulated by asynchronous APIs
• Tracking of processing with an ID per request
• Perfect service-level measurement
• Easy to understand the cause and extent of trouble

WE ARE HIRING!!! Are you interested in Verda?
List of open positions: https://lin.ee/YwYLuGa
About the Verda SRE team: https://lin.ee/4mgu0nn

EVENT NOTICE

THANK YOU
