LINE messaging service • Text messages, broadcast Bot messages, sticker purchases, etc. • Server-side authentication ↔ Client-side authentication [Architecture diagram: Client → Reverse Proxy → Talk servers / StickerShop / Bot backend / Auth server]
With HTTP/2, you can multiplex requests into a single connection • To maximize this benefit, HTTP/2 clients and servers try to keep the connection alive as long as possible (a small sketch follows) [Diagram: Client ↔ Server with stream1, stream2, stream3 … multiplexed over one connection]
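As an aside, the multiplexing idea can be sketched with the JDK's built-in HTTP/2 client (Java 11+, which is an assumption here; the talk's own stack is Java 8 + Netty/Armeria): several requests fired concurrently against the same origin travel as separate streams over one reused TCP connection.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class Http2MultiplexDemo {
    public static void main(String[] args) {
        // One client keeps one HTTP/2 connection per origin and reuses it.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();

        // Three concurrent requests become stream1/stream2/stream3 on the same connection.
        List<CompletableFuture<HttpResponse<String>>> futures = List.of("/a", "/b", "/c").stream()
                .map(path -> HttpRequest.newBuilder(URI.create("https://example.com" + path)).build())
                .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString()))
                .collect(Collectors.toList());

        futures.forEach(f -> System.out.println(f.join().statusCode()));
    }
}
```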
If the server/client/LB keep using the connection created once at startup, a client may unintentionally communicate only with the same server. • In the worst case, all requests go to a single server [Diagram: clients behind an LB, with all traffic pinned to one server]
Setting a max age on a connection can relax the imbalance • But it is not a perfect solution (a fuller sketch follows)
val conn = getConnectionPool();
if (conn.maxAge >= Duration.ofMinutes(5)) {
    conn.disconnect();
    conn = getConnectionPool();   // reconnect so the next connection may land on a different server
}
[Slide figure: ~7K req/sec]
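A runnable version of the pseudocode above, as a minimal sketch; ManagedConnection and ConnectionPool are hypothetical names, not Armeria APIs:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical pooled connection that remembers when it was created.
final class ManagedConnection {
    final Instant createdAt = Instant.now();

    Duration age() {
        return Duration.between(createdAt, Instant.now());
    }

    void disconnect() { /* close the underlying socket */ }
}

final class ConnectionPool {
    private static final Duration MAX_AGE = Duration.ofMinutes(5);
    private ManagedConnection current = new ManagedConnection();

    // Returns the pooled connection, replacing it once it exceeds MAX_AGE.
    synchronized ManagedConnection acquire() {
        if (current.age().compareTo(MAX_AGE) >= 0) {
            current.disconnect();              // drop the old connection...
            current = new ManagedConnection(); // ...so the next one may reach another server
        }
        return current;
    }
}
```

Rotating connections this way eventually spreads the load, but as the slide says it is not perfect: each client still sticks to one server for up to the max age.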
in < 10 ms • Reliable • Handles 600K req/sec within a reasonable time • Maintainable • Easy for anyone to modify • Efficient • HTTP/2 + L4 LB • Q. Can Armeria satisfy these requirements?
Built on top of Java 8, Netty, HTTP/2, Thrift, and gRPC • Takes care of common functionality for microservices • Client-side LB • Circuit breaker / retry / throttling • Tracing (Zipkin) / monitoring integration • etc. https://line.github.io/armeria/
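For illustration, a hedged sketch of composing retry and circuit-breaker behavior as Armeria client decorators. The class and method names follow recent Armeria 1.x releases and are assumptions here; the version in use at the time of the talk may expose different names.

```java
import com.linecorp.armeria.client.WebClient;
import com.linecorp.armeria.client.circuitbreaker.CircuitBreaker;
import com.linecorp.armeria.client.circuitbreaker.CircuitBreakerClient;
import com.linecorp.armeria.client.circuitbreaker.CircuitBreakerRule;
import com.linecorp.armeria.client.retry.RetryRule;
import com.linecorp.armeria.client.retry.RetryingClient;
import com.linecorp.armeria.common.AggregatedHttpResponse;

public class ArmeriaClientSketch {
    public static void main(String[] args) {
        // Retry and circuit-breaker logic are layered onto the client as decorators,
        // so application code just calls the client as usual.
        WebClient client = WebClient.builder("http://my-service.example.com:8080") // hypothetical host
                .decorator(RetryingClient.newDecorator(RetryRule.failsafe()))
                .decorator(CircuitBreakerClient.newDecorator(
                        CircuitBreaker.ofDefaultName(),
                        CircuitBreakerRule.onServerErrorStatus()))
                .build();

        AggregatedHttpResponse res = client.get("/hello").aggregate().join();
        System.out.println(res.status());
    }
}
```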
Load imbalance if we keep a connection • Solution? • Disconnect every time a client receives a response? • Connect the client to the server directly + load-balance on the client side (see the sketch below) [Diagram: proxy-based LB (Client → LB → Server) vs. client-side LB (Client → Server directly)]
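A minimal sketch of the client-side LB idea in plain Java (RoundRobinEndpoints is a hypothetical name, not an Armeria type): the client holds the endpoint list and picks a different server per request instead of pinning a single connection behind a proxy LB.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal client-side load balancer: rotate through the known endpoints
// so requests spread evenly across servers.
final class RoundRobinEndpoints {
    private final List<String> endpoints;          // e.g. ["10.120.16.2:8080", "10.120.16.3:8080"]
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinEndpoints(List<String> endpoints) {
        this.endpoints = List.copyOf(endpoints);
    }

    String next() {
        int i = Math.floorMod(counter.getAndIncrement(), endpoints.size());
        return endpoints.get(i);
    }
}
```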
Client-side LB resolves the load imbalance • But how can the client know the endpoint locations? • Hardcode the endpoint list into the app? • Register the locations of all service instances in a service registry and let the client query the registry (sketched below) [Diagram: instances register themselves, clients query the registry] Register "10.120.16.2:8080" to "myService" → {"myService": ["10.120.16.2:8080", "10.120.16.2:8080"]}
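A toy in-memory sketch of the registry interaction described above (ServiceRegistry is hypothetical; a real registry could be ZooKeeper, Consul, or a file in Central Dogma): instances register their own address on startup, and clients query by service name and hand the list to a client-side load balancer.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy in-memory service registry mirroring the slide's JSON:
// {"myService": ["10.120.16.2:8080", ...]}
final class ServiceRegistry {
    private final Map<String, List<String>> services = new ConcurrentHashMap<>();

    // Each server instance registers its own address at startup.
    void register(String serviceName, String address) {
        services.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    // Clients query the registry instead of hardcoding the endpoint list.
    List<String> query(String serviceName) {
        return services.getOrDefault(serviceName, List.of());
    }
}
```

Usage: an instance calls registry.register("myService", "10.120.16.2:8080") at startup; a client calls registry.query("myService") and feeds the result into a selector like the round-robin one above.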
Authentication is a single point of failure in our system • You can't do anything in LINE until the service comes back • Outages hurt the user experience: you cannot send a sticker to a friend via LINE, cannot share a photo with your family, etc. • We lose an opportunity to earn
Apply real load to a newly released binary • Useful to check for regressions without affecting all users • If no issue is found, increase the load further • If you find anything, roll back easily (a minimal sketch follows)
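One way to picture the traffic split, as a hypothetical sketch (CanaryRouter and its ratio are made up, not the production mechanism): route a small, adjustable fraction of real requests to the new binary and the rest to the stable one.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical weighted router for a canary release.
final class CanaryRouter {
    private final double canaryRatio;    // e.g. 0.01 = 1% of requests
    private final String stableEndpoint;
    private final String canaryEndpoint;

    CanaryRouter(double canaryRatio, String stableEndpoint, String canaryEndpoint) {
        this.canaryRatio = canaryRatio;
        this.stableEndpoint = stableEndpoint;
        this.canaryEndpoint = canaryEndpoint;
    }

    String route() {
        return ThreadLocalRandom.current().nextDouble() < canaryRatio
                ? canaryEndpoint   // raise canaryRatio as confidence grows
                : stableEndpoint;  // set canaryRatio to 0 to roll back instantly
    }
}
```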
web application • Takes 4 minutes per server to restart, and has 1,000 servers • Takes 25 minutes even if we parallelize the operation • Under low load a new release may show no issue, but applying it to all servers may reveal a performance problem • That takes a long time to resolve • We might solve this with fine-grained canary releases, but is there a way to release more flexibly?
Feature-flag users need to know the latest flag values to roll out smoothly • But there is no general way to achieve this • Distribute a property file to each server? Ansible, Chef, or rsync + inotify? • Build a new management API? Do we need new APIs every time we add a flag or a service?
YAML, XML, … • Highly available • Multi-master, eventually consistent • Provides an API to notify users of changes https://line.github.io/centraldogma/
Changes are applied within a minute of being made • Deploy a YAML/JSON file to all our production servers via Central Dogma • Changes are recorded in the Central Dogma commit log • Example: 30% CacheV2 rollout (see the watcher sketch below) [Diagram: Developer commits YAML to Central Dogma; services pull & reload the config]
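A sketch of how a server could pick up such a change using the Central Dogma Java client's watcher API; the host, project, repository, file name, and JSON field below are assumptions, and exact signatures may differ between client versions.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.linecorp.centraldogma.client.CentralDogma;
import com.linecorp.centraldogma.client.Watcher;
import com.linecorp.centraldogma.client.armeria.ArmeriaCentralDogmaBuilder;
import com.linecorp.centraldogma.common.Query;

public class FlagWatcherSketch {
    public static void main(String[] args) throws Exception {
        CentralDogma dogma = new ArmeriaCentralDogmaBuilder()
                .host("centraldogma.example.com", 36462)   // hypothetical host
                .build();

        // Watch a JSON flag file; the callback fires shortly after every commit,
        // so every server reloads e.g. "CacheV2 rollout: 30%" without a restart.
        Watcher<JsonNode> watcher =
                dogma.fileWatcher("myProject", "myRepo", Query.ofJson("/flags.json"));
        watcher.watch((revision, flags) -> {
            int rolloutPercent = flags.path("cacheV2RolloutPercent").asInt(0);
            System.out.println("rev " + revision + ": CacheV2 rollout = " + rolloutPercent + "%");
        });
    }
}
```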
rollout • One halfway rollout requires two regression tests: new feature + old feature • If you have tens of halfway rollouts, then… • Hard to guarantee flag completeness • The feature might be revealed partially • Sometimes hard to control with a flag • e.g. SDK, JVM, or library upgrades
Connections created under high load may cause GC pressure on the client/server side from managing them https://github.com/line/armeria/issues/816 https://github.com/line/armeria/pull/1886 • Long-polling-based server health notification: previously a client sent a health-check request periodically to learn whether a server was unhealthy, so it kept sending requests to the unhealthy server until the next health check https://github.com/line/armeria/issues/1756 https://github.com/line/armeria/pull/1878 (a wiring sketch follows)
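For the health-check side, a hedged sketch of wiring Armeria's HealthCheckedEndpointGroup into a client (Armeria 1.x-style names, which are assumptions relative to the version from the talk; the endpoints and health-check path are made up):

```java
import com.linecorp.armeria.client.Endpoint;
import com.linecorp.armeria.client.WebClient;
import com.linecorp.armeria.client.endpoint.EndpointGroup;
import com.linecorp.armeria.client.endpoint.healthcheck.HealthCheckedEndpointGroup;
import com.linecorp.armeria.common.SessionProtocol;

public class HealthCheckedClientSketch {
    public static void main(String[] args) {
        EndpointGroup all = EndpointGroup.of(
                Endpoint.of("10.120.16.2", 8080),
                Endpoint.of("10.120.16.3", 8080));

        // Only endpoints whose health-check path responds successfully receive traffic;
        // with the long-polling improvement the server can signal "unhealthy" promptly
        // instead of waiting for the client's next periodic probe.
        HealthCheckedEndpointGroup healthy =
                HealthCheckedEndpointGroup.of(all, "/internal/healthcheck");

        WebClient client = WebClient.of(SessionProtocol.HTTP, healthy);
        System.out.println(client.get("/hello").aggregate().join().status());
    }
}
```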
Our OSSs • Armeria https://line.github.io/armeria/ • Central Dogma https://github.com/line/centraldogma • … are still evolving • Trouble may happen • Build a system that is well prepared for trouble • Let's make releases smooth