Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
Varun Talwar, product manager on Google's gRPC project, discusses the fundamentals and specs of gRPC inside a Google-scale microservices architecture.
Stubby experience
 a. HTTP/JSON doesn't cut it
 b. Establish a lingua franca
 c. Design for fault tolerance and control: sync/async, deadlines, cancellations, flow control
 d. Flying blind without stats
 e. Diagnosing with tracing
 f. Load balancing is critical
3. gRPC
 a. Cross-platform matters!
 b. Performance and standards matter: HTTP/2
 c. Pluggability matters: interceptors, name resolvers, auth plugins
 d. Usability matters!
a lingua franca
3. Design for fault tolerance and provide control knobs
4. Don't fly blind: service analytics
5. Diagnosing problems: tracing
6. Load balancing is critical
bled into services
2. Stateless
3. Text on the wire
4. Loose contracts
5. TCP connection per request
6. Noun-based
7. Harder API evolution
8. Think compute, network on cloud platforms
2. Start with an IDL
3. Have a language-agnostic way of agreeing on data semantics
4. Code generation in various languages
5. Forward and backward compatibility
6. API evolution
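Starting from an IDL is what makes forward and backward compatibility workable. As an illustration (not from the talk), here is a minimal sketch of a Protocol Buffers-style varint encoding in which every field carries its field number, so an older reader can skip fields a newer writer added. It handles only varint fields (wire type 0); `encode_message` and `decode_known` are hypothetical helpers, not the protobuf library API.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, least significant 7 bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_message(fields: dict[int, int]) -> bytes:
    """Encode {field_number: value} with every field as wire type 0 (varint)."""
    out = bytearray()
    for number, value in fields.items():
        out += encode_varint(number << 3 | 0)  # tag = (field number << 3) | wire type
        out += encode_varint(value)
    return bytes(out)


def decode_known(buf: bytes, known: set[int]) -> dict[int, int]:
    """Decode the fields this reader knows about and skip any others."""
    result, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(buf, pos)
        if number in known:
            result[number] = value  # unknown fields are consumed but ignored
    return result


# A newer writer adds field 3; an older reader that only knows fields 1 and 2 still works.
new_payload = encode_message({1: 42, 2: 7, 3: 300})
print(decode_known(new_payload, known={1, 2}))  # {1: 42, 2: 7}
```

The reverse direction works too: a newer reader given an older payload simply finds the new field absent and falls back to a default.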
in time. A deadline tells the server how long the client is willing to wait for an answer. The RPC fails with the DEADLINE_EXCEEDED status code when the deadline is reached. gRPC Deadlines
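A minimal sketch of deadline semantics and propagation, simulated in-process with asyncio: the client sets an absolute deadline, the intermediate service passes the same deadline to its own downstream call instead of starting a fresh timeout, and the call fails with DEADLINE_EXCEEDED when time runs out. `RpcError`, `call_with_deadline`, and the two handlers are hypothetical names, not a gRPC library API; on timeout `asyncio.wait_for` cancels the handler, which stands in for cancellation propagation.

```python
import asyncio
import time


class RpcError(Exception):
    """Hypothetical error type carrying a gRPC-style status code name."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


async def call_with_deadline(handler, request, deadline: float):
    """Run handler(request, deadline), failing once the absolute deadline passes.

    On timeout, asyncio.wait_for cancels the handler, which also cancels any
    downstream call the handler is awaiting.
    """
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        raise RpcError("DEADLINE_EXCEEDED")
    try:
        return await asyncio.wait_for(handler(request, deadline), timeout=remaining)
    except asyncio.TimeoutError:
        raise RpcError("DEADLINE_EXCEEDED") from None


async def storage_backend(request, deadline):
    await asyncio.sleep(0.05)  # simulated 50 ms dependency
    return f"row for {request}"


async def frontend(request, deadline):
    # Propagate the caller's absolute deadline rather than starting a fresh timeout.
    return await call_with_deadline(storage_backend, request, deadline)


async def main():
    print(await call_with_deadline(frontend, "user:1", time.monotonic() + 0.5))
    try:
        await call_with_deadline(frontend, "user:2", time.monotonic() + 0.01)
    except RpcError as err:
        print(err.code)  # DEADLINE_EXCEEDED


if __name__ == "__main__":
    asyncio.run(main())
```

Passing an absolute deadline (rather than a per-hop timeout) is the design choice that keeps the whole call tree within the time the original client is willing to wait.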
rely on the exchange of data that is not part of the declared interface of a service. Deployments rely on their ability to evolve these features at a different rate from the individual APIs exposed by services. Metadata helps in the exchange of useful information
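The slides do not show how metadata travels between services. As a minimal sketch, assuming a per-process layer that stores the incoming call's metadata in a `contextvars.ContextVar` and forwards a fixed set of keys on every outgoing call (all key names and helpers here are hypothetical), propagation through an intermediary that never touches those keys looks like this:

```python
import contextvars

# Hypothetical: metadata of the RPC this process is currently handling.
_incoming = contextvars.ContextVar("incoming_metadata", default=None)

# Hypothetical keys that a deployment-wide layer forwards on every downstream call.
PROPAGATED_KEYS = {"x-origin-platform", "x-request-id"}


def current_metadata():
    return _incoming.get() or {}


def outgoing_metadata(extra=None):
    """Metadata for a downstream call: propagated keys plus call-specific ones."""
    md = {k: v for k, v in current_metadata().items() if k in PROPAGATED_KEYS}
    md.update(extra or {})
    return md


def serve(handler, request, metadata):
    """Simulated server: expose the caller's metadata while the handler runs."""
    token = _incoming.set(dict(metadata))
    try:
        return handler(request)
    finally:
        _incoming.reset(token)


def storage(request):
    # Sees the original client's platform even though an intermediary called it.
    return {"request": request, "seen_metadata": current_metadata()}


def intermediary(request):
    # Application code never mentions x-origin-platform; the layer forwards it.
    return serve(storage, request, outgoing_metadata({"x-caller": "intermediary"}))


reply = serve(
    intermediary,
    "list-mail",
    {"x-origin-platform": "android", "x-request-id": "r-1", "cookie": "not-forwarded"},
)
print(reply["seen_metadata"])
# {'x-origin-platform': 'android', 'x-request-id': 'r-1', 'x-caller': 'intermediary'}
```

Because the forwarding lives in the shared layer rather than in each service's API, the set of propagated keys can evolve without changing any service interface.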
resource usage and performance stats in real time by (almost) arbitrary metadata:
1. Service X can monitor CPU usage in their jobs broken down by the name of the invoked RPC and the mdb user who sent it.
2. Social can monitor the RPC latency of shared Bigtable jobs when responding to their requests, broken down by whether the request originated from a user on web/Android/iOS.
3. Gmail can collect usage on servers, broken down by access method (POP/IMAP/web/Android/iOS). The layer propagates Gmail's metadata down to every service, even if the request was made by an intermediary job that Gmail doesn't own.
• The stats layer exports data to varz and streamz, and provides stats to many monitoring systems and dashboards.
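A minimal sketch of "stats broken down by arbitrary metadata": each recorded RPC latency is bucketed by method name and by the value of one chosen metadata key. `RpcStats`, the `Bigtable.Read` method name, the `x-origin-platform` key, and the latency values are all made-up illustrations; a real stats layer would export these buckets to a monitoring backend rather than return a dict.

```python
import statistics
from collections import defaultdict


class RpcStats:
    """Hypothetical stats layer: latency samples keyed by (RPC method, metadata tag)."""

    def __init__(self, tag_key):
        self.tag_key = tag_key
        self.samples = defaultdict(list)

    def record(self, method, metadata, latency_ms):
        tag = metadata.get(self.tag_key, "unknown")
        self.samples[(method, tag)].append(latency_ms)

    def summary(self):
        """Count and mean latency for each (method, tag) bucket."""
        return {
            key: {"count": len(values), "mean_ms": statistics.mean(values)}
            for key, values in self.samples.items()
        }


stats = RpcStats(tag_key="x-origin-platform")
stats.record("Bigtable.Read", {"x-origin-platform": "web"}, 12.0)
stats.record("Bigtable.Read", {"x-origin-platform": "android"}, 30.0)
stats.record("Bigtable.Read", {"x-origin-platform": "android"}, 50.0)
stats.record("Bigtable.Read", {}, 8.0)
for key, row in stats.summary().items():
    print(key, row)
```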
It's an ad query :-) I need to find out.
• Take a sample and store it in a database; help identify requests in the sample that took a similar amount of time
• I didn't get a response from the service. What happened? Which link in the service dependency graph got stuck? Stitch a trace and figure it out.
• Where is a trace spending its time? Hotspot analysis
• What are all the dependencies of a service?
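The "stitch a trace" and "hotspot analysis" questions can be illustrated with a small sketch: given spans that each record their parent span, compute each span's self time (its duration minus time spent in child calls) and list spans that never completed. The `Span` fields, service names, and timings are hypothetical, and the sketch assumes a span's child calls do not overlap in time; it is not the API of any particular tracing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start_ms: float
    end_ms: Optional[float]  # None: no response was ever recorded


def self_times(spans):
    """Duration of each finished span minus the time spent in its finished children.

    Assumes a span's child calls do not overlap in time.
    """
    child_time = {}
    for s in spans:
        if s.parent_id is not None and s.end_ms is not None:
            child_time[s.parent_id] = child_time.get(s.parent_id, 0.0) + (s.end_ms - s.start_ms)
    return {
        s.span_id: (s.end_ms - s.start_ms) - child_time.get(s.span_id, 0.0)
        for s in spans
        if s.end_ms is not None
    }


def hotspot(spans):
    """Service whose own work (excluding downstream calls) took the longest."""
    times = self_times(spans)
    by_id = {s.span_id: s for s in spans}
    return by_id[max(times, key=times.get)].service


def stuck(spans):
    """Services whose spans never finished: where the request likely got stuck."""
    return [s.service for s in spans if s.end_ms is None]


trace = [
    Span("s1", None, "frontend", 0.0, 100.0),
    Span("s2", "s1", "auth", 5.0, 15.0),
    Span("s3", "s1", "ads", 20.0, 90.0),
    Span("s4", "s3", "ads-index", 25.0, 85.0),
]
print(hotspot(trace))  # ads-index
print(stuck(trace + [Span("s5", "s3", "ads-cache", 30.0, None)]))  # ['ads-cache']
```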
available - Avoid connection establishment latency
 ◦ Round-robin-over-list - Lists not sets → ability to represent weights
• For anything more advanced, move the burden to an external "LB Controller", a regular gRPC server, and rely on a client-side implementation of the so-called gRPC LB policy.
[Diagram: client, LB Controller, backends. 1) The client sends a control RPC to the LB Controller; 2) the controller returns an address list; 3) the client round-robins over the addresses in the address list.]
gRPC LB: Next gen of load balancing
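A minimal sketch of "round-robin over a list, not a set": repeating an address in the list is enough to give that backend a proportionally larger share of calls. The controller is stubbed out with a static list; `RoundRobinPicker` and the addresses are hypothetical, and this is not the grpclb implementation itself.

```python
import itertools


class RoundRobinPicker:
    """Round-robin over an address *list*: a repeated address gets a proportionally larger share."""

    def __init__(self, address_list):
        self.update(address_list)

    def update(self, address_list):
        """Replace the list, e.g. when the LB controller sends a new one."""
        if not address_list:
            raise ValueError("address list must not be empty")
        self._cycle = itertools.cycle(list(address_list))

    def pick(self):
        return next(self._cycle)


# Hypothetical address list as a controller might return it: backend-a appears
# twice, so it receives twice as many calls as backend-b.
picker = RoundRobinPicker(["backend-a:50051", "backend-b:50051", "backend-a:50051"])
picks = [picker.pick() for _ in range(6)]
print(picks)
```

A set could not hold the duplicate entry, which is why a list is the simplest way for the controller to express weights.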
strict
• Common language helps
• Common understanding of deadlines, cancellations, flow control
• A common stats/tracing framework is essential for monitoring and debugging
• A common framework allows uniform policy application for control and LB
Single point of integration for logging, monitoring, tracing, service discovery and load balancing makes lives much easier!
active community
• Reliable, with continuously running tests on GCE
 ◦ Deployable in your environment
• Measured with an open performance dashboard
 ◦ Deployable in your environment
• Well adopted inside and outside Google
Where is the project today?
available on every popular development platform and easy for someone to build for their platform of choice. It should be viable on CPU & memory limited devices. gRPC Principles & Requirements http://www.grpc.io/blog/principles
into an exchange of binary-encoded frames, which are then mapped to messages that belong to a stream, and all of which are multiplexed within a single TCP connection.
[Figure: Binary Framing. Request and response streams (Stream 1, Stream 2, ... Stream N), each made of HEADERS frames (e.g. :method: GET, :path: /kyiv, :scheme: https; :status: 200, :server: nginx/1.10.1) and DATA frames carrying the payload, interleaved over one TCP connection.]
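To make the framing concrete: every HTTP/2 frame starts with a 9-byte header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier, per RFC 7540 section 4.1). The sketch below encodes and splits such frames, then regroups interleaved frames by stream ID, which is the demultiplexing step the figure depicts. The HEADERS payloads are placeholder bytes, not real HPACK-compressed header blocks, and flow control, SETTINGS and the other frame types are omitted.

```python
import struct

FRAME_DATA = 0x0
FRAME_HEADERS = 0x1
FLAG_END_STREAM = 0x1
FLAG_END_HEADERS = 0x4

# length (24 bits, split into 8 + 16), type, flags, reserved bit + stream id (32 bits)
_HEADER = ">BHBBI"


def encode_frame(frame_type, flags, stream_id, payload):
    """Prefix payload with the 9-byte HTTP/2 frame header."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length")
    header = struct.pack(_HEADER, length >> 16, length & 0xFFFF, frame_type, flags,
                         stream_id & 0x7FFFFFFF)  # reserved bit cleared
    return header + payload


def decode_frames(buf):
    """Split a byte stream into (type, flags, stream_id, payload) tuples."""
    frames, pos = [], 0
    while pos < len(buf):
        hi, lo, frame_type, flags, stream_id = struct.unpack_from(_HEADER, buf, pos)
        length = hi << 16 | lo
        pos += 9
        frames.append((frame_type, flags, stream_id & 0x7FFFFFFF, buf[pos:pos + length]))
        pos += length
    return frames


# Two client-initiated streams (odd IDs 1 and 3) interleaved on one connection.
wire = b"".join([
    encode_frame(FRAME_HEADERS, FLAG_END_HEADERS, 1, b"<hpack block 1>"),
    encode_frame(FRAME_HEADERS, FLAG_END_HEADERS, 3, b"<hpack block 3>"),
    encode_frame(FRAME_DATA, FLAG_END_STREAM, 3, b"payload-3"),
    encode_frame(FRAME_DATA, FLAG_END_STREAM, 1, b"payload-1"),
])
by_stream = {}
for _type, _flags, stream_id, payload in decode_frames(wire):
    by_stream.setdefault(stream_id, []).append(payload)
print(by_stream)
```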
Unary: The client sends a single request to the server and gets a single response back, just like a normal function call.
Server streaming: The client sends a request to the server and gets a stream to read a sequence of messages back. The client reads from the returned stream until there are no more messages.
Client streaming: The client sends a sequence of messages to the server using a provided stream. Once the client has finished writing the messages, it waits for the server to read them and return its response.
BiDi streaming: Both sides send a sequence of messages using a read-write stream. The two streams operate independently. The order of messages in each stream is preserved.
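The four call shapes can be sketched with Python iterators standing in for message streams. This is an in-process illustration with hypothetical doubling, counting and summing logic, not generated gRPC code. One simplification: a generator-based BiDi handler answers in lock-step with its input, whereas real gRPC bidirectional streams let each side read and write independently.

```python
from typing import Iterable, Iterator


def unary(request: int) -> int:
    """Unary: one request in, one response out, like a normal function call."""
    return request * 2


def server_streaming(request: int) -> Iterator[int]:
    """Server streaming: one request in, a sequence of responses out."""
    for i in range(request):
        yield i  # the client keeps reading until the stream is exhausted


def client_streaming(requests: Iterable[int]) -> int:
    """Client streaming: the server reads every request, then sends one response."""
    return sum(requests)


def bidi_streaming(requests: Iterable[int]) -> Iterator[int]:
    """BiDi streaming: here, a running total emitted after each incoming message."""
    total = 0
    for request in requests:
        total += request
        yield total


print(unary(21))                          # 42
print(list(server_streaming(3)))          # [0, 1, 2]
print(client_streaming(iter([1, 2, 3])))  # 6
print(list(bidi_streaming([1, 2, 3])))    # [1, 3, 6]
```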
GCE VMs per pull request for regression testing.
• gRPC users can run these in their environments.
• Good performance across languages:
 ◦ Java throughput: 500K RPCs/sec and 1.3M streaming messages/sec on 32-core VMs
 ◦ Java latency: ~320 µs for unary ping-pong (netperf: 120 µs)
 ◦ C++ throughput: ~1.3M RPCs/sec and 3M streaming messages/sec on 32-core VMs
Performance
load balancing and failover, monitoring, tracing, logging, and so on. Implementations should provide extension points to allow for plugging in these features and, where useful, default implementations. gRPC Principles & Requirements http://www.grpc.io/blog/principles
Streaming compression
5. Mechanism to do caching
6. Binary logging
 a. Debugging and auditing, though costly
7. Unit testing support
 a. Automated mock testing
 b. Don't need to bring up all dependent services just to test
8. Web support
Coming soon!
community
Strict service contracts: Define and enforce contracts, backward compatible
Performant: 1M+ QPS unary, 3M+ streaming (dashboard)
Pluggable design: Auth, transport, IDL, LB
Efficiency on wire: 2-3x gains
Streaming APIs: Large payloads, speech, logs
Standards compliant: HTTP/2
Easy to use: Single-line installation
is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn't change
There is one administrator
Transport cost is zero
The network is homogeneous
https://blogs.oracle.com/jag/resource/Fallacies.html
Developers: Ease of use, Performance, Versioning, Programming model
Operators: Uniform monitoring, Debugging/tracing, Cross-platform/language
Architects/Managers: Defined contracts, Single uniform framework for control, Visibility
of the stack must be able to evolve independently. A revision to the wire-format should not disrupt application layer bindings. http://www.grpc.io/blog/principles
multiplexed bidirectional protocol.
gRPC (http://grpc.io):
• HTTP/2-based, open source, general-purpose, standards-based, feature-rich RPC framework.
• Bidirectional streaming over a single TCP connection.
• Netty transport provides asynchronous and non-blocking I/O.
• Deadline and cancellation propagation.
• Client- and server-side flow control.
• Layered, pluggable and extensible.
• Supports 10 programming languages.
• Built-in testing support.
• Production-ready (current version is 1.0.1) and growing ecosystem.
<Messages>*
 ◦ Response → <Header Metadata> <Messages>* <Trailing Metadata> <Status>
• Generic mechanism for attaching metadata to requests and responses
• Commonly used to attach “bearer tokens” to requests for auth
 ◦ OAuth2 access tokens
 ◦ JWTs, e.g. OpenID Connect ID Tokens
• Session state for specific auth mechanisms is encapsulated in an Auth-credentials object
Metadata and Auth
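A minimal sketch of the request/response shape above, simulated in-process: a hypothetical credentials object adds an `authorization: Bearer <token>` header-metadata entry (the form commonly used for OAuth2 access tokens over gRPC), and the server replies with status `UNAUTHENTICATED` when the token is missing or unknown. The class names, method path, and trailing-metadata key are illustrative assumptions, not a real gRPC library API.

```python
from dataclasses import dataclass, field


@dataclass
class AccessTokenCredentials:
    """Hypothetical Auth-credentials object; its session state is just the token."""
    token: str

    def request_metadata(self) -> dict:
        return {"authorization": f"Bearer {self.token}"}


@dataclass
class Request:
    method: str
    header_metadata: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)


@dataclass
class Response:
    header_metadata: dict
    messages: list
    trailing_metadata: dict
    status: str  # gRPC-style status code name


def attach_credentials(request: Request, creds: AccessTokenCredentials) -> Request:
    """Client side: add the credentials' metadata to the request headers."""
    request.header_metadata.update(creds.request_metadata())
    return request


def handle(request: Request, valid_tokens: set) -> Response:
    """Server side: check the bearer token before processing any messages."""
    scheme, _, token = request.header_metadata.get("authorization", "").partition(" ")
    if scheme != "Bearer" or token not in valid_tokens:
        return Response({}, [], {}, "UNAUTHENTICATED")
    replies = [f"hello {m}" for m in request.messages]
    return Response({}, replies, {"x-handled-by": "sketch"}, "OK")


req = attach_credentials(
    Request("/hypothetical.Greeter/SayHello", messages=["world"]),
    AccessTokenCredentials("token-123"),
)
print(handle(req, valid_tokens={"token-123"}))
print(handle(Request("/hypothetical.Greeter/SayHello", messages=["world"]), {"token-123"}).status)
```

Keeping the token inside a credentials object means the call site never handles auth headers directly; swapping OAuth2 for a JWT-based mechanism only changes which credentials object is attached.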