[PyCon KR 2018] Callosum: An RPC Transport Library

Joongi Kim

August 19, 2018
Transcript

  1. PyCon KR 2018

    One day I opened the aiozmq repo while debugging Backend.AI: github.com/aio-libs-abandoned/aiozmq
    (※ It came back to aio-libs later as it got a new maintainer!)
  2. Limitations of Existing RPC Libs

    § Typical goal is to design IDL better, not to do networking better!
      • Support only the client-server model
        – Backend.AI requires bi-directional request-reply
      • Difficult to extend networking layers
        – Supporting asyncio
        – Multiple asynchronous request-reply pairs
      • Often missing large data streaming (mostly small messages)
      • Requirements of additional features
        – Connection bundling & tunneling proxies for cluster federation
  3. Corpus Callosum

    § A bundle of neural fibers connecting the two cerebral hemispheres
    § If it gets disconnected...
      • Split-brain!
      • Alien hand syndrome
      • Sometimes intentionally cut to mitigate epileptic seizures
    Image: Henry Vandyke Carter, in Henry Gray (1918), Anatomy of the Human Body
  4. Callosum Design Goals

    § RPC Transport Library != RPC Library
      • Focus on networking and leave IDL for existing RPC libs!
      • Make lower networking & upper IDL layers replaceable!
      • Simplify more than aiozmq.rpc
      • Support encryption & authentication natively
    § Development
      • Let's keep MVP (minimum viable product) in mind
      • Start from a simple and working example!
  5. Example: JSON over ZeroMQ (Client)

        import json
        from callosum import Peer

        async def call():
            peer = Peer(connect='tcp://localhost:5000',
                        serializer=json.dumps,
                        deserializer=json.loads,
                        invoke_timeout=2.0)
            await peer.open()
            response = await peer.invoke('echo', {
                'sent': 'hello',
            })
            print(f"echoed {response['received']}")
            # Bind the operands to names so the f-string below can print them.
            a, b = 1234, 5678
            response = await peer.invoke('add', {
                'a': a,
                'b': b,
            })
            print(f"{a} + {b} = {response['result']}")
            await peer.close()
  6. Example: JSON over ZeroMQ (Server)

        import asyncio  # needed for asyncio.CancelledError below
        import json
        from callosum import Peer

        async def handle_echo(request):
            return {
                'received': request.body['sent'],
            }

        async def handle_add(request):
            return {
                'result': (request.body['a'] + request.body['b']),
            }

        async def serve():
            peer = Peer(bind='tcp://127.0.0.1:5000',
                        serializer=json.dumps,
                        deserializer=json.loads)
            peer.handle_function('echo', handle_echo)
            peer.handle_function('add', handle_add)
            try:
                await peer.open()
                await peer.listen()
            except asyncio.CancelledError:
                await peer.close()
  7. Example: Thrift over ZeroMQ (Client)

    simple.thrift:

        service SimpleService {
            string echo(1:string msg),
            i64 add(1:i64 a, 2:i64 b),
        }

    Client code:

        from callosum import Peer
        from callosum.upper.thrift import ThriftClientAdaptor
        import thriftpy

        simple_thrift = thriftpy.load(
            'simple.thrift', module_name='simple_thrift')

        async def call():
            peer = Peer(connect='tcp://localhost:5000',
                        invoke_timeout=2.0)
            adaptor = ThriftClientAdaptor(
                simple_thrift.SimpleService)
            await peer.open()
            response = await peer.invoke(
                'simple', adaptor.echo('hello'))
            print(f"echoed {response}")
            # Bind the operands to names so the f-string below can print them.
            a, b = 1234, 5678
            response = await peer.invoke(
                'simple', adaptor.add(a, b))
            print(f"{a} + {b} = {response}")
            await peer.close()
  8. Example: Thrift over ZeroMQ (Server)

        from callosum import Peer
        from callosum.upper.thrift import ThriftServerAdaptor
        import thriftpy

        simple_thrift = thriftpy.load(
            'simple.thrift', module_name='simple_thrift')

        class SimpleDispatcher:

            async def echo(self, msg):
                return msg

            async def add(self, a, b):
                return a + b

        async def serve():
            peer = Peer(bind='tcp://127.0.0.1:5000')
            adaptor = ThriftServerAdaptor(
                peer,
                simple_thrift.SimpleService,
                SimpleDispatcher())
            peer.handle_function('simple', adaptor.handle_function)
            # here is the same open-listen-close routine

    simple.thrift:

        service SimpleService {
            string echo(1:string msg),
            i64 add(1:i64 a, 2:i64 b),
        }
  9. Layered Architecture

    (Layer diagram, with User Apps on top of each peer)
      • User Apps
      • Upper Layer (Thrift, protobuf, JSON): IDL supports (e.g., type checks, serialization)
      • Callosum: Transport-layer extensions (e.g., async scheduling, streaming, etc.)
      • Lower Layer (ZeroMQ, HTTP): Abstraction of raw TCP sockets as message-based communication
  10. Encryption & Authentication

    § CurveZMQ + ZAP
    (Sequence diagram over time. Participants: Connecting Socket, Listening Socket, ZAP Server, User-defined Authenticator)
      • Server: init with (domain, server_private_key)
      • Connecting socket: init with (domain, server_public_key, client_private_key, client_public_key)
      • Connect
      • ZAP request over inproc://zeromq.zap.01, carrying Identity (domain, client_public_key)
      • ZAP response
      • Connection success/failure
    Encryption algorithm: curve25519
  11. AbstractAuthenticator

        import abc

        class AbstractAuthenticator(metaclass=abc.ABCMeta):

            # Server-side Interface

            @abc.abstractmethod
            async def server_identity(self) -> Identity:
                '''
                Return the identity of the server.
                Only used by the binder.
                '''
                raise NotImplementedError

            @abc.abstractmethod
            async def check_client(self, client_id: Identity) -> AuthResult:
                '''
                Check if the given domain and client public key
                is a valid one or not.
                Only used by the binder.
                '''
                raise NotImplementedError

            # Client-side Interface

            @abc.abstractmethod
            async def server_public_key(self) -> bytes:
                '''
                Return the public key of the server.
                Only used by the connector.
                '''
                raise NotImplementedError

            @abc.abstractmethod
            async def client_identity(self) -> Identity:
                '''
                Return the identity of the client.
                Only used by the connector.
                '''
                raise NotImplementedError

            @abc.abstractmethod
            async def client_public_key(self) -> bytes:
                '''
                Return the public key of the client.
                Only used by the connector.
                '''
                raise NotImplementedError
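To make the interface above concrete, here is a minimal sketch of an authenticator backed by a fixed allow-list of client keys. It is not Callosum's own code: `Identity` and `AuthResult` are hypothetical stand-ins whose fields are guessed from the slides, the abstract base is redeclared only so the example runs on its own, `StaticAuthenticator` is an invented name, and the keys are placeholder byte strings rather than real curve25519 keys.

```python
import abc
import asyncio
from typing import NamedTuple, Optional, Set


class Identity(NamedTuple):
    """Hypothetical stand-in: a security domain plus a key."""
    domain: str
    key: bytes


class AuthResult(NamedTuple):
    """Hypothetical stand-in for the outcome of a client check."""
    success: bool
    user_id: Optional[str] = None


class AbstractAuthenticator(abc.ABC):
    """Redeclares the five methods from the slide so the sketch is self-contained."""

    @abc.abstractmethod
    async def server_identity(self) -> Identity: ...

    @abc.abstractmethod
    async def check_client(self, client_id: Identity) -> AuthResult: ...

    @abc.abstractmethod
    async def server_public_key(self) -> bytes: ...

    @abc.abstractmethod
    async def client_identity(self) -> Identity: ...

    @abc.abstractmethod
    async def client_public_key(self) -> bytes: ...


class StaticAuthenticator(AbstractAuthenticator):
    """Accepts only clients whose public key is in a fixed allow-list."""

    def __init__(self, domain: str, server_private: bytes, server_public: bytes,
                 client_private: bytes, client_public: bytes,
                 allowed_clients: Set[bytes]) -> None:
        self._domain = domain
        self._server_private = server_private
        self._server_public = server_public
        self._client_private = client_private
        self._client_public = client_public
        self._allowed = frozenset(allowed_clients)

    # Server-side interface (used by the binder)
    async def server_identity(self) -> Identity:
        return Identity(self._domain, self._server_private)

    async def check_client(self, client_id: Identity) -> AuthResult:
        # Both the domain and the presented public key must match.
        ok = client_id.domain == self._domain and client_id.key in self._allowed
        return AuthResult(ok, client_id.key.hex() if ok else None)

    # Client-side interface (used by the connector)
    async def server_public_key(self) -> bytes:
        return self._server_public

    async def client_identity(self) -> Identity:
        return Identity(self._domain, self._client_private)

    async def client_public_key(self) -> bytes:
        return self._client_public
```

A binder would call `check_client()` once per incoming connection; anything not on the allow-list, or presented under a different domain, is rejected.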
  12. AsyncScheduler for Request-Key Ordering

    § Scheduler features
      • Customizable ordering via request-keys
      • Integration with aiojobs to limit maximum concurrency
      • Each peer may have different scheduling policies!
    (Timeline diagram of requests 1 to 3 between client and server)
      • Type 0: No ordering
    Legend: same color = same request key; number = global request index; yellow = request begins; red = response returns
    Each request may be a completely different RPC method!
  13. AsyncScheduler for Request-Key Ordering

    (Timeline diagrams of requests 1 to 3 between client and server)
      • Type 1: Return ordered by request-keys
      • Type 2: Execution serialization by request-keys
    Legend: same color = same request key; number = global request index; yellow = request begins; red = response returns
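The "Type 2" policy (execution serialization by request-keys) can be sketched with one `asyncio.Lock` per key. This is an illustration using only the standard library, not Callosum's scheduler; `KeyedSerialScheduler`, `run`, and `demo` are hypothetical names.

```python
import asyncio
from collections import defaultdict


class KeyedSerialScheduler:
    """Hypothetical sketch: handlers sharing a request key run one at a time."""

    def __init__(self):
        # One lock per request key; different keys never block each other.
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, key, coro_func, *args):
        # Waiters on the same key are served in arrival order.
        async with self._locks[key]:
            return await coro_func(*args)


async def demo():
    sched = KeyedSerialScheduler()
    events = []

    async def handler(index, delay):
        events.append(("begin", index))
        await asyncio.sleep(delay)
        events.append(("end", index))
        return index

    # Requests 1 and 3 share key "A"; request 2 uses key "B".
    results = await asyncio.gather(
        sched.run("A", handler, 1, 0.05),
        sched.run("B", handler, 2, 0.01),
        sched.run("A", handler, 3, 0.01),
    )
    return results, events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Request 3 does not begin until request 1 has ended, because both use key "A", while request 2 runs concurrently with request 1 under key "B".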
  14. Multi-channel & Streaming

    (Diagram: requests 1 to 4 and responses 1 to 4 travel between the client-side and server-side Callosum over the network as chunked requests and chunked responses)
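The chunking idea in the diagram can be sketched as follows: each payload is split into frames tagged with a request id, frames from several requests are interleaved on one connection, and the receiver reassembles them per request. The frame layout `(request_id, seq, is_last, chunk)` and all function names are assumptions made for illustration, not Callosum's wire format.

```python
from collections import defaultdict
from itertools import zip_longest


def split_into_chunks(request_id, payload, chunk_size):
    """Yield (request_id, seq, is_last, chunk) frames for one payload."""
    total = max(1, -(-len(payload) // chunk_size))  # ceiling division
    for seq in range(total):
        chunk = payload[seq * chunk_size:(seq + 1) * chunk_size]
        yield (request_id, seq, seq == total - 1, chunk)


def interleave(*streams):
    """Round-robin frames from several streams, as a shared connection might."""
    for group in zip_longest(*streams):
        for frame in group:
            if frame is not None:
                yield frame


def reassemble(frames):
    """Rebuild complete payloads per request id from interleaved frames.

    Assumes frames of the same request arrive in order.
    """
    buffers = defaultdict(list)
    completed = {}
    for request_id, _seq, is_last, chunk in frames:
        buffers[request_id].append(chunk)
        if is_last:
            completed[request_id] = b"".join(buffers.pop(request_id))
    return completed


if __name__ == "__main__":
    big = b"x" * 10
    small = b"hello"
    wire = interleave(split_into_chunks(1, big, 4), split_into_chunks(2, small, 4))
    print(reassemble(wire))
```

With this layout a small message does not have to wait for a large one to finish: its last frame arrives while the larger payload is still being streamed.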
  15. Challenge: Integration with IDLs

    § Apache Thrift
      • It was easy to integrate with Callosum.
      • Why: aiothrift + Thrift's runtime IDL loading scheme
    § How about others? Does it apply generally?
      • What are the requirements for IDL libraries?
        – Asynchronization often requires IDL compiler changes.
      • I hope to integrate with nirum!
  16. Challenge: Performance

    § Minimizing protocol overheads
      • Main overhead: serialization & de-serialization
      • How to simplify the validation logic on the Python side?
      • How to reduce the number of memory copies from/to buffers?
      • How to reduce the network bandwidth usage?
    § Solution
      • msgpack
      • snappy
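The serialize-then-compress pipeline can be sketched with the standard library alone. Note the substitution: `json` stands in for msgpack and `zlib` stands in for snappy, since both of the libraries named on the slide are third-party packages; `encode` and `decode` are hypothetical helper names.

```python
import json
import zlib


def encode(obj, compress=True):
    """Serialize to compact JSON bytes, optionally compressing the result."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw) if compress else raw


def decode(data, compressed=True):
    """Reverse of encode(): decompress if needed, then parse JSON."""
    raw = zlib.decompress(data) if compressed else data
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    message = {"rows": [{"id": i, "status": "ok"} for i in range(200)]}
    plain = encode(message, compress=False)
    packed = encode(message)
    print(len(plain), len(packed))
```

Compression pays off mainly for larger or repetitive payloads; for very small messages the compressed form can be longer than the input, so a transport might apply it only above some size threshold.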
  17. Ref) Serialization Benchmark

    (Bar chart, axis from 0% to 180%: Relative Time Taken (LOADS), Relative Time Taken (DUMPS), and Relative Encoded Size for json, simplejson, ujson, msgpack, cbor, marshal, pickle, and pickle2)
    Code from https://gist.github.com/cactus/4073643
  18. Future Work

    § More optimization
      • Cython
      • PyO3 + Rust
    § Application to Backend.AI
      • Replace the current manager-agent communication to eliminate the need for VPNs in hybrid-cloud & inter-cluster setups (e.g., )
      • Extend Callosum to make a bundle of Callosum peerings ("bundle of bundles")
  19. Retrospect: Open Source

    § Yet another Backend.AI-derived open source library!
    § Writing my own one vs. Maintaining an existing one vs. Contributing to an existing one
    § Tried to integrate with nirum (PyCon KR 2017), but lacked time and had difficulties extending its networking layer
    (Screenshot: aio-libs Gitter, March 21st)
  20. Retrospect: Development

    § Conference-driven development?!
    § 1 week to implement Thrift over ZeroMQ
    § Difficulties when integrating external IDL libs
      • "API contamination": async forces all others to be async
      • To write async APIs without "async def"...
        – Live in the hell of callbacks
        – Rewrite entire Python on top of an event loop (node)
        – Monkey-patch standard networking functions (gevent)
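The "API contamination" point can be illustrated with a small sketch: as soon as one handler may be a coroutine function, the code that calls it has to be a coroutine too, and so does everything above it. The names `dispatch`, `sync_add`, and `async_add` are hypothetical and not taken from Callosum or any IDL library.

```python
import asyncio
import inspect


def sync_add(a, b):
    """A plain handler, as a synchronous IDL library would generate."""
    return a + b


async def async_add(a, b):
    """The same handler written as a coroutine."""
    await asyncio.sleep(0)  # stands in for real asynchronous I/O
    return a + b


async def dispatch(handler, *args):
    """Call a handler that may be sync or async.

    The dispatcher must itself be 'async def' because it may need to await
    the handler's result, which is how async spreads upward through an API.
    """
    result = handler(*args)
    if inspect.isawaitable(result):
        result = await result
    return result


if __name__ == "__main__":
    print(asyncio.run(dispatch(sync_add, 1, 2)))
    print(asyncio.run(dispatch(async_add, 1, 2)))
```

Accepting both kinds of handlers keeps existing synchronous code usable, but every caller of `dispatch` still has to run inside an event loop.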