[PyCon KR 2018] Callosum: An RPC Transport Library


Joongi Kim

August 19, 2018

Transcript

  1. Callosum: An RPC Transport Library. Joongi Kim, Lablup Inc. & KossLab
  2. None
  3. Why yet another RPC?

  4. One day, while debugging Backend.AI, I opened the aiozmq repo: github.com/aio-libs-abandoned/aiozmq (※ It came back to aio-libs later when it got a new maintainer!)
  5. aiozmq on Python 3.7. This is the life on... BLEEDING EDGE
  6. Limitations of Existing RPC Libs
    § Typical goal is to design IDL better, not to do networking better!
      • Support only the client-server model
        – Backend.AI requires bi-directional request-reply
      • Difficult to extend networking layers
        – Supporting asyncio
        – Multiple asynchronous request-reply pairs
      • Often missing large data streaming (mostly small messages)
      • Requirements of additional features
        – Connection bundling & tunneling proxies for cluster federation
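Slide 6 asks for bi-directional request-reply and for multiple asynchronous request-reply pairs on one connection. A minimal sketch of that idea, using only the standard library: every message carries a request ID, and each side matches incoming replies to pending futures while also serving requests from the other side. The `Endpoint` class, the tuple message format, and the in-memory queues are assumptions made for illustration; they are not Callosum's API or wire format.

```python
import asyncio
import itertools


class Endpoint:
    """Hypothetical peer that can both send requests and serve them."""

    def __init__(self, inbox, outbox, handlers):
        self.inbox, self.outbox, self.handlers = inbox, outbox, handlers
        self.pending = {}               # request id -> Future awaiting the reply
        self.ids = itertools.count(1)

    async def invoke(self, method, body):
        req_id = next(self.ids)
        fut = asyncio.get_running_loop().create_future()
        self.pending[req_id] = fut
        await self.outbox.put(("req", req_id, method, body))
        return await fut

    async def _serve(self, req_id, method, body):
        result = await self.handlers[method](body)
        await self.outbox.put(("rep", req_id, method, result))

    async def run(self):
        tasks = set()
        try:
            while True:
                kind, req_id, method, body = await self.inbox.get()
                if kind == "req":
                    # One task per request, so a slow handler does not block others.
                    task = asyncio.create_task(self._serve(req_id, method, body))
                    tasks.add(task)
                    task.add_done_callback(tasks.discard)
                elif kind == "rep":
                    self.pending.pop(req_id).set_result(body)
        finally:
            for task in list(tasks):
                task.cancel()


async def demo():
    a_to_b, b_to_a = asyncio.Queue(), asyncio.Queue()

    async def echo(body):
        return body

    async def add(body):
        await asyncio.sleep(0.01)
        return body["a"] + body["b"]

    left = Endpoint(b_to_a, a_to_b, {"echo": echo})
    right = Endpoint(a_to_b, b_to_a, {"add": add})
    loops = [asyncio.create_task(left.run()), asyncio.create_task(right.run())]
    # Two overlapping requests in one direction, one in the other.
    results = await asyncio.gather(
        left.invoke("add", {"a": 1, "b": 2}),
        left.invoke("add", {"a": 3, "b": 4}),
        right.invoke("echo", "hello"),
    )
    for task in loops:
        task.cancel()
    await asyncio.gather(*loops, return_exceptions=True)
    return results
```

Running `asyncio.run(demo())` exercises both directions over the same pair of queues; a real transport would replace the queues with sockets.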
  7. If I make one...

  8. I have to make one

  9. Why "Callosum"?

  10. Corpus Callosum
    § A bundle of neural fibers connecting the two cerebral hemispheres
    § If it gets disconnected...
      • Split-brain!
      • Alien hand syndrome
      • Sometimes intentionally cut to mitigate epileptic seizures
    (Image: Henry Vandyke Carter - Henry Gray (1918) Anatomy of the Human Body)
  11. Callosum Design Goals
    § RPC Transport Library != RPC Library
      • Focus on networking and leave IDL to existing RPC libs!
      • Make the lower networking & upper IDL layers replaceable!
      • Simplify more than aiozmq.rpc
      • Support encryption & authentication natively
    § Development
      • Let's keep the MVP (minimum viable product) in mind
      • Start from a simple and working example!
  12. Example: JSON over ZeroMQ (Client)

        import json
        from callosum import Peer

        async def call():
            peer = Peer(connect='tcp://localhost:5000',
                        serializer=json.dumps,
                        deserializer=json.loads,
                        invoke_timeout=2.0)
            await peer.open()
            response = await peer.invoke('echo', {
                'sent': 'hello',
            })
            print(f"echoed {response['received']}")
            a, b = 1234, 5678  # operands for the 'add' call
            response = await peer.invoke('add', {
                'a': a,
                'b': b,
            })
            print(f"{a} + {b} = {response['result']}")
            await peer.close()
  13. Example: JSON over ZeroMQ (Server)

        import asyncio
        import json
        from callosum import Peer

        async def handle_echo(request):
            return {
                'received': request.body['sent'],
            }

        async def handle_add(request):
            return {
                'result': (request.body['a'] + request.body['b']),
            }

        async def serve():
            peer = Peer(bind='tcp://127.0.0.1:5000',
                        serializer=json.dumps,
                        deserializer=json.loads)
            peer.handle_function('echo', handle_echo)
            peer.handle_function('add', handle_add)
            try:
                await peer.open()
                await peer.listen()
            except asyncio.CancelledError:
                await peer.close()
  14. Example: Thrift over ZeroMQ (Client)

    simple.thrift:

        service SimpleService {
            string echo(1:string msg),
            i64 add(1:i64 a, 2:i64 b),
        }

    Client:

        from callosum import Peer
        from callosum.upper.thrift import ThriftClientAdaptor
        import thriftpy

        simple_thrift = thriftpy.load(
            'simple.thrift',
            module_name='simple_thrift')

        async def call():
            peer = Peer(connect='tcp://localhost:5000',
                        invoke_timeout=2.0)
            adaptor = ThriftClientAdaptor(
                simple_thrift.SimpleService)
            await peer.open()
            response = await peer.invoke(
                'simple', adaptor.echo('hello'))
            print(f"echoed {response}")
            a, b = 1234, 5678  # operands for the 'add' call
            response = await peer.invoke(
                'simple', adaptor.add(a, b))
            print(f"{a} + {b} = {response}")
            await peer.close()
  15. Example: Thrift over ZeroMQ (Server)

        from callosum import Peer
        from callosum.upper.thrift import ThriftServerAdaptor
        import thriftpy

        simple_thrift = thriftpy.load(
            'simple.thrift',
            module_name='simple_thrift')

        class SimpleDispatcher:
            async def echo(self, msg):
                return msg

            async def add(self, a, b):
                return a + b

        async def serve():
            peer = Peer(bind='tcp://127.0.0.1:5000')
            adaptor = ThriftServerAdaptor(
                peer,
                simple_thrift.SimpleService,
                SimpleDispatcher())
            peer.handle_function('simple', adaptor.handle_function)
            # here is the same open-listen-close routine

    simple.thrift:

        service SimpleService {
            string echo(1:string msg),
            i64 add(1:i64 a, 2:i64 b),
        }
  16. Inside Callosum

  17. Layered Architecture (diagram, top to bottom)
    • User Apps
    • Upper Layer (Thrift, protobuf, JSON): IDL support (e.g., type checks, serialization)
    • Callosum: Transport-layer extensions (e.g., async scheduling, streaming, etc.)
    • Lower Layer (ZeroMQ, HTTP): Abstraction of raw TCP sockets as message-based communication
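The layering on slide 17 can be illustrated with a small standard-library sketch in which the middle layer only glues a replaceable codec (upper layer) to a replaceable message transport (lower layer). `AbstractLowerLayer`, `LoopbackLayer`, `Core`, and `roundtrip` are hypothetical names invented for this sketch, and the in-memory queue stands in for a real transport such as ZeroMQ or HTTP; none of this is Callosum's actual class layout.

```python
import abc
import asyncio
import json
import pickle


class AbstractLowerLayer(abc.ABC):
    """Hypothetical lower layer: moves opaque byte messages."""

    @abc.abstractmethod
    async def send(self, message: bytes) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    async def recv(self) -> bytes:
        raise NotImplementedError


class LoopbackLayer(AbstractLowerLayer):
    """In-memory stand-in for a real message transport."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def send(self, message):
        await self._queue.put(message)

    async def recv(self):
        return await self._queue.get()


class Core:
    """Middle layer: combines an upper-layer codec with a lower-layer transport."""

    def __init__(self, lower, serializer, deserializer):
        self.lower = lower
        self.serializer = serializer
        self.deserializer = deserializer

    async def send(self, body):
        await self.lower.send(self.serializer(body))

    async def recv(self):
        return self.deserializer(await self.lower.recv())


def json_dumps(body):
    # json.dumps returns str; the lower layer expects bytes.
    return json.dumps(body).encode("utf-8")


async def roundtrip(serializer, deserializer, body):
    core = Core(LoopbackLayer(), serializer, deserializer)
    await core.send(body)
    return await core.recv()
```

Swapping `json_dumps`/`json.loads` for `pickle.dumps`/`pickle.loads` changes the upper layer without touching the transport, and a different `AbstractLowerLayer` subclass would change the transport without touching the codec.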
  18. Encryption & Authentication
    § CurveZMQ + ZAP
    Sequence diagram (time flows downward):
      • Listening socket (server): init with (domain, server_private_key)
      • Connecting socket: init with (domain, server_public_key, client_private_key, client_public_key)
      • Connecting socket → listening socket: Connect
      • Listening socket → ZAP server (inproc://zeromq.zap.01): ZAP request with identity (domain, client_public_key)
      • ZAP server ↔ user-defined authenticator
      • ZAP server → listening socket: ZAP response
      • Listening socket → connecting socket: connection success/failure
    Encryption algorithm: curve25519
  19. AbstractAuthenticator

        import abc

        # Identity and AuthResult are Callosum types; their definitions are not shown on the slide.

        class AbstractAuthenticator(metaclass=abc.ABCMeta):

            # Server-side interface

            @abc.abstractmethod
            async def server_identity(self) -> Identity:
                '''
                Return the identity of the server.
                Only used by the binder.
                '''
                raise NotImplementedError

            @abc.abstractmethod
            async def check_client(self, client_id: Identity) -> AuthResult:
                '''
                Check whether the given domain and client public key are valid.
                Only used by the binder.
                '''
                raise NotImplementedError

            # Client-side interface

            @abc.abstractmethod
            async def server_public_key(self) -> bytes:
                '''
                Return the public key of the server.
                Only used by the connector.
                '''
                raise NotImplementedError

            @abc.abstractmethod
            async def client_identity(self) -> Identity:
                '''
                Return the identity of the client.
                Only used by the connector.
                '''
                raise NotImplementedError

            @abc.abstractmethod
            async def client_public_key(self) -> bytes:
                '''
                Return the public key of the client.
                Only used by the connector.
                '''
                raise NotImplementedError
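To show how the five methods on slide 19 might be filled in, here is a hedged sketch of an allow-list authenticator. Everything below is self-contained and illustrative: `Identity` and `AuthResult` are stand-in named tuples whose fields are assumed (the slide does not show Callosum's definitions), the abstract base is re-declared locally with the slide's method names, `AllowListAuthenticator` is a hypothetical class, and the keys are placeholder byte strings rather than real curve25519 keys.

```python
import abc
import asyncio
from typing import NamedTuple


class Identity(NamedTuple):
    """Stand-in for Callosum's Identity (assumed shape)."""
    domain: str
    key: bytes


class AuthResult(NamedTuple):
    """Stand-in for Callosum's AuthResult (assumed shape)."""
    success: bool


class AbstractAuthenticator(abc.ABC):
    """Local re-declaration using the five method names from the slide."""

    @abc.abstractmethod
    async def server_identity(self) -> Identity: ...

    @abc.abstractmethod
    async def check_client(self, client_id: Identity) -> AuthResult: ...

    @abc.abstractmethod
    async def server_public_key(self) -> bytes: ...

    @abc.abstractmethod
    async def client_identity(self) -> Identity: ...

    @abc.abstractmethod
    async def client_public_key(self) -> bytes: ...


class AllowListAuthenticator(AbstractAuthenticator):
    """Hypothetical authenticator that accepts only pre-registered client keys."""

    def __init__(self, domain, server_keys, client_keys, allowed_client_keys):
        self.domain = domain
        self.server_public, self.server_private = server_keys
        self.client_public, self.client_private = client_keys
        self.allowed = set(allowed_client_keys)

    # Binder (server) side
    async def server_identity(self):
        return Identity(self.domain, self.server_private)

    async def check_client(self, client_id):
        ok = client_id.domain == self.domain and client_id.key in self.allowed
        return AuthResult(success=ok)

    # Connector (client) side
    async def server_public_key(self):
        return self.server_public

    async def client_identity(self):
        return Identity(self.domain, self.client_private)

    async def client_public_key(self):
        return self.client_public


async def demo():
    auth = AllowListAuthenticator(
        "example",
        server_keys=(b"server-pub", b"server-priv"),    # placeholder keys
        client_keys=(b"client-pub", b"client-priv"),    # placeholder keys
        allowed_client_keys=[b"client-pub"],
    )
    known = await auth.check_client(Identity("example", b"client-pub"))
    unknown = await auth.check_client(Identity("example", b"intruder-pub"))
    return known.success, unknown.success
```

The check mirrors the identity shown in the ZAP diagram, a (domain, client public key) pair, and rejects a client if either part does not match.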
  20. AsyncScheduler for Request-Key Ordering
    § Scheduler features
      • Customizable ordering via request-keys
      • Integration with aiojobs to limit maximum concurrency
      • Each peer may have different scheduling policies!
    Diagram, Type 0: No ordering (client and server timelines for requests 1, 2, 3)
      Legend: same color = same request key; number = global request index; yellow = request begins; red = response returns
      Each request may be a completely different RPC method!
  21. AsyncScheduler for Request-Key Ordering
    Diagram, Type 1: Return ordered by request-keys (client and server timelines for requests 1, 2, 3)
    Diagram, Type 2: Execution serialization by request-keys (client and server timelines for requests 1, 2, 3)
      Legend: same color = same request key; number = global request index; yellow = request begins; red = response returns
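The "Type 2" policy on slide 21, execution serialization by request key, can be sketched with one `asyncio.Lock` per key: requests that share a key run one at a time in arrival order, while requests with other keys proceed concurrently. `KeySerializedScheduler` is a hypothetical name for this illustration and is not Callosum's `AsyncScheduler`; the aiojobs-based concurrency limit mentioned on slide 20 is left out.

```python
import asyncio
from collections import defaultdict


class KeySerializedScheduler:
    """Runs jobs with the same request key one at a time; other keys overlap."""

    def __init__(self):
        # One lock per request key, created on first use.
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, key, func, *args):
        async with self._locks[key]:
            return await func(*args)


async def demo():
    log = []

    async def handler(index, delay):
        log.append(("begin", index))
        await asyncio.sleep(delay)
        log.append(("end", index))
        return index

    sched = KeySerializedScheduler()
    results = await asyncio.gather(
        sched.run("blue", handler, 1, 0.05),    # holds the "blue" key first
        sched.run("blue", handler, 2, 0.01),    # waits until request 1 ends
        sched.run("green", handler, 3, 0.01),   # different key, starts right away
    )
    return results, log
```

In the returned log, request 2 begins only after request 1 ends, while request 3 begins before request 1 ends because it uses a different key.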
  22. Multi-channel & Streaming
    Diagram: on the client side, requests 1 to 4 enter Callosum and cross the network as chunked requests; on the server side, responses 1 to 4 return through Callosum as chunked responses.
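One way to picture the chunked, multi-channel traffic on slide 22 is to split each message into small frames tagged with a channel ID, interleave frames from several messages on the wire, and reassemble them per channel on the other side. The frame layout `(channel_id, is_last, chunk)` and the function names below are assumptions for illustration, not Callosum's wire format, and the sketch relies on frames of one channel arriving in order.

```python
from itertools import zip_longest


def split_into_chunks(channel_id, payload: bytes, chunk_size: int):
    """Yield (channel_id, is_last, chunk) frames for one message."""
    if not payload:
        yield (channel_id, True, b"")
        return
    for offset in range(0, len(payload), chunk_size):
        chunk = payload[offset:offset + chunk_size]
        is_last = offset + chunk_size >= len(payload)
        yield (channel_id, is_last, chunk)


def interleave(*streams):
    """Take one frame from each stream in turn, as a shared link might."""
    for group in zip_longest(*streams):
        for frame in group:
            if frame is not None:
                yield frame


def reassemble(frames):
    """Rebuild complete messages per channel from interleaved frames."""
    buffers = {}
    complete = {}
    for channel_id, is_last, chunk in frames:
        buffers.setdefault(channel_id, bytearray()).extend(chunk)
        if is_last:
            complete[channel_id] = bytes(buffers.pop(channel_id))
    return complete


frames = list(interleave(
    split_into_chunks(1, b"a" * 10, 4),
    split_into_chunks(2, b"hello", 2),
))
messages = reassemble(frames)
```

Because a large message no longer occupies the link as one block, a small request on another channel can be delivered between its chunks.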
  23. Challenge: Integration with IDLs
    § Apache Thrift
      • It was easy to integrate with Callosum.
      • Why: aiothrift + Thrift's runtime IDL loading scheme
    § How about others? Does it apply generally?
      • What are the requirements for IDL libraries?
        – Asynchronization often requires IDL compiler changes.
      • I hope to integrate with nirum!
  24. Challenge: Performance
    § Minimizing protocol overheads
      • Main overhead: serialization & de-serialization
      • How to simplify the validation logic on the Python side?
      • How to reduce the number of memory copies from/to buffers?
      • How to reduce the network bandwidth usage?
    § Solution
      • msgpack
      • snappy
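The overheads listed on slide 24 can be measured with a small harness. The sketch below uses only standard-library codecs (`json`, `pickle`, `marshal`) and `zlib` as stand-ins, because msgpack and snappy are third-party packages; swapping them in would mean adding entries to `CODECS` and replacing the `zlib.compress` call. The payload shape, the `measure` function, and the repeat count are arbitrary assumptions.

```python
import json
import marshal
import pickle
import timeit
import zlib

# name -> (dumps, loads); each dumps returns bytes
CODECS = {
    "json": (lambda obj: json.dumps(obj).encode("utf-8"), json.loads),
    "pickle": (pickle.dumps, pickle.loads),
    "marshal": (marshal.dumps, marshal.loads),
}

PAYLOAD = {
    "items": [{"id": i, "name": f"item-{i}", "score": i * 0.5} for i in range(100)],
}


def measure(payload, repeat=200):
    """Return per-codec timings (seconds for `repeat` calls) and encoded sizes."""
    report = {}
    for name, (dumps, loads) in CODECS.items():
        encoded = dumps(payload)
        assert loads(encoded) == payload    # sanity check: lossless round trip
        report[name] = {
            "dumps_sec": timeit.timeit(lambda: dumps(payload), number=repeat),
            "loads_sec": timeit.timeit(lambda: loads(encoded), number=repeat),
            "size": len(encoded),
            "compressed_size": len(zlib.compress(encoded)),
        }
    return report
```

Calling `measure(PAYLOAD)` gives numbers that depend on the machine and the payload, so they are best read relative to one codec, as the benchmark slide does.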
  25. Ref) Serialization Benchmark
    [Bar chart: Relative Time Taken (LOADS), Relative Time Taken (DUMPS), and Relative Encoded Size, on a 0% to 180% axis, for json, simplejson, ujson, msgpack, cbor, marshal, pickle, pickle2]
    Code from https://gist.github.com/cactus/4073643
  26. Future Work
    § More optimization
      • Cython
      • PyO3 + Rust
    § Application to Backend.AI
      • Replace the current manager-agent communication to eliminate the necessity of VPNs for hybrid-cloud & inter-cluster setups (e.g., )
      • Extend Callosum to make a bundle of Callosum peerings ("bundle of bundles")
  27. Retrospect: https://github.com/lablup/callosum

  28. Retrospect: Open Source
    § Yet another Backend.AI-derived open-source library!
    § Writing my own one vs. maintaining an existing one vs. contributing to an existing one
    § Tried to integrate with nirum (PyCon KR 2017), but did not have enough time and had difficulties extending its networking layer
    (Screenshot: aio-libs Gitter, March 21st)
  29. Retrospect: Development
    § Conference-driven development?!
    § 1 week to implement Thrift over ZeroMQ
    § Difficulties when integrating external IDL libs
      • "API contamination": async forces all others to be async
      • To write async APIs without "async def"...
        – Live in the hell of callbacks
        – Rewrite entire Python on top of an event loop (node)
        – Monkey-patch standard networking functions (gevent)
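The "API contamination" point on slide 29 can be seen in a few lines of standard-library Python: calling a coroutine function from synchronous code yields a coroutine object rather than a result, so every caller up the chain has to become `async def` as well (or hand control to an event loop). The function names below are hypothetical.

```python
import asyncio
import inspect


async def fetch_value():
    """Stands in for any async I/O call."""
    await asyncio.sleep(0)
    return 42


def sync_caller():
    """A plain function cannot await; it only gets a coroutine object back."""
    result = fetch_value()
    is_coroutine = inspect.iscoroutine(result)
    result.close()      # discard it cleanly to avoid a "never awaited" warning
    return is_coroutine


async def async_caller():
    """The caller must itself be async to obtain the value."""
    return await fetch_value()
```

`sync_caller()` reports that it received a coroutine object, while `asyncio.run(async_caller())` produces the actual value, which is why an IDL library with synchronous generated stubs needs changes before it can sit on top of an async transport.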
  30. Thanks! Lablup is hiring Python backend & DevOps engineers! contact@lablup.com