[PyCon KR 2018] Callosum: An RPC Transport Library

Joongi Kim

August 19, 2018

Transcript

  1. Callosum
    An RPC Transport Library
    Joongi Kim ()
    Lablup Inc. & KossLab

  2. (image only)

  3. Why yet another RPC?
    PyCon KR 2018

    One day, while debugging Backend.AI, I opened the aiozmq repo
    and found it had been moved to aio-libs-abandoned:
    github.com/aio-libs-abandoned/aiozmq
    (※ It later came back to aio-libs when it got a new maintainer!)

  5. aiozmq on Python 3.7
    This is the life on the...
    BLEEDING EDGE

  6. Limitations of Existing RPC Libs
    § Their typical goal is a better IDL design, not better networking!
    • They support only the client-server model
    – Backend.AI requires bi-directional request-reply
    • Their networking layers are difficult to extend, e.g., for
    – asyncio support
    – multiple asynchronous request-reply pairs
    • Large-data streaming is often missing (they mostly target small messages)
    • We need additional features
    – Connection bundling & tunneling proxies for cluster federation

  7. If we build it...

  8. We have to build it

  9. Why "Callosum"?

  10. Corpus Callosum
    § A bundle of neural fibers connecting the two
    cerebral hemispheres
    § If it gets severed...
    • Split-brain!
    • Alien hand syndrome
    • Sometimes intentionally cut
    to mitigate epileptic seizures
    Image: Henry Vandyke Carter, from Henry Gray (1918),
    Anatomy of the Human Body

  11. Callosum Design Goals
    § RPC Transport Library != RPC Library
    • Focus on networking and leave the IDL to existing RPC libs!
    • Make the lower networking & upper IDL layers replaceable!
    • Be simpler than aiozmq.rpc
    • Support encryption & authentication natively
    § Development
    • Keep the MVP (minimum viable product) in mind
    • Start from a simple, working example!

  12. Example: JSON over ZeroMQ
    import asyncio
    import json
    from callosum import Peer

    async def call():
        peer = Peer(connect='tcp://localhost:5000',
                    serializer=json.dumps,
                    deserializer=json.loads,
                    invoke_timeout=2.0)
        await peer.open()
        response = await peer.invoke('echo', {
            'sent': 'hello',
        })
        print(f"echoed {response['received']}")
        a, b = 1234, 5678
        response = await peer.invoke('add', {
            'a': a,
            'b': b,
        })
        print(f"{a} + {b} = {response['result']}")
        await peer.close()

    asyncio.run(call())
    (Client)

  13. Example: JSON over ZeroMQ
    import asyncio
    import json
    from callosum import Peer

    async def handle_echo(request):
        return {
            'received': request.body['sent'],
        }

    async def handle_add(request):
        return {
            'result': (request.body['a'] +
                       request.body['b']),
        }

    async def serve():
        peer = Peer(bind='tcp://127.0.0.1:5000',
                    serializer=json.dumps,
                    deserializer=json.loads)
        peer.handle_function('echo', handle_echo)
        peer.handle_function('add', handle_add)
        try:
            await peer.open()
            await peer.listen()
        except asyncio.CancelledError:
            await peer.close()
    (Server)

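To make the name-based dispatch of these two slides concrete without a network, here is a minimal sketch under stated assumptions: `LoopbackPeer` is a hypothetical in-process class (not part of Callosum) that only mimics the `handle_function()` and `invoke()` calls shown above, and the `{'method': ..., 'body': ...}` envelope is an assumed wire format. Every request and reply still goes through `json.dumps`/`json.loads`, as it would between two real peers.

```python
import asyncio
import json
from types import SimpleNamespace


class LoopbackPeer:
    """Hypothetical in-process stand-in for a bound + connected peer pair."""

    def __init__(self, serializer=json.dumps, deserializer=json.loads):
        self._serialize = serializer
        self._deserialize = deserializer
        self._handlers = {}

    def handle_function(self, method, handler):
        # Register an async handler under a method name, as the server does.
        self._handlers[method] = handler

    async def invoke(self, method, body):
        # Serialize the request as it would travel over the network...
        wire_request = self._serialize({'method': method, 'body': body})
        decoded = self._deserialize(wire_request)
        handler = self._handlers[decoded['method']]
        request = SimpleNamespace(body=decoded['body'])
        # ...run the handler, then round-trip the reply through JSON too.
        wire_reply = self._serialize(await handler(request))
        return self._deserialize(wire_reply)


async def handle_echo(request):
    return {'received': request.body['sent']}


async def handle_add(request):
    return {'result': request.body['a'] + request.body['b']}


async def main():
    peer = LoopbackPeer()
    peer.handle_function('echo', handle_echo)
    peer.handle_function('add', handle_add)
    echoed = await peer.invoke('echo', {'sent': 'hello'})
    added = await peer.invoke('add', {'a': 1234, 'b': 5678})
    return echoed['received'], added['result']


if __name__ == '__main__':
    print(asyncio.run(main()))
```

Because handlers only see a deserialized body, swapping `json` for another codec changes nothing on the handler side.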

  14. Example: Thrift over ZeroMQ
    // simple.thrift
    service SimpleService {
        string echo(1:string msg),
        i64 add(1:i64 a, 2:i64 b),
    }

    import asyncio
    from callosum import Peer
    from callosum.upper.thrift import ThriftClientAdaptor
    import thriftpy

    simple_thrift = thriftpy.load(
        'simple.thrift',
        module_name='simple_thrift')

    async def call():
        peer = Peer(connect='tcp://localhost:5000',
                    invoke_timeout=2.0)
        adaptor = ThriftClientAdaptor(
            simple_thrift.SimpleService)
        await peer.open()
        response = await peer.invoke(
            'simple',
            adaptor.echo('hello'))
        print(f"echoed {response}")
        a, b = 1234, 5678
        response = await peer.invoke(
            'simple',
            adaptor.add(a, b))
        print(f"{a} + {b} = {response}")
        await peer.close()

    asyncio.run(call())
    (Client)

  15. Example: Thrift over ZeroMQ
    // simple.thrift
    service SimpleService {
        string echo(1:string msg),
        i64 add(1:i64 a, 2:i64 b),
    }

    from callosum import Peer
    from callosum.upper.thrift import ThriftServerAdaptor
    import thriftpy

    simple_thrift = thriftpy.load(
        'simple.thrift',
        module_name='simple_thrift')

    class SimpleDispatcher:
        async def echo(self, msg):
            return msg

        async def add(self, a, b):
            return a + b

    async def serve():
        peer = Peer(bind='tcp://127.0.0.1:5000')
        adaptor = ThriftServerAdaptor(
            peer,
            simple_thrift.SimpleService,
            SimpleDispatcher())
        peer.handle_function('simple',
                             adaptor.handle_function)
        # ...followed by the same open-listen-close routine
        # as in the JSON server example
    (Server)

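Both Thrift adaptors let a single registered name (`'simple'`) carry every method of `SimpleService`. The sketch below illustrates only that routing idea, assuming the client-side adaptor packs a method name plus arguments into one request body and the server-side adaptor routes it to the dispatcher object. `pack_call()` and `dispatch()` are hypothetical helpers that use a plain dict instead of Thrift's binary protocol; they are not the real `ThriftClientAdaptor`/`ThriftServerAdaptor` API.

```python
import asyncio


class SimpleDispatcher:
    async def echo(self, msg):
        return msg

    async def add(self, a, b):
        return a + b


def pack_call(method, *args):
    # Hypothetical client-side adaptor step: method name + positional args.
    return {'method': method, 'args': list(args)}


async def dispatch(dispatcher, body, allowed=('echo', 'add')):
    # Hypothetical server-side adaptor step: route one request body to the
    # matching dispatcher method. `allowed` mimics the IDL service definition.
    method = body['method']
    if method not in allowed:
        raise LookupError(f'unknown method: {method}')
    return await getattr(dispatcher, method)(*body['args'])


async def main():
    dispatcher = SimpleDispatcher()
    echoed = await dispatch(dispatcher, pack_call('echo', 'hello'))
    added = await dispatch(dispatcher, pack_call('add', 1234, 5678))
    return echoed, added


if __name__ == '__main__':
    print(asyncio.run(main()))
```

The `allowed` tuple plays the role of the service definition here: only the methods it names are reachable, so other attributes of the dispatcher object cannot be invoked remotely.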

  16. Inside Callosum

  17. Layered Architecture
    User Apps
    Callosum
    • Upper Layer (Thrift, protobuf, JSON): IDL support
    (e.g., type checks, serialization)
    • Transport-layer extensions
    (e.g., async scheduling, streaming, etc.)
    • Lower Layer (ZeroMQ, HTTP): abstraction of raw TCP sockets
    as message-based communication

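The point of this layering is that each layer depends only on the interface of its neighbor. A minimal sketch of that replaceability, using hypothetical class names (`AbstractUpperLayer`, `AbstractLowerLayer`, `Channel`) rather than Callosum's real ones: the middle layer moves opaque bytes, so a JSON upper layer and an in-memory queue standing in for ZeroMQ can each be swapped without touching the other.

```python
import abc
import asyncio
import json


class AbstractUpperLayer(abc.ABC):
    """Hypothetical IDL-side layer: Python objects <-> bytes."""

    @abc.abstractmethod
    def encode(self, obj) -> bytes: ...

    @abc.abstractmethod
    def decode(self, data: bytes): ...


class AbstractLowerLayer(abc.ABC):
    """Hypothetical transport-side layer: moves opaque messages."""

    @abc.abstractmethod
    async def send(self, data: bytes) -> None: ...

    @abc.abstractmethod
    async def recv(self) -> bytes: ...


class JSONUpper(AbstractUpperLayer):
    def encode(self, obj):
        return json.dumps(obj).encode('utf-8')

    def decode(self, data):
        return json.loads(data.decode('utf-8'))


class QueueLower(AbstractLowerLayer):
    """In-memory stand-in for a message transport such as ZeroMQ."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def send(self, data):
        await self._queue.put(data)

    async def recv(self):
        return await self._queue.get()


class Channel:
    """Middle layer: only ever handles bytes, so both sides are swappable."""

    def __init__(self, upper: AbstractUpperLayer, lower: AbstractLowerLayer):
        self.upper = upper
        self.lower = lower

    async def send(self, obj):
        await self.lower.send(self.upper.encode(obj))

    async def recv(self):
        return self.upper.decode(await self.lower.recv())


async def main():
    channel = Channel(JSONUpper(), QueueLower())
    await channel.send({'sent': 'hello'})
    return await channel.recv()


if __name__ == '__main__':
    print(asyncio.run(main()))
```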

  18. Encryption & Authentication
    § CurveZMQ + ZAP
    • Connecting socket (client side): initialized with (domain,
    server_public_key, client_private_key, client_public_key)
    • Listening socket (server side): initialized with (domain,
    server_private_key)
    • Handshake, in time order:
    – The connecting socket connects to the listening socket
    – The listening socket sends a ZAP request to the ZAP server
    over inproc://zeromq.zap.01
    – The ZAP server passes the identity (domain, client_public_key)
    to a user-defined authenticator
    – The ZAP response reports connection success/failure
    • Encryption algorithm: curve25519

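On the ZAP server side, the user-defined authenticator ultimately answers one ZAP request per connection attempt. The sketch below shows that request/reply exchange at the frame level, following the frame layout of the ZAP specification (ZeroMQ RFC 27). `handle_zap_request()` and its `allowed` set are hypothetical, and socket setup is omitted; in practice, pyzmq ships ready-made authenticators in its `zmq.auth` package.

```python
from typing import List, Set, Tuple


def handle_zap_request(frames: List[bytes],
                       allowed: Set[Tuple[str, bytes]]) -> List[bytes]:
    """Build the ZAP reply frames for one ZAP request.

    `frames` is the request without any routing envelope:
    version, request_id, domain, address, identity, mechanism, credentials...
    `allowed` holds (domain, client_public_key) pairs and plays the role
    of the user-defined authenticator (a hypothetical simplification).
    """
    version, request_id, domain, _address, _identity, mechanism = frames[:6]
    credentials = frames[6:]

    def reply(code: bytes, text: bytes, user_id: bytes = b'') -> List[bytes]:
        # version, request_id, status_code, status_text, user_id, metadata
        return [b'1.0', request_id, code, text, user_id, b'']

    if version != b'1.0':
        return reply(b'500', b'Unsupported ZAP version')
    if mechanism != b'CURVE' or len(credentials) != 1:
        return reply(b'400', b'CURVE credentials required')
    client_key = credentials[0]  # the client's curve25519 public key
    if (domain.decode(), client_key) in allowed:
        return reply(b'200', b'OK', user_id=client_key.hex().encode())
    return reply(b'400', b'Unknown client key')
```

Status code `200` accepts the connection and `400` rejects it; using the hex-encoded client key as the `user_id` is an arbitrary illustrative choice.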

  19. AbstractAuthenticator
    import abc

    class AbstractAuthenticator(abc.ABC):
        # Identity and AuthResult are Callosum's own types (not shown here).

        # --- Server-side interface (used by the binder) ---

        @abc.abstractmethod
        async def server_identity(self) -> Identity:
            '''
            Return the identity of the server.
            Only used by the binder.
            '''
            raise NotImplementedError

        @abc.abstractmethod
        async def check_client(self, client_id: Identity) -> AuthResult:
            '''
            Check whether the given domain and client public key
            are valid.
            Only used by the binder.
            '''
            raise NotImplementedError

        # --- Client-side interface (used by the connector) ---

        @abc.abstractmethod
        async def server_public_key(self) -> bytes:
            '''
            Return the public key of the server.
            Only used by the connector.
            '''
            raise NotImplementedError

        @abc.abstractmethod
        async def client_identity(self) -> Identity:
            '''
            Return the identity of the client.
            Only used by the connector.
            '''
            raise NotImplementedError

        @abc.abstractmethod
        async def client_public_key(self) -> bytes:
            '''
            Return the public key of the client.
            Only used by the connector.
            '''
            raise NotImplementedError

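A concrete implementation of this interface can be quite small. The sketch below assumes an in-memory allow-list of client public keys. `Identity` and `AuthResult` are hypothetical stand-ins for Callosum's types (their field names are guesses), and, following the CurveZMQ slide, it assumes the identities returned by `server_identity()` and `client_identity()` carry the domain plus that peer's private key. With the real library you would subclass `AbstractAuthenticator` instead of writing a standalone class.

```python
import asyncio
from typing import FrozenSet, NamedTuple, Optional


class Identity(NamedTuple):
    # Hypothetical stand-in for Callosum's Identity type.
    domain: str
    key: bytes


class AuthResult(NamedTuple):
    # Hypothetical stand-in for Callosum's AuthResult type.
    success: bool
    user_id: Optional[str] = None


class AllowListAuthenticator:
    """Hypothetical authenticator following the slide's method names.

    A binder (server) needs server_private_key and allowed_clients;
    a connector (client) needs the server public key and its own keypair.
    """

    def __init__(self, domain: str, *, server_private_key: bytes = b'',
                 server_public_key: bytes = b'', client_private_key: bytes = b'',
                 client_public_key: bytes = b'', allowed_clients=()):
        self._domain = domain
        self._server_private_key = server_private_key
        self._server_public_key = server_public_key
        self._client_private_key = client_private_key
        self._client_public_key = client_public_key
        self._allowed: FrozenSet[bytes] = frozenset(allowed_clients)

    # Server-side interface (binder)
    async def server_identity(self) -> Identity:
        return Identity(self._domain, self._server_private_key)

    async def check_client(self, client_id: Identity) -> AuthResult:
        # Accept only a known public key presented within our own domain.
        if client_id.domain == self._domain and client_id.key in self._allowed:
            return AuthResult(True, user_id=client_id.key.hex())
        return AuthResult(False)

    # Client-side interface (connector)
    async def server_public_key(self) -> bytes:
        return self._server_public_key

    async def client_identity(self) -> Identity:
        return Identity(self._domain, self._client_private_key)

    async def client_public_key(self) -> bytes:
        return self._client_public_key


if __name__ == '__main__':
    key = b'\x01' * 32
    auth = AllowListAuthenticator('global', allowed_clients={key})
    print(asyncio.run(auth.check_client(Identity('global', key))))
```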

  20. AsyncScheduler for Request-Key Ordering
    § Scheduler features
    • Customizable ordering via request-keys
    • Integration with aiojobs to limit maximum concurrency
    • Each peer may have different scheduling policies!
    (Diagram, Type 0: No ordering. Client and server timelines for
    requests 1-3; each response returns as soon as its request finishes.
    Legend: same color = same request key; number = global request index;
    yellow = request begins; red = response returns)
    Each request may call a completely different RPC method!

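Type 0 scheduling combined with a concurrency cap can be sketched with the standard library alone. `UnorderedScheduler` below is hypothetical, and `asyncio.Semaphore` stands in for aiojobs purely so the example has no dependencies: handlers start in arrival order, at most `limit` run at once, and each response is returned whenever its handler finishes.

```python
import asyncio


class UnorderedScheduler:
    """Type 0 sketch: no ordering guarantees, but at most `limit`
    handlers run at once (asyncio.Semaphore stands in for aiojobs)."""

    def __init__(self, limit):
        self._sem = asyncio.Semaphore(limit)
        self.running = 0
        self.peak = 0  # highest number of handlers seen running together

    async def run(self, func):
        async with self._sem:
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                return await func()
            finally:
                self.running -= 1


async def demo(limit=2, n=5):
    sched = UnorderedScheduler(limit)
    finished = []

    async def handler(i):
        # Later requests take less time, so they tend to finish first.
        await asyncio.sleep(0.01 * (n - i))
        finished.append(i)
        return i

    results = await asyncio.gather(
        *(sched.run(lambda i=i: handler(i)) for i in range(n)))
    return results, finished, sched.peak


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

`asyncio.gather()` reports results in request order, while `finished` records the actual completion order, which generally differs under Type 0.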

  21. AsyncScheduler for Request-Key Ordering
    (Diagrams, same legend as the previous slide:
    same color = same request key; number = global request index;
    yellow = request begins; red = response returns)
    Type 1: Return ordered by request-keys
    (requests 1-3 may execute concurrently, but responses sharing
    a request key return in request order)
    Type 2: Execution serialization by request-keys
    (requests sharing a request key execute one after another)

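The two ordering policies can be sketched with plain asyncio primitives, assuming each request carries a hashable request key. Both classes are hypothetical illustrations, not Callosum's `AsyncScheduler`: Type 1 lets handlers run concurrently and only delays each response until the previous same-key response has gone out, while Type 2 runs same-key handlers strictly one after another via a per-key `asyncio.Lock`.

```python
import asyncio
from collections import defaultdict


class ReturnOrderedScheduler:
    """Type 1 sketch: handlers run concurrently, but responses sharing a
    request key are returned in the order their requests arrived."""

    def __init__(self):
        # key -> future resolved once the latest request for that key returns
        self._tails = {}

    async def run(self, key, func):
        prev = self._tails.get(key)
        mine = asyncio.get_running_loop().create_future()
        self._tails[key] = mine
        try:
            try:
                return await func()
            finally:
                # Hold this response until the previous same-key one is out.
                if prev is not None:
                    await prev
        finally:
            mine.set_result(None)
            if self._tails.get(key) is mine:
                del self._tails[key]


class KeySerializedScheduler:
    """Type 2 sketch: requests sharing a key execute one at a time, in
    arrival order (asyncio.Lock is acquired first come, first served)."""

    def __init__(self):
        # One lock per key, created lazily; never pruned in this sketch.
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, key, func):
        async with self._locks[key]:
            return await func()


async def demo(scheduler):
    """Send a slow request and then a fast one with the same key."""
    events = []

    async def handler(name, delay):
        events.append(f'start {name}')
        await asyncio.sleep(delay)
        events.append(f'end {name}')
        return name

    async def request(name, delay):
        result = await scheduler.run('key-A', lambda: handler(name, delay))
        events.append(f'return {result}')

    await asyncio.gather(request('r1', 0.05), request('r2', 0.0))
    return events


if __name__ == '__main__':
    print(asyncio.run(demo(ReturnOrderedScheduler())))
    print(asyncio.run(demo(KeySerializedScheduler())))
```

With a slow `r1` and a fast `r2` on the same key, Type 1 yields `start r1, start r2, end r2, end r1, return r1, return r2` (overlapping execution, ordered returns), whereas Type 2 yields `start r1, end r1, return r1, start r2, end r2, return r2`.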

  22. Multi-channel & Streaming
    (Diagram: Callosum carries requests 1-4 from the client side to the
    server side and responses 1-4 back over the network, including
    chunked requests and chunked responses)

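Multi-channel streaming boils down to tagging every chunk with the request it belongs to, so chunks of different requests can share one connection and be reassembled independently. The frame layout `(request_id, seq, is_last, chunk)` and all functions below are hypothetical, chosen for illustration rather than taken from Callosum's wire format.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical frame layout: (request_id, seq, is_last, payload_chunk)
Frame = Tuple[int, int, bool, bytes]


def split_into_frames(request_id: int, payload: bytes, chunk_size: int) -> List[Frame]:
    """Cut one message into chunk_size pieces tagged with its request ID."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b'']
    return [(request_id, seq, seq == len(chunks) - 1, chunk)
            for seq, chunk in enumerate(chunks)]


def interleave(*streams: List[Frame]) -> Iterator[Frame]:
    """Round-robin frames from several messages onto one connection."""
    iterators = [iter(s) for s in streams]
    while iterators:
        for it in list(iterators):
            frame = next(it, None)
            if frame is None:
                iterators.remove(it)
            else:
                yield frame


class Reassembler:
    """Collect frames per request ID and emit each message once complete."""

    def __init__(self):
        self._parts: Dict[int, Dict[int, bytes]] = defaultdict(dict)
        self._last_seq: Dict[int, int] = {}

    def feed(self, frame: Frame) -> Optional[Tuple[int, bytes]]:
        request_id, seq, is_last, chunk = frame
        parts = self._parts[request_id]
        parts[seq] = chunk
        if is_last:
            self._last_seq[request_id] = seq
        last = self._last_seq.get(request_id)
        if last is not None and len(parts) == last + 1:
            del self._parts[request_id], self._last_seq[request_id]
            return request_id, b''.join(parts[i] for i in range(last + 1))
        return None


def demo():
    wire = interleave(split_into_frames(1, b'hello world', 4),
                      split_into_frames(2, b'abc', 2))
    reassembler = Reassembler()
    return [m for m in map(reassembler.feed, wire) if m is not None]


if __name__ == '__main__':
    print(demo())
```

Because the short message finishes first, `demo()` yields request 2 before request 1: a small reply can overtake a large streaming one on the same connection.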

  23. Challenge: Integration with IDLs
    § Apache Thrift
    • It was easy to integrate with Callosum.
    • Why: aiothrift + Thrift's runtime IDL loading scheme
    § How about others? Does this apply in general?
    • What are the requirements for IDL libraries?
    – Making them asynchronous often requires changes to the IDL compiler.
    • I hope to integrate with nirum!

  24. Challenge: Performance
    § Minimizing protocol overheads
    • Main overhead: serialization & deserialization
    • How to simplify the validation logic on the Python side?
    • How to reduce the number of memory copies from/to buffers?
    • How to reduce network bandwidth usage?
    § Solution
    • msgpack
    • snappy

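msgpack and snappy fit naturally as a two-stage serializer: encode the object compactly, then compress the bytes. The sketch below composes such a pair, assuming the peer accepts arbitrary `serializer`/`deserializer` callables as in the JSON example; `make_codec()` is a hypothetical helper, and `json` plus `zlib` are standard-library stand-ins so it runs anywhere. With the third-party packages installed you would pass, e.g., `msgpack.packb`/`msgpack.unpackb` and `snappy.compress`/`snappy.decompress` instead.

```python
import json
import zlib


def make_codec(dumps, loads, compress, decompress):
    """Compose an object codec with a byte compressor into a
    (serializer, deserializer) pair."""

    def serializer(obj) -> bytes:
        return compress(dumps(obj))

    def deserializer(data: bytes):
        return loads(decompress(data))

    return serializer, deserializer


# Standard-library stand-ins for msgpack + snappy.
serialize, deserialize = make_codec(
    lambda obj: json.dumps(obj, separators=(',', ':')).encode('utf-8'),
    lambda data: json.loads(data.decode('utf-8')),
    zlib.compress,
    zlib.decompress,
)


if __name__ == '__main__':
    payload = {'rows': [[i, 'x' * 10] for i in range(200)]}
    encoded = serialize(payload)
    print(len(json.dumps(payload)), len(encoded))
    assert deserialize(encoded) == payload
```

Keeping the two stages separate lets each be benchmarked and swapped on its own, which matters because the best compressor depends on how repetitive the messages are.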

  25. Ref) Serialization Benchmark
    (Chart: relative time taken for loads and dumps, and relative encoded
    size, for json, simplejson, ujson, msgpack, cbor, marshal, pickle, and pickle2)
    Code from https://gist.github.com/cactus/4073643

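A relative benchmark like this one is easy to reproduce locally. The sketch below is not the linked gist's code: it times `dumps`/`loads` with `timeit` and measures encoded size for the standard-library codecs only, normalizing every figure to `json`. Third-party codecs such as msgpack or cbor can be added to the hypothetical `CODECS` table if installed. Results depend heavily on the payload and the machine, so none are quoted here.

```python
import json
import marshal
import pickle
import timeit

# codec name -> (dumps, loads); all standard library
CODECS = {
    'json': (lambda o: json.dumps(o).encode('utf-8'), json.loads),
    'marshal': (marshal.dumps, marshal.loads),
    'pickle': (pickle.dumps, pickle.loads),
}


def benchmark(payload, number=1000):
    """Return {codec: (dumps_time, loads_time, encoded_size)}, each
    expressed relative to json (so json is always (1.0, 1.0, 1.0))."""
    raw = {}
    for name, (dumps, loads) in CODECS.items():
        encoded = dumps(payload)
        t_dumps = timeit.timeit(lambda: dumps(payload), number=number)
        t_loads = timeit.timeit(lambda: loads(encoded), number=number)
        raw[name] = (t_dumps, t_loads, len(encoded))
    base = raw['json']
    return {name: tuple(v / b for v, b in zip(values, base))
            for name, values in raw.items()}


if __name__ == '__main__':
    sample = {'id': 42, 'tags': ['a', 'b', 'c'], 'body': 'x' * 100}
    for name, (dumps_rel, loads_rel, size_rel) in benchmark(sample).items():
        print(f'{name:8} dumps {dumps_rel:6.0%}  loads {loads_rel:6.0%}  size {size_rel:6.0%}')
```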

  26. Future Work
    § More optimization
    • Cython
    • PyO3 + Rust
    § Application to Backend.AI
    • Replace the current manager-agent communication
    to eliminate the need for VPNs in hybrid-cloud &
    inter-cluster setups (e.g., )
    • Extend Callosum to bundle multiple Callosum peerings
    ("bundle of bundles")

  27. Retrospect
    https://github.com/lablup/callosum

  28. Retrospect: Open Source
    § Yet another Backend.AI-derived
    open source library!
    § Writing my own
    vs. maintaining an existing one
    vs. contributing to an existing one
    § Tried to integrate with nirum
    (PyCon KR 2017), but did not have
    enough time, and its networking layer
    was difficult to extend
    (Screenshot: aio-libs Gitter, March 21st)

  29. Retrospect: Development
    § Conference-driven development?!
    § 1 week to implement Thrift over ZeroMQ
    § Difficulties when integrating external IDL libs
    • "API contamination": async forces everything else to be async
    • To write async APIs without "async def"...
    – Live in callback hell
    – Rewrite all of Python on top of an event loop (like Node.js)
    – Monkey-patch the standard networking functions (gevent)

  30. Thanks!
    is hiring Python backend & DevOps engineers!
    [email protected]
