
Building and Integrating A Data Platform


Last year, Benoit and others were busy integrating and customising Barrel, a modern, open source data platform with master-master replication, written in Erlang.

This talk describes the challenges we faced in building a P2P middle-tier platform in Erlang that persists the data and “actor” states from micro-services in different levels of storage and replicates them between locations. It also describes the patterns used for read and write concurrency, as well as the continuous, automated indexing of the data and “actors”.

Benoit Chesneau

March 16, 2018



Transcript

  1. BUILDING AND INTEGRATING
    A DATA PLATFORM
    CodeBeam SF - 2018


  2. benoît chesneau
    craftsman working on P2P and custom data
    endpoints solution
    enki multimedia: the corporate interface
    member of the Erlang Industrial User Group
    about me


  3. micromobile
    services


  4. 1. Does my service do only one thing?
    2. Is my service autonomous?
    3. Does this service own its own data?
    a good micro-service?


  5. isolated
    owns its own data
    resilient
    communicates with others through asynchronous
    messages
    micro-service


  6. sharing data in the mobile age between and across
    micro-services makes applications more scalable
    and resilient
    Ex: messaging systems,
    sharing data


  7. sharing data
    microservice: update and query
    cloud storage
    standard solution: the client calls a web service
    to query and update the data
    problem: if the connection is slow or absent, the
    micro-service stops


  8. sharing data
    microservice
    local storage replicated
    always available
    eventually consistent
    cloud storage
    update and query
    synchronize
    local storage


  9. barrel
    Bring and keep a
    view of your data
    near your application


  10. a database focusing on simplicity
    document oriented
    Automatic indexing
    Focusing on simplicity


  11. Docs are maps
    { "id" : "someid",
    "Key" : "value" }


  12. automatic indexing
    Access by path: /locations/country/Germany
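
To make the idea of path access concrete, here is a minimal sketch of how a nested document could be flattened into paths such as /locations/country/Germany and indexed automatically. The function names `index_paths` and `build_index` and the sample documents are hypothetical and written for illustration only; they are not Barrel's API, and Barrel's actual index layout may differ.

```python
def index_paths(doc, prefix=""):
    """Yield one path string per leaf value of a nested document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from index_paths(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for position, value in enumerate(doc):
            yield from index_paths(value, f"{prefix}/{position}")
    else:
        yield f"{prefix}/{doc}"


def build_index(docs):
    """Map every path to the set of ids of the documents containing it."""
    index = {}
    for doc in docs:
        # The "id" field identifies the document and is not indexed itself.
        body = {key: value for key, value in doc.items() if key != "id"}
        for path in index_paths(body):
            index.setdefault(path, set()).add(doc["id"])
    return index


# Hypothetical sample documents.
docs = [
    {"id": "a", "locations": {"country": "Germany", "city": "Berlin"}},
    {"id": "b", "locations": {"country": "France"}},
    {"id": "c", "locations": {"country": "Germany"}},
]
index = build_index(docs)
germany = index["/locations/country/Germany"]
```

With this layout, a query by path is a single dictionary lookup, and no index has to be declared by the user beforehand.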


  13. Local first
    local first: bring and keep a view of your data near your
    application
    data is synchronised with other storages
    Replication to and from any sources


  14. partial view
    query
    node


  15. library embedded in your Erlang application(*)
    available as a micro-service via HTTP(1,2) or via
    the Erlang distribution
    Peer to peer: a barrel is the unit
    Semantic to allow distributed transactions
    P2p
    (*) including Elixir or LFE, or …


  16. every peer forks the master; updates are made offline
    peers pull and merge from the main server
    works well for back-pressure (writes can be
    delayed)
    CRDT semantics for conflict-free data structures


  17. causality
    no vector clock
    revision tree
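
As an illustration of tracking causality with a revision tree rather than a vector clock, the sketch below stores each revision with a pointer to its parent. Two revisions are causally related when one is an ancestor of the other, and concurrent otherwise. The class name, the revision ids, and especially the deterministic winner rule (deepest branch, ties broken by revision id) are assumptions made for this example; the slides do not state how Barrel picks a winning revision.

```python
class RevisionTree:
    """A per-document tree of revisions, each pointing at its parent."""

    def __init__(self):
        self.parents = {}  # revision id -> parent id (None for the root)

    def add(self, rev, parent=None):
        if parent is not None and parent not in self.parents:
            raise KeyError(f"unknown parent revision: {parent}")
        self.parents.setdefault(rev, parent)

    def depth(self, rev):
        """Number of revisions on the path from the root to rev."""
        count = 0
        while rev is not None:
            count += 1
            rev = self.parents[rev]
        return count

    def is_ancestor(self, older, newer):
        """True when newer was derived, directly or not, from older."""
        rev = newer
        while rev is not None:
            if rev == older:
                return True
            rev = self.parents[rev]
        return False

    def leaves(self):
        """Revisions nobody has built on; more than one means a conflict."""
        referenced = {p for p in self.parents.values() if p is not None}
        return sorted(r for r in self.parents if r not in referenced)

    def winner(self):
        # Assumed rule: deepest branch wins, ties go to the highest id.
        # Every peer holding the same tree computes the same winner.
        return max(self.leaves(), key=lambda r: (self.depth(r), r))


tree = RevisionTree()
tree.add("1-a")
tree.add("2-b", parent="1-a")  # one peer edits revision 1-a
tree.add("2-c", parent="1-a")  # another peer edits 1-a concurrently
conflict = tree.leaves()
first_winner = tree.winner()
tree.add("3-d", parent="2-b")  # a later edit extends the 2-b branch
second_winner = tree.winner()
```

Because the winner is a pure function of the tree, peers that have exchanged the same revisions agree on it without any coordination.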


  18. [Diagram: states of peers Alice, Iko and Bob at t0 (revisions a, b, c), with commit and pull operations marked manual, automatic or rejected]


  19. [Diagram: states of peers Alice, Iko and Bob from t0 to t1 (revisions a, b, c, d), with pull and merge operations, one marked rejected]


  20. [Diagram: states of peers Alice, Iko and Bob from t2 to t3 (revisions a, b, c, d), with pull operations]


  21. ?
    ??
    Erlang


  22. Erlang is slow
    Erlang is only for communications protocols
    I should do it in Rust…
    No access to low-level memory and file system
    APIs
    Why not Erlang


  23. Barrel is more of a data orchestration service than a
    database
    Basic indexing
    Focus on replicating the data
    NIFs to help
    Why Erlang


  24. Doc: Revision + Metadata data:
    Read-Modify-Write: concurrency issue
    Incremental changes log: append only
    Indexes: when a new winning version is found the
    doc is indexed.
    Blobs (attachments)
    What we write
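
The append-only changes log can be pictured with a short sketch: every write appends an entry with a monotonically increasing sequence number, and a replicating peer asks only for what was appended after its last checkpoint. The class `ChangesLog`, its methods, and the entry layout are hypothetical names chosen for this example, not Barrel's storage format.

```python
class ChangesLog:
    """Append-only log of document changes, ordered by sequence number."""

    def __init__(self):
        self._entries = []  # list of (seq, doc_id, rev); never rewritten

    def append(self, doc_id, rev):
        """Record a change and return its sequence number."""
        seq = len(self._entries) + 1
        self._entries.append((seq, doc_id, rev))
        return seq

    def since(self, seq=0):
        """Return the entries whose sequence number is greater than seq."""
        # Entry number n lives at index n - 1, so slicing from seq skips
        # exactly the entries the caller has already seen.
        return self._entries[seq:]


log = ChangesLog()
log.append("doc1", "1-a")
log.append("doc2", "1-b")
checkpoint = log.append("doc1", "2-c")  # a peer has replicated up to here
log.append("doc3", "1-d")
pending = log.since(checkpoint)         # only what the peer is missing
```

Since entries are never modified, a peer that stores only its last sequence number can resume replication after any interruption.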


  25. Provides connectors for other storages
    RocksDB for local persistent storage

    https://gitlab.com/barrel-db/erlang-rocksdb.git
    Dirty NIFs
    ETS?
    Use the right tool for …


  26. Goal: anticipate the resource usage at the node
    level
    Return early to the client
    Control applied to all resources in the nodes
    Back-pressure
    let it bend: be resilient
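
One simple way to "return early to the client" is a bounded inbox: when the number of pending jobs reaches a limit, new work is refused immediately instead of piling up. The sketch below shows that idea using Python's standard `queue` module; `BoundedInbox`, its limit, and the return values are hypothetical and not taken from Barrel.

```python
import queue


class BoundedInbox:
    """Accept jobs up to a fixed backlog; refuse the rest right away."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, job):
        try:
            # put_nowait never blocks: the caller gets an answer at once.
            self._pending.put_nowait(job)
        except queue.Full:
            return "overloaded"
        return "accepted"

    def take(self):
        """Remove and return the oldest pending job."""
        return self._pending.get_nowait()


inbox = BoundedInbox(max_pending=2)
results = [inbox.submit(job) for job in ("a", "b", "c")]
oldest = inbox.take()         # a worker picks up a job, freeing one slot
retry = inbox.submit("c")     # the refused job can now be resubmitted
```

The client that receives "overloaded" decides whether to retry, delay, or give up, so the node bends under load rather than exhausting its memory.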


  27. worker_pool

    https://github.com/inaka/worker_pool
    Hard to debug your program
    Little control on the pending requests
    epocxy, but it handles back-pressure the reverse way
    Simple pooling


  28. Clients and Jobs should be handled independently
    Active and passive regulation
    Request unit: to set the number of requests we
    want to serve per second
    Flow-Based programming?
    sbroker partially fits the bill:

    https://github.com/fishcakez/sbroker
    Dynamic regulation
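
The "request unit" idea, a budget of requests served per second, can be sketched as a token bucket. This is a generic technique swapped in for illustration; it is not how sbroker or Barrel implement regulation. The class `RequestUnitLimiter` and its `clock` parameter are hypothetical, and the clock is injectable only so that the behaviour can be checked deterministically.

```python
import time


class RequestUnitLimiter:
    """Token bucket: serve at most units_per_second on average."""

    def __init__(self, units_per_second, clock=time.monotonic):
        self.rate = float(units_per_second)
        self.clock = clock
        self.tokens = self.rate  # start with one full second of budget
        self.last = clock()

    def try_acquire(self, units=1):
        """Spend units if the budget allows it; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at one second's worth.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= units:
            self.tokens -= units
            return True
        return False


# A fake clock makes the example deterministic.
now = [0.0]
limiter = RequestUnitLimiter(units_per_second=2, clock=lambda: now[0])
burst = [limiter.try_acquire() for _ in range(3)]  # third request exceeds budget
now[0] = 0.5                                       # half a second later
after_wait = [limiter.try_acquire() for _ in range(2)]
```

Changing `rate` at runtime is what would make the regulation dynamic: a controller could lower it when the node is under pressure and raise it again afterwards.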


  29. Started with a simple “Single Writer Multiple
    Readers” pattern
    bottleneck: A process to handle the final write to the
    database
    We do most of the work, in parallel, outside the write
    process
    Indexes are processed asynchronously (but a
    session can read its own writes if needed)
    Concurrency challenge
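
The "Single Writer Multiple Readers" pattern can be sketched outside Erlang with one writer thread that owns all mutations while readers consult an immutable snapshot. The Python version below is an analogy under stated assumptions: `SingleWriterStore` is a hypothetical name, a thread stands in for the Erlang write process, and a swapped dictionary stands in for the shared state.

```python
import queue
import threading


class SingleWriterStore:
    """All mutations go through one writer; reads never wait for it."""

    def __init__(self):
        self._snapshot = {}  # replaced wholesale, never mutated in place
        self._writes = queue.Queue()
        writer = threading.Thread(target=self._write_loop, daemon=True)
        writer.start()

    def read(self, key):
        # Readers only follow a reference to a complete snapshot.
        return self._snapshot.get(key)

    def write(self, key, value):
        # Preparation (validation, encoding) would run here, in the
        # caller's thread, so the writer only does the final step.
        done = threading.Event()
        self._writes.put((key, value, done))
        done.wait()  # returning after the write lets callers read their own writes

    def _write_loop(self):
        while True:
            key, value, done = self._writes.get()
            updated = dict(self._snapshot)
            updated[key] = value
            self._snapshot = updated  # publish the new snapshot in one step
            done.set()


store = SingleWriterStore()
workers = [
    threading.Thread(target=store.write, args=(f"k{i}", i)) for i in range(10)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
values = [store.read(f"k{i}") for i in range(10)]
```

Copying the whole dictionary on every write is deliberately naive; it keeps the sketch short while showing why the single writer becomes the bottleneck once the work done inside it grows.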


  30. Read access is shared via ETS
    On request a monitor to the db is created
    ets: to share the state
    between readers


  31. When using the Erlang distribution, events are
    dispatched by nodes, processes always subscribe
    locally
    Events


  32. Erlang distribution is not used to share the data
    Erlang distribution can be switched
    HTTP transports
    Transport the data


  33. Roadmap


  34. 1.0: 24 March 2018
    1.1: 24 April 2018

    Milestones


  35. 1.0: Websockets support (with new hackney)
    1.1: Experimental: GRPC
    Coming features


  36. ?


  37. barrel is released in March 2018
    https://barrel-db.org
    contact me @benoitc
