Building and Integrating A Data Platform

Last year, Benoit and others were busy integrating and customising Barrel, a modern, open source data platform with master-master replication, written in Erlang.

This talk will describe the different challenges we faced in building a P2P middle-tier platform in Erlang to persist the data and “actor” states from micro-services in different levels of storage and replicate them between different locations. It will also describe the patterns used for read and write concurrency, as well as the continuous automated indexing of the data and “actors”.

Benoit Chesneau

March 16, 2018

Transcript

  1. BUILDING AND INTEGRATING A DATA PLATFORM CodeBeam SF - 2018

  2. About me: Benoît Chesneau, craftsman working on P2P and custom data endpoint solutions; Enki Multimedia: the corporate interface; member of the Erlang Industrial User Group
  3. micromobile services

  4. A good micro-service? 1. Does my service do only one thing? 2. Is my service autonomous? 3. Does this service own its own data?
  5. Micro-service: isolated; owns its own data; resilient; communicates with others by asynchronous messages
  6. Sharing data in the mobile age: between and across micro-services; makes applications more scalable and resilient. Ex: messaging systems, sharing data
  7. Sharing data. Standard solution: the client calls a web service to query and update the data. Problem: if the connection is slow or absent, the micro-service stops. (Diagram: microservice, update and query, cloud storage)
  8. Sharing data: local storage; replicated; always available; eventually consistent. (Diagram: microservice, update and query, local storage, synchronize, cloud storage)
  9. barrel: bring and keep a view of your data near your application
  10. A database focusing on simplicity: document oriented; automatic indexing
  11. Docs are maps: { "id" : "someid", "Key" : "value" }
  12. Automatic indexing. Access by path: /locations/country/Germany

  13. Local first: bring and keep a view of your data near your application; data is synchronised with other storages; replication to and from any source
  14. Partial view (diagram: query, node)

  15. P2P: library embedded in your Erlang application(*); available as a micro-service via HTTP (1, 2) or via the Erlang distribution; peer to peer: a barrel is the unit; semantics to allow distributed transactions. (*) Including Elixir or LFE, or …
  16. Every peer forks the master; updates are offline; peers pull and merge from the main server; works well for back-pressure (writes can be delayed); CRDT semantics for conflict-free data structures
  17. Causality: no vector clock; revision tree

  18. Diagram: states of Alice, Iko and Bob at t0 (revisions a, b, c); operations: commit, pull (manual, automatic, rejected)
  19. Diagram: states of Alice, Iko and Bob at t0 and t1 (revisions a, b, c, d); operations: pull (rejected), pull, merge
  20. Diagram: states of Alice, Iko and Bob at t2 and t3 (revisions a, b, c, d); operations: pull, pull
  21. ? ?? Erlang

  22. Why not Erlang: Erlang is slow; Erlang is only for communication protocols; I should do it in Rust…; no access to low-level memory and file system APIs
  23. Why Erlang: Barrel is more a data orchestration service than a database; basic indexing; focus on replicating the data; NIFs to help
  24. What we write. Doc: revision + metadata; data: Read-Modify-Write: concurrency issue; incremental changes log: append only; indexes: when a new winning version is found, the doc is indexed; blobs (attachments)
  25. Use the right tool for …: provides connectors for other storages; RocksDB for local persistent storage (https://gitlab.com/barrel-db/erlang-rocksdb.git); dirty NIFs; ETS?
  26. Back-pressure. Goal: anticipate the resource usage at the node level; return early to the client; control applied to all resources in the nodes; let it bend: be resilient
  27. Simple pooling: worker_pool (https://github.com/inaka/worker_pool); hard to debug your program; little control on the pending requests; epocxy, but it handles back-pressure the reverse way
  28. Dynamic regulation: clients and jobs should be handled independently; active and passive regulation; request unit: to set the number of requests we want to serve per second; flow-based programming?; sbroker partially fits the bill: https://github.com/fishcakez/sbroker
  29. Concurrency challenge: started with a simple “Single Writer Multiple Readers” pattern; bottleneck: a process to handle the final write to the database; we do, and parallelise, most of the work outside the write process; indexes are processed asynchronously (but a session can read its own writes if needed)
  30. ETS, to share the state between readers: read access is shared via ETS; on request, a monitor to the db is created
  31. Events: when using the Erlang distribution, events are dispatched by nodes; processes always subscribe locally
  32. Transport the data: Erlang distribution is not used to share the data; Erlang distribution can be switched; HTTP transports
  33. Roadmap

  34. Milestones: 1.0: 24 March 2018; 1.1: 24 April 2018; …

  35. Coming features: 1.0: WebSockets support (with new hackney); 1.1: experimental: gRPC
  36. ?

  37. barrel is released in March 2018; https://barrel-db.org; contact me @benoitc
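
Slides 11 and 12 describe documents as maps that are indexed automatically and accessed by path, such as /locations/country/Germany. The sketch below illustrates that general idea in Python. It is not Barrel's implementation: the names `doc_paths` and `PathIndex`, the in-memory dictionary, and the rule that a leaf value becomes the last path segment are all assumptions made for illustration.

```python
from collections import defaultdict


def doc_paths(value, prefix=""):
    """Yield one path string per leaf value of a nested document.

    Assumption: the leaf value is appended as the final path segment,
    mirroring the slide's example /locations/country/Germany.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from doc_paths(child, f"{prefix}/{key}")
    elif isinstance(value, list):
        # Each list element is indexed under the same prefix.
        for child in value:
            yield from doc_paths(child, prefix)
    else:
        yield f"{prefix}/{value}"


class PathIndex:
    """Hypothetical in-memory index mapping a path to document ids."""

    def __init__(self):
        self._by_path = defaultdict(set)

    def index(self, doc):
        # The "id" field identifies the document and is not indexed itself.
        body = {k: v for k, v in doc.items() if k != "id"}
        for path in doc_paths(body):
            self._by_path[path].add(doc["id"])

    def lookup(self, path):
        # Return matching ids in sorted order; unknown paths give [].
        return sorted(self._by_path.get(path, ()))


index = PathIndex()
index.index({"id": "a", "locations": {"country": "Germany"}})
index.index({"id": "b", "locations": {"country": "France"}})
matches = index.lookup("/locations/country/Germany")
```

Here `matches` holds the ids of every document with a `Germany` leaf under `locations/country`. A persistent store would keep these path entries in an ordered key-value store rather than a Python dictionary, but the flattening step stays the same.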