enkidb - an Alternative to Mnesia with Unique Features

Slide 1

Slide 1 text

enkidb: an alternative to mnesia • Benoit Chesneau (@benoitc) • Erlang User Conference, 2014-06-09

Slide 2

Slide 2 text

why use Erlang to build a database?

Slide 3

Slide 3 text

Database challenges • Collecting and organising data so they can be retrieved • Concurrency • ACID transactions

Slide 4

Slide 4 text

ACID ‣ Atomicity ‣ Consistency ‣ Isolation ‣ Durability

Slide 5

Slide 5 text

Atomicity • each transaction is "all or nothing" • if one part fails, the database stays unchanged • Erlang: let it crash & fault tolerance • processes fail fast

Slide 6

Slide 6 text

Consistency • each transaction takes the database from one valid state to another • Erlang supervision helps to maintain a consistent system • process recovery

Slide 7

Slide 7 text

Isolation • serializability: concurrent transactions result in the same system state as if they were executed serially • Erlang: transaction processes are isolated from each other • per-process message queues • no shared memory • independent recovery
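The isolation points above can be illustrated with a small sketch. It is not taken from enki: the module name `isolation_sketch` and all of its functions are hypothetical. A single process owns the state, and concurrent writers can only reach it through its message queue, so their updates are applied one at a time.

```erlang
-module(isolation_sketch).
-export([start/0, incr/2, read/1, demo/0]).

%% Hypothetical counter process: the count lives only in the loop
%% argument, so no other process can touch it directly.
start() ->
    spawn(fun() -> loop(0) end).

%% Asynchronous write: the request waits in the owner's message queue.
incr(Pid, N) ->
    Pid ! {incr, N},
    ok.

%% Synchronous read: the reply is tagged with a unique reference.
read(Pid) ->
    Ref = make_ref(),
    Pid ! {read, self(), Ref},
    receive
        {Ref, Value} -> Value
    after 5000 ->
        exit(timeout)
    end.

%% Messages are handled one by one, so updates are effectively serial.
loop(Count) ->
    receive
        {incr, N} ->
            loop(Count + N);
        {read, From, Ref} ->
            From ! {Ref, Count},
            loop(Count)
    end.

%% Ten concurrent writers send 100 increments each. Each writer does a
%% synchronous read before reporting, so all of its increments have been
%% processed by the time the parent reads the final value.
demo() ->
    Counter = start(),
    Parent = self(),
    Writers = [spawn(fun() ->
                   lists:foreach(fun(_) -> incr(Counter, 1) end,
                                 lists:seq(1, 100)),
                   _ = read(Counter),
                   Parent ! {done, self()}
               end) || _ <- lists:seq(1, 10)],
    lists:foreach(fun(W) -> receive {done, W} -> ok end end, Writers),
    read(Counter).
```

Calling `isolation_sketch:demo()` returns 1000: no increment is lost even though the writers run concurrently, because only the owning process ever modifies the count.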

Slide 8

Slide 8 text

Durability • Once a transaction has been committed, it has been recorded in durable storage • Erlang reliability helps to maintain the durability.

Slide 9

Slide 9 text

a need for a specific database…

Slide 10

Slide 10 text

camp.io • Easily coordinate multiple data sources coming from devices, people or services around the world through a decentralised data platform (1). • (1) using the open source refuge solution: http://refuge.io

Slide 11

Slide 11 text

The Burning of the Library at Alexandria in 391 AD

Slide 12

Slide 12 text

copyists

Slide 13

Slide 13 text

camp.io • take back control of your data • decentralize data • replicas and snapshots around the world

Slide 14

Slide 14 text

queries should be decentralized • replicate snapshot data to different parts of the world, offices or devices • queries happen on snapshots • sometimes offline • or disconnected from the primary source • and can be disconnected from other sources

Slide 15

Slide 15 text

writes happen independently of reads • writes can be centralised • … or replicated • without interactions with other nodes. • over the net using HTTP(s) or not. • support transactional writes

Slide 16

Slide 16 text

mnesia partly fits the bill

Slide 17

Slide 17 text

mnesia partly fits the bill • replication • location transparency • diskless nodes • transaction support with realtime capabilities (lock selection)

Slide 18

Slide 18 text

but • replication works only between connected Erlang nodes • no offline capabilities • transactions require a dialog between all the nodes that hold a replica (write lock)

Slide 19

Slide 19 text

facts and a bit of history…

Slide 20

Slide 20 text

we started by… using couchdb instead of mnesia • databases can grow beyond 2 GB • master-master replication • no node connections needed: P2P • view indexing • modern storage

Slide 21

Slide 21 text

refuge.io project 2011 couchdb hack 2012 rcouch 03/2014 opencouch enki 06/2014 The time we have lost

Slide 22

Slide 22 text

hack apache couchdb, make it OTPish • rcouch (http://github.com/rcouch) • major refactoring to build CouchDB as an Erlang release • some patches and new features • view changes • WIP: merging back into Apache CouchDB

Slide 23

Slide 23 text

opencouch - the diet cure… • rcouch was too complicated to embed • needed a simpler API to add new features • needed to be able to use different transports • needed something without all the extras • https://github.com/benoitc/opencouch

Slide 24

Slide 24 text

enki one step further…

Slide 25

Slide 25 text

enki design • document-oriented database • blob support • 3 components: peers, updaters, storage services

Slide 26

Slide 26 text

enki design [diagram] • components: application, Peers, Updater, storage service • arrow labels: transactions & changes notifications; write; read and replicate snapshots; replicate

Slide 27

Slide 27 text

peers • Erlang library embedded in Erlang applications • send transactions to the updaters • query the storage services • edit locally (offline or not) • replication between peers • discovery of updaters and peers handled at the application level

Slide 28

Slide 28 text

peers • can replicate from Apache CouchDB • a REST server exists

Slide 29

Slide 29 text

replication • couchdb uses a revision tree • tested other solutions: • dotted version vectors: https://github.com/ricardobcl/Dotted-Version-Vectors • interval tree clocks: https://github.com/ricardobcl/Interval-Tree-Clocks • settled on a revision tree with minor adjustments

Slide 30

Slide 30 text

enki revision tree • add concurrent edit concept (also defined by Damien Katz) • multi-backend support

Slide 31

Slide 31 text

updater • only manages the transactions • can resolve conflicts via stored functions or transaction functions • accepts connections over different transports and via Erlang RPC • more complicated than a gen_server, but not by much
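To make the updater role concrete, here is a minimal sketch of a process that serializes transactions and applies each one all-or-nothing. It is not enki's updater: the module `updater_sketch`, the in-memory map standing in for storage, and the operation tuples `{add, Key, Value}`, `{remove, Key}` and `{fn, Fun}` are assumptions chosen for illustration, loosely modelled on the cowdb operations shown later in the talk.

```erlang
-module(updater_sketch).
-behaviour(gen_server).

-export([start_link/0, transact/2, lookup/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical API: the store is a plain map held by the server process.
start_link() ->
    gen_server:start_link(?MODULE, #{}, []).

transact(Pid, Ops) ->
    gen_server:call(Pid, {transact, Ops}).

lookup(Pid, Key) ->
    gen_server:call(Pid, {lookup, Key}).

init(Store) ->
    {ok, Store}.

%% Calls are handled one at a time, so transactions never interleave.
%% If any operation fails, the previous store is kept unchanged.
handle_call({transact, Ops}, _From, Store) ->
    try apply_ops(Ops, Store) of
        NewStore -> {reply, ok, NewStore}
    catch
        throw:Reason -> {reply, {error, Reason}, Store}
    end;
handle_call({lookup, Key}, _From, Store) ->
    {reply, maps:find(Key, Store), Store}.

handle_cast(_Msg, Store) ->
    {noreply, Store}.

apply_ops([], Store) ->
    Store;
apply_ops([{add, Key, Value} | Rest], Store) ->
    apply_ops(Rest, Store#{Key => Value});
apply_ops([{remove, Key} | Rest], Store) ->
    case maps:is_key(Key, Store) of
        true -> apply_ops(Rest, maps:remove(Key, Store));
        false -> throw({not_found, Key})
    end;
%% A transaction function sees the current store and returns more
%% operations, which run inside the same transaction.
apply_ops([{fn, Fun} | Rest], Store) when is_function(Fun, 1) ->
    apply_ops(Fun(Store) ++ Rest, Store).
```

In this sketch, `transact(U, [{add, c, 3}, {remove, z}])` returns `{error, {not_found, z}}` when `z` is missing and leaves `c` unset, which is the "all or nothing" behaviour described on the atomicity slide.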

Slide 32

Slide 32 text

how is a document stored in couchdb? • 2 indexes: by ID, by seq • transactions happen at the document level • the value is the revision tree; there is one revision tree per document • each revision is stored as an immutable chunk in the database file; only references are stored in the revision tree

Slide 33

Slide 33 text

storage • key-value interface and CAS for updating • the revision tree is stored as the value associated with the document key • revisions are stored as immutable values • can be remote (Amazon DynamoDB, PostgreSQL, Riak…) or local (leveldb, cowdb) • uses the transaction capabilities of the storage backend when available
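The "key-value interface and CAS" idea can be sketched with an ETS table standing in for a storage backend. This is an illustration under assumptions, not enki's storage API: the module `kv_cas_sketch` and its functions are hypothetical, and the revision tree is reduced to a plain list of revision names. An update only succeeds if the stored value is still the one the writer read.

```erlang
-module(kv_cas_sketch).
-export([new/0, fetch/2, cas/4, demo/0]).

%% Hypothetical in-memory backend: one ETS table of {Key, Value} pairs.
new() ->
    ets:new(kv_cas_sketch, [set, public]).

fetch(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{Key, Value}] -> {ok, Value};
        [] -> not_found
    end.

%% cas(Tab, Key, Expected, New): Expected is what the caller last read,
%% either not_found or {ok, OldValue}.
cas(Tab, Key, not_found, New) ->
    %% ets:insert_new/2 only inserts when the key is absent.
    case ets:insert_new(Tab, {Key, New}) of
        true -> ok;
        false -> {error, conflict}
    end;
cas(Tab, Key, {ok, Old}, New) ->
    %% ets:select_replace/2 atomically replaces objects matching the spec.
    %% Caveat of this sketch: Old is used as a match pattern, so values
    %% containing atoms such as '_' or '$1' would need extra care.
    Spec = [{{Key, Old}, [], [{const, {Key, New}}]}],
    case ets:select_replace(Tab, Spec) of
        1 -> ok;
        0 -> {error, conflict}
    end.

%% Two writers start from the same value; only the first one wins.
demo() ->
    Tab = new(),
    ok = cas(Tab, doc1, not_found, [rev1]),
    {ok, Seen} = fetch(Tab, doc1),
    ok = cas(Tab, doc1, {ok, Seen}, [rev2 | Seen]),
    {error, conflict} = cas(Tab, doc1, {ok, Seen}, [rev3 | Seen]),
    fetch(Tab, doc1).
```

A writer that gets `{error, conflict}` would typically re-read the value, merge its change and try again. With a remote backend the same pattern would rely on that backend's conditional-write feature instead of ETS.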

Slide 34

Slide 34 text

cowdb: local storage engine • based on the Apache CouchDB btree • pure Erlang append-only btree • handles transactions • provides an easy API to store objects

Slide 35

Slide 35 text

the couchdb btree • copy-on-write (COW) • append-only • can store multiple btrees • but uses a lot of space (needs compaction)
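The copy-on-write, append-only behaviour can be shown with a deliberately simplified model. This is not the CouchDB btree: the module `append_only_sketch` is hypothetical, the "file" is a list of chunks, and the whole tree is reduced to a single root map that is copied on every commit. It only aims to show why old snapshots stay readable and why space keeps growing until compaction.

```erlang
-module(append_only_sketch).
-export([new/0, commit/2, snapshot/1, lookup/3, chunk_count/1]).

%% The "file" is a list of immutable chunks; chunk 0 is the empty root.
new() ->
    #{chunks => [#{}], root => 0}.

%% Copy-on-write commit: the current root is never modified. A new root
%% is built and appended, then the root pointer moves to it.
commit(Db = #{chunks := Chunks, root := RootPos}, KVs) ->
    OldRoot = lists:nth(RootPos + 1, Chunks),
    NewRoot = maps:merge(OldRoot, maps:from_list(KVs)),
    NewChunks = Chunks ++ [NewRoot],
    Db#{chunks := NewChunks, root := length(NewChunks) - 1}.

%% A snapshot is just the position of a root in the file.
snapshot(#{root := RootPos}) ->
    RootPos.

%% Reads go through a root position, so an old snapshot keeps seeing
%% the data as it was when the snapshot was taken.
lookup(#{chunks := Chunks}, RootPos, Key) ->
    maps:find(Key, lists:nth(RootPos + 1, Chunks)).

%% Every commit adds a chunk, hence the need for compaction.
chunk_count(#{chunks := Chunks}) ->
    length(Chunks).
```

After two commits the file holds three chunks, and a lookup through the first snapshot still returns the first value of a key even though a later commit changed it.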

Slide 36

Slide 36 text

cbt: first attempt to extract it • https://bitbucket.org/refugeio/cbt • low level • wasn’t really usable by the end developer • wanted to provide a simpler way to handle it

Slide 37

Slide 37 text

1. create a database • 2. initialize the btree • 3. add a value and write the header
1> {ok, Fd} = cbt_file:open("test.db").
{ok,<0.35.0>}
2> {ok, Btree} = cbt_btree:new(Fd).
{ok,{btree,<0.35.0>,nil,undefined,undefined,undefined,nil,snappy,1279}}
3> {ok, Btree2} = cbt_btree:add(Btree, [{a, 1}]).
{ok,{btree,<0.35.0>,{0,[],32},undefined,undefined,undefined,nil,snappy,1279}}
4> Root = cbt_btree:get_state(Btree2).
{0,[],32}
5> Header = {1, Root}.
{1,{0,[],32}}
6> cbt_file:write_header(Fd, Header).

Slide 38

Slide 38 text

1. read the header • 2. initialize the btree • 3. read a value
1> {ok, Fd} = cbt_file:open("test.db").
{ok,<0.44.0>}
2> {ok, Header} = cbt_file:read_header(Fd).
{ok,{1,{0,[],32}}}
10> {_, Root} = Header.
{1,{0,[],32}}
11> {ok, SnapshotBtree} = cbt_btree:open(Root, Fd).
{ok,{btree,<0.44.0>,{0,[],32},undefined,undefined,undefined,nil,snappy,1279}}
12> cbt_btree:lookup(SnapshotBtree, [a]).
[{ok,{a,1}}]

Slide 39

Slide 39 text

useful but not for the end developer.

Slide 40

Slide 40 text

cowdb: another object database • https://bitbucket.org/refugeio/cowdb • wrapper around the couchdb btree • doesn’t depend on cbt (but probably should)

Slide 41

Slide 41 text

initialize a database (cowdb:open) and a store (cowdb:open_store)
1> {ok, Pid} = cowdb:open("testing.db",
1>     fun(St, Db) -> cowdb:open_store(Db, "test") end
1> ).
{ok,<0.35.0>}

Slide 42

Slide 42 text

simple transaction
2> cowdb:lookup(Pid, "test", [a,b]).
[{ok,{a,1}},{ok,{b,2}}]
3> cowdb:transact(Pid, [{remove, "test", b}, {add, "test", {c, 3}}]).
ok
4> cowdb:lookup(Pid, "test", [a,b,c]).
[{ok,{a,1}},not_found,{ok,{c,3}}]
5> cowdb:get(Pid, "test", a).
{ok,{a,1}}

Slide 43

Slide 43 text

transaction functions
7> cowdb:transact(Pid, [
7>     {fn, fun(Db) -> [{add, "test", {d, 2}}] end}   % transaction function
7> ]).
ok
8> cowdb:lookup(Pid, "test", [d]).
[{ok,{d,2}}]

Slide 44

Slide 44 text

open-sourcing Enki • Enki will be released under an open source license. • Paid support will be available.

Slide 45

Slide 45 text

? @benoitc Refuge.io