Slide 1

INTRODUCTION TO ELEPHANTDB
A distributed key/value data store for exporting data from Hadoop
Soren Macbeth (@sorenmacbeth)

Slide 2

ANOTHER DATABASE? OH GOD WHY?!?!
Hadoop is good at batch processing lots of data. Making the results of those batch calculations available to higher layers isn't straightforward. That is what ElephantDB does, and it is the only thing it does.

Slide 3

NOTABLE FEATURES
- Open source, originally created by Nathan Marz at BackType
- Written in Clojure
- Creation of the database index is completely decoupled from serving the index
- The server is read-only

Slide 4

BENEFITS
- Simple
- Easy to use
- Trivially scalable

Slide 5

DOMAIN CREATION
- Hadoop InputFormat/OutputFormat provided, plus Cascading and Cascalog taps
- Keys and values are stored as byte arrays; serialization is left as an exercise to the reader
- Pluggable persistence engines; LevelDB and Berkeley DB Java Edition are provided
- Domains are versioned
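Because ElephantDB only stores byte arrays, the batch job that writes a domain and every client that reads it have to agree on an encoding. The sketch below shows one possible choice, assuming UTF-8 string keys and JSON values; encode_key, encode_value and decode_value are hypothetical helpers, not part of ElephantDB.

```python
import json
from typing import Any, Dict


def encode_key(key: str) -> bytes:
    # Keys must encode deterministically: the same key must always give the same bytes.
    return key.encode("utf-8")


def encode_value(value: Dict[str, Any]) -> bytes:
    # sort_keys and compact separators give a stable byte layout for equal dicts.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_value(raw: bytes) -> Dict[str, Any]:
    # Readers must apply the inverse of the writer's encoding.
    return json.loads(raw.decode("utf-8"))
```

For example, encode_value({"clicks": 3}) yields b'{"clicks":3}'. Whatever scheme you pick, key encoding has to be stable: if one logical key can produce different bytes, a lookup can land on the wrong shard or miss the stored entry.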

Slide 6

DOMAIN CREATION

Slide 7

EXAMPLE DOMAIN-SPEC.YML

    ---
    coordinator: elephantdb.persistence.LevelDB
    persistence_opts: {}
    shard_count: 60
    shard_scheme: elephantdb.partition.HashModScheme
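With a hash-mod shard scheme and shard_count: 60, each key is presumably assigned to shard hash(key) mod 60. The following is a minimal sketch of that idea; CRC32 is a stand-in hash chosen only for illustration, and shard_for_key and partition are hypothetical names, so the shard numbers will not match what ElephantDB's HashModScheme actually produces.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SHARD_COUNT = 60  # taken from shard_count in the example domain-spec.yml


def shard_for_key(key: bytes, shard_count: int = SHARD_COUNT) -> int:
    """Map a serialized key to a shard index in [0, shard_count)."""
    # CRC32 is a stand-in; the real HashModScheme may hash keys differently.
    return zlib.crc32(key) % shard_count


def partition(pairs: Iterable[Tuple[bytes, bytes]],
              shard_count: int = SHARD_COUNT) -> Dict[int, List[Tuple[bytes, bytes]]]:
    """Group key/value pairs by shard, as a batch job would before writing shards."""
    shards: Dict[int, List[Tuple[bytes, bytes]]] = defaultdict(list)
    for key, value in pairs:
        shards[shard_for_key(key, shard_count)].append((key, value))
    return dict(shards)
```

The property that matters is that writer and reader use the same function and the same shard_count, so a reader can compute which shard to ask without any lookup table.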

Slide 8

SERVING DOMAINS
- ElephantDB servers watch the DFS for new versions of domains
- When a new version is available, servers automatically download it and hot-swap it in place of the version being served
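The watch-and-swap behaviour can be modelled in a few lines. This is a simplified, in-memory sketch under stated assumptions: versions are identified by increasing integers, list_versions stands in for listing version directories on the DFS, and load_version stands in for downloading and opening the shards. HotSwappingDomain is a hypothetical class, not ElephantDB's server code.

```python
import threading
from typing import Callable, Dict, Iterable, Optional


class HotSwappingDomain:
    """Hypothetical sketch of a read-only domain that swaps in new versions."""

    def __init__(self,
                 list_versions: Callable[[], Iterable[int]],
                 load_version: Callable[[int], Dict[bytes, bytes]]):
        self._list_versions = list_versions  # stand-in for listing versions on the DFS
        self._load_version = load_version    # stand-in for downloading and opening shards
        self._lock = threading.Lock()
        self._data: Dict[bytes, bytes] = {}
        self.current_version: Optional[int] = None

    def refresh(self) -> bool:
        """Load the newest version if it differs from the one being served."""
        versions = list(self._list_versions())
        if not versions:
            return False
        latest = max(versions)
        if latest == self.current_version:
            return False
        # Load completely before swapping, so readers never see a partial version.
        new_data = self._load_version(latest)
        with self._lock:
            self._data = new_data
            self.current_version = latest
        return True

    def get(self, key: bytes) -> Optional[bytes]:
        with self._lock:
            data = self._data  # grab the current snapshot
        return data.get(key)

    def poll_forever(self, interval_s: float, stop: threading.Event) -> None:
        """Re-check for new versions every interval_s seconds until stop is set."""
        while not stop.is_set():
            self.refresh()
            stop.wait(interval_s)


# Usage: simulate a second version appearing on the DFS.
versions = {1: {b"k": b"v1"}}
domain = HotSwappingDomain(lambda: versions.keys(), lambda v: versions[v])
domain.refresh()
before = domain.get(b"k")
versions[2] = {b"k": b"v2"}
swapped = domain.refresh()
after = domain.get(b"k")
```

The design point is that the new version is fully loaded before the swap, so reads keep hitting the old version until the new one is ready, which is only safe because the served data is read-only.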

Slide 9

SERVING DOMAINS

Slide 10

GETTING DATA
- Thrift-based interface
- Clojure and Python clients provided
- get and multiGet operations

Slide 11

SIMPLE CLIENT INTERFACE

    (with-elephant "ip.a.b.c" 3578 client
      (multi-get client "some-domain" [k1 k2 k3 k4]))
    ;; => {k1 v1, k2 v2, k3 v3, k4 v4}
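A multi-get has to find the shard, and therefore the server, that holds each key. The sketch below models one way to do this: group keys by host and issue one batched lookup per host. The slides don't say whether ElephantDB routes requests in the client or on the server, so the round-robin shard-to-host assignment, the CRC32 shard function, and the in-memory dictionaries standing in for servers are all assumptions; multi_get and host_for_key are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Sequence


def shard_for_key(key: bytes, shard_count: int) -> int:
    # Stand-in shard scheme (CRC32 mod shard_count); an assumption.
    return zlib.crc32(key) % shard_count


def host_for_key(key: bytes, host_names: Sequence[str], shard_count: int) -> str:
    # Hypothetical assignment: shards are spread round-robin across hosts.
    return host_names[shard_for_key(key, shard_count) % len(host_names)]


def multi_get(keys: List[bytes],
              hosts: Dict[str, Dict[bytes, bytes]],
              shard_count: int) -> Dict[bytes, bytes]:
    """Fetch many keys with one batched lookup per host; missing keys are omitted."""
    host_names = sorted(hosts)
    batches: Dict[str, List[bytes]] = defaultdict(list)
    for key in keys:
        batches[host_for_key(key, host_names, shard_count)].append(key)
    result: Dict[bytes, bytes] = {}
    for host, batch in batches.items():
        store = hosts[host]  # stand-in for a single network round trip to this host
        result.update({k: store[k] for k in batch if k in store})
    return result


# Usage: place each value on the host its shard maps to, then read it back.
SHARDS = 60
hosts: Dict[str, Dict[bytes, bytes]] = {"host-a": {}, "host-b": {}, "host-c": {}}
for k, v in [(b"k1", b"v1"), (b"k2", b"v2"), (b"k3", b"v3"), (b"k4", b"v4")]:
    hosts[host_for_key(k, sorted(hosts), SHARDS)][k] = v
found = multi_get([b"k1", b"k2", b"k3", b"k4"], hosts, SHARDS)
```

Batching per host keeps the number of round trips proportional to the number of servers touched rather than the number of keys. Leaving missing keys out of the result is a choice made for this sketch; the real client may represent them differently.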

Slide 12

IN PRODUCTION AT YIELDBOT
- Cluster of 8 m1.xlarge instances
- 500 GB of data (compressed)

Slide 13

GITHUB
https://github.com/nathanmarz/elephantdb

Slide 14

QUESTIONS?

Slide 15

YIELDBOT IS HIRING!
http://yieldbot.com/jobs