High-Performance Storage Services with HailDB and Java

High-Performance Storage Services With Java and HailDB
Sunny Gleason
April 14, 2011

whoami
• Sunny Gleason, human
• passion: distributed systems engineering
• previous... Ning: custom social networks; Amazon.com: infra & web services
• now... building cloud infrastructure

whereami
• twitter: twitter.com/sunnygleason
• github: github.com/sunnygleason
• linkedin: linkedin.com/in/sunnygleason
• slideshare: slideshare.net/sunnygleason

what’s in this presentation?
• MySQL & NoSQL as Inspiration
• HailDB & InnoDB
• JNA: Integration with Java
• St8: A REST-Enabled Data Store
• A Handful of Nifty Applications
• Results & Next Steps

prior art
• Mad props to:
• MySQL & InnoDB teams for creating InnoDB and Embedded InnoDB
• Stewart Smith & Drizzle folks for leading the HailDB charge and encouraging plugin APIs
• Nokia & Percona for publishing results of their Voldemort / MySQL integration
• Basho for publishing Riak / InnoStore integration

MySQL & InnoDB
• Super-Efficient Database Server
• Tried & True Replication
• Bulletproof Durability (when configured correctly)
• Fantastic Stability, Predictability & Insight into Operation

motivation
• database on 1 box: ok
• database with master/slave replication: ok
• database on cluster: tricky
• database on SAN: scary

NoSQL
• “Not Only” SQL
• What’s the point?
• Proponent: “reaching next level of scale”
• Cynic: “cloud is hype, ops nightmare”

what does it gain?
• Higher performance, scalability, availability
• More robust fault-tolerance
• Simplified systems design
• Easier operations

what does it lose?
• Reduced / simplified programming model
• No ad-hoc queries, no joins, no txns
• Not ACID: Weakened Atomicity / Consistency / Isolation / Durability
• Operations / management is still evolving
• Challenging to quantify health of system
• Fewer domain experts

NoSQL Map
• Key-Value Store
  KV Stores (durable): Dynamo, Voldemort, Riak
  KV Stores (volatile): Memcached, Redis
• Column Store: Cassandra, BigTable, HBase
• Document Store: CouchDB, MongoDB
• Graph Store: Neo4J

durable vs. volatile
• RAM is ridiculously fast (ns), but not durable
• Disk is persistent and slow (3-7 ms)
• RAID eases the pain a bit (4-8x throughput)
• SSD shows good promise (100-300 µs)
• FusionIO is redefining the space (30-100 µs)

performance & operational complexity*
[Figure: axes are Complexity and Aggregate Operations / Sec (1K, 10K, 100K, 1M); plotted options: MySQL, +SSD, +FusionIO, +Sharding, Memcached, +Cluster, Voldemort]
* This is not a real graph

just a thought...
What if we could use the highly optimized & durable ‘guts’ of MySQL without having to go through JDBC & SQL?

enter HailDB
• use case: Voldemort Storage Engine
• let’s evaluate relative to other NoSQL options
• focus on stability & predictability of performance
• Graphs are throughput (ops/sec) vs. time

Voldemort schema
  _key VARBINARY(200)
  _version VARBINARY(200)
  _value BLOB
  PRIMARY KEY(_key, _version)

experimental setup
• OS X: 8-Core Xeon, 32GB RAM, 200GB OWC SSD
• Faban Benchmark: PUT 64-byte key, 1024-byte value
• Scenarios: 1, 2, 4, 8 threads
• 512M Java Heap

BDB-JE
• Log-Structured B-Tree
• Fast Storage When Mostly Cached
• Configured without fsync() by default - writes are batched and flushed periodically

Perf: BDB Put 100%

Krati
• Fast Hash-Oriented Storage
• Uses memory-mapped files for speed
• Configured without fsync() by default - writes are batched and flushed periodically

Perf: Krati Put 100%

Perf: HailDB Put 100%

HailDB & Java
• g414-haildb: where the magic happens
• Open Source on GitHub
• uses JNA: Java Native Access
• dynamic binding to libhaildb shared library
• auto-generate initial Java class from .h file (w/ JNAerator)
• Pointer classes & other shenanigans

implementation gotchas
• InnoDB API-level usage is unclear
• Synchronization & locking is unclear
• Therefore... I learned to love reading C
• Error handling is *nasty*
• Native library installation a bit of a pain (need to configure LD_LIBRARY_PATH)

kinder, friendlier APIs
• Level 0: JNA bindings
    int err = ib_dostuff();
• Level 1: Object-Oriented
    Transaction t = db.openTransaction(); t.commit();
• Level 2: Templated
    dbt.inTransaction() { dbt.insert(value); }
• Level 3: Functional
    Maps, Iteration, Filters, Apply
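
The "Level 2: Templated" style above can be sketched as a small callback wrapper that owns the commit and rollback calls. This is only an illustration of the pattern: `Tx`, `InMemoryTx` and `TxTemplate` are hypothetical names invented here, the "database" is an in-memory list, and none of it is the actual g414-haildb API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

final class TxTemplateDemo {
    // Hypothetical transaction handle, standing in for a Level 1 wrapper object.
    interface Tx {
        void insert(String value);
        void commit();
        void rollback();
    }

    // Toy transaction: buffers inserts and applies them to the store on commit.
    static final class InMemoryTx implements Tx {
        private final List<String> store;
        private final List<String> pending = new ArrayList<>();

        InMemoryTx(List<String> store) { this.store = store; }

        @Override public void insert(String value) { pending.add(value); }
        @Override public void commit() { store.addAll(pending); pending.clear(); }
        @Override public void rollback() { pending.clear(); }
    }

    // Level 2 style: the template opens the transaction, runs the callback,
    // commits on success and rolls back if the callback throws.
    static final class TxTemplate {
        private final Supplier<Tx> txFactory;

        TxTemplate(Supplier<Tx> txFactory) { this.txFactory = txFactory; }

        <T> T inTransaction(Function<Tx, T> work) {
            Tx tx = txFactory.get();
            try {
                T result = work.apply(tx);
                tx.commit();
                return result;
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        TxTemplate dbt = new TxTemplate(() -> new InMemoryTx(store));
        dbt.inTransaction(tx -> { tx.insert("hello"); return null; });
        System.out.println(store); // prints [hello]
    }
}
```

The point of the design is that application code never calls commit or rollback itself, so the error handling lives in one place.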

St8 Server
• HTTP-enabled Access to HailDB
• PUT /1.0/t/mytable
    {
      "columns":[
        {"name":"a","type":"INT","length":4},
        {"name":"b","type":"INT","length":8},
        {"name":"c","type":"BLOB","length":0}
      ],
      "indexes":[
        {
          "name":"P", "clustered":true, "unique":true,
          "indexColumns":[{"name":"a"}]
        }
      ]
    }

rest-enabled access
• GET /1.0/d/mytable;a=0
• POST /1.0/d/mytable;a=1;b=42;c=xyz
• PUT /1.0/d/mytable;a=1;b=43;c=abc
• DELETE /1.0/d/mytable;a=0
* This is matrix-param style; form data style can also be used for specifying data
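
To make the matrix-param style concrete, here is a minimal sketch that splits the last path segment of such a URL into a table name and its column values. The class and method names are made up for illustration, percent-decoding is deliberately left out, and this is not how St8 itself parses requests.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MatrixParams {
    // Returns the part of the segment before the first ';' (e.g. "mytable").
    static String table(String lastSegment) {
        int semi = lastSegment.indexOf(';');
        return semi < 0 ? lastSegment : lastSegment.substring(0, semi);
    }

    // Returns the name=value pairs after the table name, in order of appearance.
    static Map<String, String> params(String lastSegment) {
        Map<String, String> out = new LinkedHashMap<>();
        String[] parts = lastSegment.split(";");
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed parts
            }
            out.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String segment = "mytable;a=1;b=42;c=xyz";
        // prints: mytable {a=1, b=42, c=xyz}
        System.out.println(table(segment) + " " + params(segment));
    }
}
```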

cursors & iterators
• GET /1.0/i/mytable.P?q=a+ge+4
• GET /1.0/i/mytable.SecIndex?q=b+le+4
• GET /1.0/i/mytable.SecIndex?q=b+le+4&s=abce1212121ceeee2120911
• “s” value is opaque index key of next page of results - way better than LIMIT/OFFSET! (since HailDB can seek directly to the row)
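
The paging idea can be sketched with an ordered in-memory map standing in for the index: each page seeks straight to a start key and returns the key where the next page begins, rather than counting and skipping rows as LIMIT/OFFSET does. The `KeysetPaging` class is hypothetical, and the continuation token here is a plain integer key, whereas the service hands out an opaque "s" value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class KeysetPaging {
    static final class Page {
        final List<String> rows = new ArrayList<>();
        Integer next; // key at which the next page starts; null when exhausted
    }

    // Seeks to startKey (inclusive), reads up to pageSize rows, and records
    // the key of the first row that did not fit as the continuation point.
    static Page page(NavigableMap<Integer, String> index, int startKey, int pageSize) {
        Page page = new Page();
        for (Map.Entry<Integer, String> e : index.tailMap(startKey, true).entrySet()) {
            if (page.rows.size() == pageSize) {
                page.next = e.getKey();
                break;
            }
            page.rows.add(e.getValue());
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> index = new TreeMap<>();
        for (int k = 1; k <= 10; k++) {
            index.put(k, "row" + k);
        }
        Page first = page(index, 4, 3);        // analogous to q=a+ge+4
        System.out.println(first.rows + " next=" + first.next);
        Page second = page(index, first.next, 3);
        System.out.println(second.rows + " next=" + second.next);
    }
}
```

Each page costs one seek plus a short scan regardless of how deep into the result set it is, which is the advantage the slide claims over LIMIT/OFFSET.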

result
• REST API provides fun, straightforward access from Ruby, Python, Java, Command-line...
• very easy benchmarking with HTTP-based performance tools
• range query support, and more efficient iteration model for large result sets than MySQL provides

high-performance counts
• GET /1.0/counts/mykey => 0
• POST /1.0/counts/mykey[?inc=1] => 1
• POST /1.0/counts/mykey?inc=42 => 43
• DELETE /1.0/counts/mykey

counts schema
• HailDB count service schema
    _id int 8-byte unsigned,
    _key_hash int 8-byte unsigned,
    _key varchar(80),
    _count int 8-byte unsigned
    primary key ("_id")
    unique key ("_key_hash", "_key")

raid0 put counts

ssd put counts

raid0 put/get

ssd put/get

operation: graph store
• Social networks, recommendations, any relation you can think of
• Which would you prefer?
• SQL adjacency list, stored procedure, custom storage engine, external (Memcached), ...
• Graph-aware HailDB application in Java

nifty graph store 1
[Figure: graph with nodes 1, 2, 3, 4, 5, 6, 8]
GET /1.0/graph/bfs?a=1&maxDepth=3
=> [[1, 0], [2, 1], [3, 2], [4, 3], [5, 3]]
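
A depth-limited breadth-first search that returns [node, depth] pairs, as in the response above, can be sketched as follows. The edges of the pictured graph are not recoverable from the transcript, so `sampleGraph()` uses a guessed edge list chosen only to be consistent with the response shown. The adjacency map is held in memory here, while the real service reads relations from HailDB.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class GraphBfs {
    // Visits nodes in breadth-first order from start, stopping at maxDepth,
    // and returns one {node, depth} pair per visited node.
    static List<int[]> bfs(Map<Integer, List<Integer>> adj, int start, int maxDepth) {
        List<int[]> out = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Queue<int[]> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(new int[] {start, 0});
        while (!queue.isEmpty()) {
            int[] cur = queue.remove();
            out.add(cur);
            if (cur[1] == maxDepth) {
                continue; // do not expand nodes at the depth limit
            }
            for (int next : adj.getOrDefault(cur[0], Collections.emptyList())) {
                if (seen.add(next)) {
                    queue.add(new int[] {next, cur[1] + 1});
                }
            }
        }
        return out;
    }

    // Hypothetical edge list (an assumption, not taken from the slide's figure).
    static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        adj.put(1, Arrays.asList(2));
        adj.put(2, Arrays.asList(3));
        adj.put(3, Arrays.asList(4, 5));
        adj.put(4, Arrays.asList(6));
        adj.put(6, Arrays.asList(8));
        return adj;
    }

    static String format(List<int[]> pairs) {
        List<String> parts = new ArrayList<>();
        for (int[] p : pairs) {
            parts.add(Arrays.toString(p));
        }
        return parts.toString();
    }

    public static void main(String[] args) {
        // prints [[1, 0], [2, 1], [3, 2], [4, 3], [5, 3]]
        System.out.println(format(bfs(sampleGraph(), 1, 3)));
    }
}
```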

nifty graph store 2
[Figure: graph with nodes 1, 2, 3, 4, 5, 6, 8]
GET /1.0/graph/topo?a=1&a=5&a=8
=> [8, 6, 4, 3, 2, 5, 1]

nifty recovery tool (Just an idea)
• for recovery: shut down mysql server
• run HailDB-enabled recovery tool
• export as JSON or whatever

wrap-up
• HailDB & InnoDB are phenomenal
• With g414-haildb, they can be integrated directly into applications running on the JVM
• All the InnoDB tuning tricks apply
• Opens up new applications that are tricky with a traditional SQL database

resources
• github.com/sunnygleason/g414-st8
• github.com/sunnygleason/g414-haildb
• haildb.com
• jna.dev.java.net

Questions? Thank You!

bonus material!
• we probably didn’t get this far in the live presentation; the following material is here for eager, brave & interested folks...

future work
• Improve Packaging / Installation
• Codify schema refinements & perf enhancements
• Online backup/export with XtraBackup
• JNI Bindings
• PBXT explorations

InnoDB tuning
• Skinny columns, skinny rows! (esp. Primary Key)
• Varchar as enum ‘bad’; enum, int or smallint ‘good’
• Fixed-width rows allow in-place updates
• Use covering indexes strategically
• More data per page means faster index scans, more efficient buffer pool utilization
• You only get so many trx’s (read & write) on a given CPU/RAM configuration - benchmark this!
• Strategically offload reads to Memcached/Redis

HailDB schema
  _key VARBINARY(200)
  _version VARBINARY(200)
  _value BLOB
  PRIMARY KEY(_key, _version)

refined schema
  _id BIGINT (auto increment)
  _key_hash BIGINT
  _key VARBINARY(200)
  _version VARBINARY(200)
  _value BLOB
  PRIMARY KEY(_id)
  KEY(_key_hash)
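
One way to read the refined schema: rows are found through the fixed-width `_key_hash` index, and the full `_key` is then compared to rule out hash collisions. The sketch below mimics that lookup path in memory. The slides do not say which hash function is used, so 64-bit FNV-1a is an assumption made here, `_version` is omitted for brevity, and the class is hypothetical rather than part of g414-haildb.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyHashIndex {
    static final class Row {
        final long id;      // stands in for the auto-increment _id
        final byte[] key;   // _key
        final byte[] value; // _value

        Row(long id, byte[] key, byte[] value) {
            this.id = id;
            this.key = key;
            this.value = value;
        }
    }

    // Stands in for the non-unique KEY(_key_hash) secondary index.
    private final Map<Long, List<Row>> byHash = new HashMap<>();
    private long nextId = 1;

    // 64-bit FNV-1a; the choice of hash function is an assumption.
    static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    void put(byte[] key, byte[] value) {
        List<Row> bucket = byHash.computeIfAbsent(fnv1a64(key), h -> new ArrayList<>());
        bucket.removeIf(r -> Arrays.equals(r.key, key)); // replace an existing row
        bucket.add(new Row(nextId++, key, value));
    }

    byte[] get(byte[] key) {
        for (Row r : byHash.getOrDefault(fnv1a64(key), Collections.emptyList())) {
            if (Arrays.equals(r.key, key)) { // compare full key to rule out collisions
                return r.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        KeyHashIndex index = new KeyHashIndex();
        byte[] key = "user:42".getBytes(StandardCharsets.UTF_8);
        index.put(key, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(index.get(key), StandardCharsets.UTF_8)); // prints hello
    }
}
```

The likely motivation, in line with the "skinny columns, skinny rows" tuning advice, is that an 8-byte primary key and an 8-byte hash index are much narrower than a primary key made of two VARBINARY(200) columns.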

online backup
• hot backup of data to other machine / destination
• test Percona XtraBackup with HailDB
• next step: backup/export to Hadoop/HDFS (similar to Cloudera Sqoop tool)

JNI bindings
• JNI can get 2-5x perf boost vs. JNA
• ... at the expense of nasty code
• Will go for schema optimizations and InnoDB tuning tips *first*

Thank You!
