High-Performance Storage Services with HailDB and Java

This presentation introduces the St8 server, an open-source, REST-enabled storage service built using Jersey, Jetty, Guice, and most importantly, HailDB (formerly Embedded InnoDB). We describe interfacing HailDB with Java using JNA, and review benchmarks showing just how fast the service can be (including SSD as well as single-disk and RAID0 benchmarks). We also review a few use cases that are possible by using HailDB outside the context of SQL, such as high-performance counters and more efficient updates.

Sunny Gleason

April 14, 2011

Transcript

  1. whoami • Sunny Gleason, human • passion: distributed systems engineering • previous: Ning (custom social networks), Amazon.com (infra & web services) • now: building cloud infrastructure
  2. whereami • twitter : twitter.com/sunnygleason • github : github.com/sunnygleason • linkedin : linkedin.com/in/sunnygleason • slideshare : slideshare.net/sunnygleason
  3. what’s in this presentation? • MySQL & NoSQL as Inspiration • HailDB & InnoDB • JNA: Integration with Java • St8 : A REST-Enabled Data Store • A Handful of Nifty Applications • Results & Next Steps
  4. prior art • Mad props to: • MySQL & InnoDB teams for creating InnoDB and Embedded InnoDB • Stewart Smith & Drizzle folks for leading the HailDB charge and encouraging plugin APIs • Nokia & Percona for publishing results of their Voldemort / MySQL integration • Basho for publishing Riak / InnoStore integration
  5. MySQL & InnoDB • Super-Efficient Database Server • Tried & True Replication • Bulletproof Durability (when configured correctly) • Fantastic Stability, Predictability & Insight into Operation
  6. motivation • database on 1 box : ok • database with master/slave replication : ok • database on cluster : tricky • database on SAN : scary
  7. NoSQL • “Not Only” SQL • What’s the point? • Proponent: “reaching next level of scale” • Cynic: “cloud is hype, ops nightmare”
  8. what does it gain? • Higher performance, scalability, availability • More robust fault-tolerance • Simplified systems design • Easier operations
  9. what does it lose? • Reduced / simplified programming model • No ad-hoc queries, no joins, no txns • Not ACID: Weakened Atomicity / Consistency / Isolation / Durability • Operations / management is still evolving • Challenging to quantify health of system • Fewer domain experts
  10. NoSQL Map • Key-Value Store: KV Stores (durable): Dynamo, Voldemort, Riak; KV Stores (volatile): Memcached, Redis • Column Store: Cassandra, BigTable, HBase • Graph Store: Neo4J • Document Store: CouchDB, MongoDB
  11. durable vs. volatile • RAM is ridiculous speed (ns), not durable • Disk is persistent and slow (3-7ms) • RAID eases the pain a bit (4-8x throughput) • SSD is providing good promise (100-300us) • FusionIO is redefining the space (30-100us)
  12. performance & operational complexity* • [Chart: Complexity vs. Aggregate Operations / Sec (1K, 10K, 100K, 1M); systems plotted: MySQL, +SSD, +FusionIO, +Sharding, Memcached, +Cluster, Voldemort] • * This is not a real graph
  13. just a thought... What if we could use the highly optimized & durable ‘guts’ of MySQL without having to go through JDBC & SQL?
  14. enter HailDB • use case: Voldemort Storage Engine • let’s evaluate relative to other NoSQL options • focus on stability & predictability of performance • Graphs are throughput (ops/sec) vs. time
  15. experimental setup • OS X: 8-Core Xeon, 32GB RAM, 200GB OWC SSD • Faban Benchmark : PUT 64-byte key, 1024-byte value • Scenarios: 1, 2, 4, 8 threads • 512M Java Heap
  16. BDB-JE • Log-Structured B-Tree • Fast Storage When Mostly Cached • Configured without fsync() by default - writes are batched and flushed periodically
  17. Krati • Fast Hash-Oriented Storage • Uses memory-mapped files for speed • Configured without fsync() by default - writes are batched and flushed periodically
  18. HailDB & Java • g414-haildb : where the magic happens • Open Source on GitHub • uses JNA: Java Native Access • dynamic binding to libhaildb shared library • auto-generate initial Java class from .h file (w/ JNAerator) • Pointer classes & other shenanigans
  19. implementation gotchas • InnoDB API-level usage is unclear • Synchronization & locking is unclear • Therefore... I learned to love reading C • Error handling is *nasty* • Native library installation a bit of a pain (need to configure LD_LIBRARY_PATH)
  20. kinder, friendlier APIs • Level 0: JNA bindings: int err = ib_dostuff(); • Level 1: Object-Oriented: Transaction t = db.openTransaction(); t.commit(); • Level 2: Templated: dbt.inTransaction() { dbt.insert(value); } • Level 3: Functional: Maps, Iteration, Filters, Apply
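The "Level 2: Templated" style on slide 20 can be illustrated with a small helper that owns the commit/rollback decision so callers never touch it. This is only a sketch: `Tx`, `inTransaction`, and the in-memory list used as a store are hypothetical stand-ins, not the g414-haildb API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class TxTemplateSketch {
    /** Hypothetical stand-in for a Level 1 transaction handle. */
    static class Tx {
        private final List<String> store;
        private final List<String> pending = new ArrayList<>();

        Tx(List<String> store) { this.store = store; }

        void insert(String value) { pending.add(value); }   // buffered until commit
        void commit() { store.addAll(pending); pending.clear(); }
        void rollback() { pending.clear(); }
    }

    /** Level 2: run the callback, commit on success, roll back on failure. */
    static <T> T inTransaction(List<String> store, Function<Tx, T> work) {
        Tx tx = new Tx(store);
        try {
            T result = work.apply(tx);
            tx.commit();
            return result;
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        inTransaction(store, tx -> { tx.insert("a"); return 1; });
        System.out.println(store);
    }
}
```

The point of the template is that error handling lives in one place; a callback that throws leaves the store untouched.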
  21. St8 Server • HTTP-enabled Access to HailDB • PUT /1.0/t/mytable { "columns":[ {"name":"a","type":"INT","length":4}, {"name":"b","type":"INT","length":8}, {"name":"c","type":"BLOB","length":0} ], "indexes":[ { "name":"P", "clustered":true, "unique":true, "indexColumns":[{"name":"a"}] } ] }
  22. rest-enabled access • GET /1.0/d/mytable;a=0 • POST /1.0/d/mytable;a=1;b=42;c=xyz • PUT /1.0/d/mytable;a=1;b=43;c=abc • DELETE /1.0/d/mytable;a=0 • * This is matrix-param style; form-data style can also be used to specify data
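A client for the matrix-param style on slide 22 mostly needs to assemble the path correctly. The sketch below builds such a path from column values; the `/1.0/d/` prefix comes from the slide, while `MatrixParamPath`, `rowPath`, and the escaping policy are assumptions for illustration (the slides do not say how St8 expects special characters to be escaped).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class MatrixParamPath {
    /** Percent-encode one name or value; assumed policy, not taken from St8. */
    static String enc(String s) {
        // URLEncoder emits '+' for spaces (form encoding); paths need %20.
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Build /1.0/d/<table>;col=value;... in the iteration order of the map. */
    static String rowPath(String table, Map<String, String> columns) {
        StringBuilder sb = new StringBuilder("/1.0/d/").append(enc(table));
        for (Map.Entry<String, String> e : columns.entrySet()) {
            sb.append(';').append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("a", "1");
        row.put("b", "42");
        row.put("c", "xyz");
        System.out.println("POST " + rowPath("mytable", row));
    }
}
```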
  23. cursors & iterators • GET /1.0/i/mytable.P?q=a+ge+4 • GET /1.0/i/mytable.SecIndex?q=b+le+4 • GET /1.0/i/mytable.SecIndex?q=b+le+4&s=abce1212121ceeee2120911 • “s” value is opaque index key of next page of results - way better than LIMIT/OFFSET! (since HailDB can seek directly to the row)
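The advantage of the opaque “s” token over LIMIT/OFFSET (slide 23) is that each page starts with a direct seek to a key instead of skipping rows. A minimal sketch of the idea, using a `TreeMap` as a stand-in for a HailDB index; the `Page` type and the hex token encoding are assumptions, not the St8 format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class KeysetPaging {
    /** One page of results plus the token for the next page (null at the end). */
    static class Page {
        final List<String> rows = new ArrayList<>();
        String next;
    }

    /** Return up to `size` rows with key >= minKey, resuming at `token` if given. */
    static Page page(NavigableMap<Long, String> index, long minKey, String token, int size) {
        // The token is simply the encoded key of the first row of the next page.
        long start = (token == null) ? minKey : Long.parseUnsignedLong(token, 16);
        Page p = new Page();
        // tailMap seeks directly to `start`; no earlier rows are scanned.
        for (Map.Entry<Long, String> e : index.tailMap(start, true).entrySet()) {
            if (p.rows.size() == size) {
                p.next = Long.toHexString(e.getKey());
                break;
            }
            p.rows.add(e.getValue());
        }
        return p;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> index = new TreeMap<>();
        for (long k = 1; k <= 10; k++) index.put(k, "r" + k);
        Page first = page(index, 4, null, 3);
        Page second = page(index, 4, first.next, 3);
        System.out.println(first.rows + " then " + second.rows);
    }
}
```

With OFFSET, the cost of fetching page N grows with N; with a key token, every page costs one seek plus `size` reads.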
  24. result • REST API provides fun, straightforward access from Ruby, Python, Java, command line... • very easy benchmarking with HTTP-based performance tools • range query support, and more efficient iteration model for large result sets than MySQL provides
  25. high-performance counts • GET /1.0/counts/mykey => 0 • POST /1.0/counts/mykey[?inc=1] => 1 • POST /1.0/counts/mykey?inc=42 => 43 • DELETE /1.0/counts/mykey
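The request/response sequence on slide 25 implies simple semantics: a missing key reads as 0, POST adds `inc` (default 1) and returns the new total, and DELETE removes the key. The sketch below mirrors those semantics with an in-memory map; `CountsSketch` and its method names are hypothetical, and a HailDB-backed version would presumably do the read-modify-write inside a single transaction instead.

```java
import java.util.concurrent.ConcurrentHashMap;

class CountsSketch {
    // In-memory stand-in for the counts table.
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

    /** GET: a key that was never incremented reads as 0. */
    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    /** POST ?inc=n: atomically add n and return the new total. */
    long increment(String key, long inc) {
        return counts.merge(key, inc, Long::sum);
    }

    /** DELETE: drop the counter entirely. */
    void delete(String key) {
        counts.remove(key);
    }

    public static void main(String[] args) {
        CountsSketch c = new CountsSketch();
        System.out.println(c.get("mykey"));            // 0
        System.out.println(c.increment("mykey", 1));   // 1
        System.out.println(c.increment("mykey", 42));  // 43
        c.delete("mykey");
    }
}
```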
  26. counts schema • HailDB count service schema: _id int 8-byte unsigned, _key_hash int 8-byte unsigned, _key varchar(80), _count int 8-byte unsigned • primary key (“_id”) • unique key (“_key_hash”, “_key”)
  27. operation: graph store • Social networks, recommendations, any relation you can think of • Which would you prefer? • SQL adjacency list, stored procedure, custom storage engine, external (Memcached), ... • Graph-aware HailDB application in Java
  28. nifty graph store • [Diagram: graph with numbered nodes; labels as extracted: 2 1 2 3 4 5 6 8; edges not recoverable] • GET /1.0/graph/topo?a=1&a=5&a=8 => [8, 6, 4, 3, 2, 5, 1]
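Assuming the `topo` endpoint on slide 28 returns a topological ordering of the nodes reachable from the `a` parameters (the slides do not say so explicitly), the traversal could look like the depth-first sketch below. The edge set in `main` is invented for illustration, since the slide's diagram did not survive extraction, so the output is not meant to match the slide's result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TopoSketch {
    /** Reverse DFS post-order over an acyclic graph, starting from `starts`. */
    static List<Integer> topo(Map<Integer, List<Integer>> edges, List<Integer> starts) {
        Set<Integer> visited = new HashSet<>();
        List<Integer> post = new ArrayList<>();
        for (int s : starts) visit(s, edges, visited, post);
        Collections.reverse(post);  // every node now precedes its successors
        return post;
    }

    private static void visit(int n, Map<Integer, List<Integer>> edges,
                              Set<Integer> visited, List<Integer> post) {
        if (!visited.add(n)) return;  // already handled
        for (int m : edges.getOrDefault(n, Collections.emptyList())) {
            visit(m, edges, visited, post);
        }
        post.add(n);  // emitted only after all successors
    }

    public static void main(String[] args) {
        // Hypothetical edges; NOT the graph from the slide.
        Map<Integer, List<Integer>> edges = Map.of(
            1, List.of(2), 2, List.of(3), 3, List.of(4),
            5, List.of(4), 4, List.of(6), 8, List.of(6));
        System.out.println(topo(edges, List.of(1, 5, 8)));
    }
}
```

In a graph-aware HailDB application, the `edges` lookup would be an index seek on the adjacency table rather than a map access.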
  29. nifty recovery tool (Just an idea) • for recovery: shut down mysql server • run HailDB-enabled recovery tool • export as JSON or whatever
  30. wrap-up • HailDB & InnoDB are phenomenal • With g414-haildb, can be integrated directly into applications running on the JVM • All the InnoDB tuning tricks apply • Opens up new applications that are tricky with a traditional SQL database
  31. bonus material! • we probably didn’t get this far in the live presentation; the following material is here for eager, brave & interested folks...
  32. future work • Improve Packaging / Installation • Codify schema refinements & perf enhancements • Online backup/export with XtraBackup • JNI Bindings • PBXT explorations
  33. InnoDB tuning • Skinny columns, skinny rows! (esp. Primary Key) • Varchar enum ‘bad’, enum, int or smallint ‘good’ • fixed-width rows allow in-place updates • Use covering indexes strategically • More data per page means faster index scans, more efficient buffer pool utilization • You only get so many trx’s (read & write) on given CPU/RAM configuration - benchmark this! • Strategically offload reads to Memcached/Redis
  34. refined schema • _id BIGINT (auto increment) • _key_hash BIGINT • _key VARBINARY(200) • _version VARBINARY(200) • _value BLOB • PRIMARY KEY(_id) • KEY(_key_hash)
  35. online backup • hot backup of data to other machine / destination • test Percona Xtrabackup with HailDB • next step: backup/export to Hadoop/HDFS (similar to Cloudera Sqoop tool)
  36. JNI bindings • JNI can get 2-5x perf boost vs. JNA • ... at the expense of nasty code • Will go for schema optimizations and InnoDB tuning tips *first*