An introduction to Apache Accumulo

Mike Frampton
September 17, 2013

A short introduction to Apache Accumulo. What is it, and
how does it relate to Bigtable? How does it use Hadoop,
ZooKeeper and Thrift in its implementation?

Transcript

  1. Apache Accumulo
    • What is it?
    • Design
    • Integrity
    • Administration
    • Squirrel
    www.semtech-solutions.co.nz [email protected]
  2. Accumulo – What is it?
    • A key / value store
    • A column-oriented database
    • Based on Google's Bigtable
    • Built on
      – Apache Hadoop
      – Apache ZooKeeper
      – Apache Thrift
    • Written in Java
    • Licensed under the Apache License
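As a rough illustration of the "key / value store" and "column-oriented" bullets, the sketch below models a sorted key/value table in plain Python. The five key fields follow the key layout described in the Accumulo documentation (row, column family, column qualifier, visibility, timestamp); the `Key` class, the `scan` helper and the sample data are hypothetical and are not part of Accumulo's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Toy model of a multi-part key (illustrative, not the Accumulo class)."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

    def sort_key(self):
        # Text fields sort ascending; the newest timestamp sorts first.
        return (self.row, self.family, self.qualifier,
                self.visibility, -self.timestamp)


# Hypothetical sample data: each key maps to one value.
entries = {
    Key("user2", "profile", "name", "public", 1): "Bob",
    Key("user1", "profile", "name", "public", 1): "Al",
    Key("user1", "profile", "name", "public", 2): "Alice",
    Key("user1", "contact", "email", "private", 1): "alice@example.com",
}


def scan(table, row):
    """Return (key, value) pairs for one row, in sorted key order."""
    keys = sorted((k for k in table if k.row == row), key=Key.sort_key)
    return [(k, table[k]) for k in keys]


for key, value in scan(entries, "user1"):
    print(key.family, key.qualifier, key.timestamp, value)
```

Because all entries for a row sort together and columns are just parts of the key, a row can hold any number of columns without a fixed schema.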
  3. Accumulo – Design
    • Cell-level security via column visibility
    • Server-side programming via iterators
    • Table-based constraints written in Java
    • Sharding can be used for parallel document storage
    • Rows can be larger than available memory
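To make cell-level security more concrete, here is a minimal sketch of how a visibility expression might be checked against a user's authorizations. It assumes expressions built from labels, `&`, `|` and parentheses, and treats an empty expression as readable by everyone. The `visible` function is a hypothetical stand-in, not Accumulo's own evaluator, and unlike Accumulo it accepts `&` and `|` mixed without parentheses (giving `&` higher precedence).

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into labels and operators."""
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expr, auths):
    """Return True if the authorization set satisfies the expression."""
    tokens = tokenize(expr)
    if not tokens:  # empty visibility: every user may read the cell
        return True
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the right side, then combine
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a label is satisfied if the user holds it

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


print(visible("(admin&audit)|public", {"public"}))
```

A scan would apply such a check to every cell and silently skip the cells the user is not authorized to see.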
  4. Accumulo – Integrity
    • ZooKeeper used to manage master failover
    • Write-ahead logs written to each server
    • Logical time managed for
      – Consistent transactions
      – Bulk data import
    • FATE (Fault Tolerant Executor) transactions
      – Transactions complete even after master failure
    • Isolation
      – Transactions see a consistent view of data at row level
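The write-ahead log bullet can be illustrated with a small, generic sketch: each mutation is appended to a log before it is applied in memory, so the in-memory state can be rebuilt after a crash by replaying the log. The `TabletServer` class below is a hypothetical toy, and a Python list stands in for a durable log file; it does not reflect Accumulo's actual log format or recovery code.

```python
import json


class TabletServer:
    """Toy server that logs each mutation before applying it in memory."""

    def __init__(self, log=None):
        # The list stands in for a durable, append-only log file.
        self.log = log if log is not None else []
        self.memory = {}

    def write(self, key, value):
        self.log.append(json.dumps({"key": key, "value": value}))  # 1. log
        self.memory[key] = value                                   # 2. apply

    @classmethod
    def recover(cls, log):
        """Rebuild in-memory state by replaying a surviving log in order."""
        server = cls(log=list(log))
        for line in log:
            record = json.loads(line)
            server.memory[record["key"]] = record["value"]
        return server


server = TabletServer()
server.write("row1", "a")
server.write("row1", "b")
server.write("row2", "c")

surviving_log = server.log
del server  # simulate losing the server and its in-memory data

recovered = TabletServer.recover(surviving_log)
print(recovered.memory)
```

Replaying in order matters: the later write to `row1` overwrites the earlier one, just as it did before the simulated failure.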
  5. Accumulo – Administration
    • System monitoring and statistics via a web page
    • System and table configuration stored in ZooKeeper
    • Table names stored in ZooKeeper via IDs
    • Follow threads of execution using tracing
      – Records the time at which actions take place
    • Accumulo can be used with a Squirrel server
      – As the next slide shows
      – A future presentation will cover Squirrel
  6. Accumulo – Data Management
    Internal Data Management
    • Locality groups
      – Group columns within a single file
    • Smart compaction
      – Smaller files are merged with larger ones, using a definable ratio, until all files are merged
    • Minor compaction
      – To avoid reaching the maximum number of files, in-memory data is merged with larger files
    • Loading user-created JARs
      – Load JARs from HDFS using VFS
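The "definable ratio" in the smart compaction bullet can be sketched as a file-selection rule. The version below assumes the rule described in the Accumulo documentation: a set of files is merged when the sum of their sizes reaches the ratio multiplied by the largest file in the set, and otherwise the largest file is dropped from consideration and the check repeats. The function name, the default ratio of 3 and the use of `>=` as the comparison are assumptions made for illustration.

```python
def pick_files_to_compact(sizes, ratio=3.0):
    """Return the file sizes selected for one merge, or [] if none qualify."""
    candidates = sorted(sizes, reverse=True)  # largest file first
    while len(candidates) > 1:
        if sum(candidates) >= ratio * candidates[0]:
            return candidates
        candidates = candidates[1:]  # drop the largest file and retry
    return []


# One large file and three small ones: only the small files are merged.
print(pick_files_to_compact([100, 10, 10, 10]))

# After that merge the tablet holds files of 100 and 30; nothing qualifies.
print(pick_files_to_compact([100, 30]))
```

A lower ratio makes merges trigger more readily, which means fewer files per tablet at the cost of rewriting data more often.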
  7. Accumulo – Data Management
    On Demand Data Management
    • Compactions
      – Force tablets (table partitions) to compact to a single file
    • Tablet merging
      – Request tablet merging via the shell
    • Table cloning
      – Clone a table from an existing one, referencing its data / config
    • Table import / export
      – Copy a table and its metadata to another cluster
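Table cloning by reference can be pictured with a toy model in which a table is just a list of references to immutable data files plus a configuration map. Cloning copies the references and the configuration rather than the file contents, and later writes to either table leave the other untouched. The `Table` class, the file names and the `flush` method are hypothetical; this is a sketch of the idea, not of how Accumulo stores its metadata.

```python
class Table:
    """Toy table: references to immutable files plus a config map."""

    def __init__(self, name, files=(), config=None):
        self.name = name
        self.files = list(files)          # file references, not file contents
        self.config = dict(config or {})  # per-table settings

    def clone(self, new_name):
        # Copy the references and settings; no data files are rewritten.
        return Table(new_name, self.files, self.config)

    def flush(self, file_name):
        # New data lands in a new file that only this table references.
        self.files.append(file_name)


source = Table("events", ["f1.rf", "f2.rf"], {"ratio": "3"})
test_copy = source.clone("events_test")

test_copy.flush("f3.rf")        # a write to the clone
test_copy.config["ratio"] = "2"  # a config change on the clone

print(source.files, source.config)
print(test_copy.files, test_copy.config)
```

Because the shared files are never modified in place, the clone is cheap to create and safe to experiment on.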
  8. Contact Us
    • Feel free to contact us at
      – www.semtech-solutions.co.nz
      – [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You pay only for the hours you need to solve your problems