An Introduction to Apache HBase

What is Apache HBase in terms of big data and Hadoop?
How does it relate to the other Hadoop tools?

Mike Frampton

July 10, 2013

Transcript

  1. Apache Hadoop HBase
     • What is it?
     • Why use it?
     • Architecture
     • Storage
     • Related Projects
  2. HBase – What is it?
     • A Hadoop data store
     • A NoSQL store for big data
     • It is open source, written in Java
     • It is a distributed database
     • Automatic sharding: table data is spread over the cluster
     • Automatic region server failover
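The "automatic sharding" bullet on slide 2 can be illustrated with a small sketch. HBase keeps rows sorted by row key and splits a table into regions, each covering a contiguous key range. The code below is a toy model in Python, not HBase code: the start keys, the server names and the `locate_region` helper are all made up for illustration.

```python
from bisect import bisect_right

# Hypothetical layout: four regions, each identified by its start row key.
# A region owns keys from its start key (inclusive) up to the next region's
# start key (exclusive). The empty string marks the start of the table.
REGION_START_KEYS = ["", "g", "n", "t"]
# Made-up region server names; one server may host several regions.
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def locate_region(row_key):
    """Return (region index, hosting server) for a row key."""
    # bisect_right counts the start keys that are <= row_key;
    # the last of those is the start key of the owning region.
    index = bisect_right(REGION_START_KEYS, row_key) - 1
    return index, REGION_SERVERS[index]


for key in ["apple", "g", "mango", "zebra"]:
    print(key, locate_region(key))
```

Because the lookup depends only on the sorted start keys, a region can be split or moved to another server by updating this table, without touching the rows themselves.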
  3. HBase – Why / when to use it?
     • Data in billions of rows
     • Complex data
     • High volume of I/O
     • A large number of data nodes (5 or more)
     • No need for extra RDBMS functions, e.g. transactions
  4. HBase – Architecture
     • HBase is a data store
     • Uses Hadoop for distributed storage
     • Data is stored across region servers
     • Region server data is spread across HDFS data nodes
     • A write-ahead log (WAL) is used to record changes
  5. HBase – Storage
     • Client makes a call, e.g. put
     • Request is sent by RPC as a key value to the region server
     • Key value is routed to the region for the row
     • Data is written to the WAL
     • Data is written to the region memStore
     • If the region server crashes, the WAL can be used to recover data
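The write path on slide 5 (log first, then memory, replay the log after a crash) can be sketched as a toy model. This is plain Python rather than the HBase API: the `RegionServer` and `Region` classes, their methods and the sample row are assumptions made for illustration, and an in-memory list stands in for the WAL that HBase would keep on HDFS.

```python
class Region:
    """Toy region: holds recent writes in an in-memory store."""

    def __init__(self):
        self.memstore = {}  # (row, column) -> value


class RegionServer:
    """Toy region server with one region and one write-ahead log."""

    def __init__(self):
        self.wal = []          # append-only; stands in for durable storage
        self.region = Region()

    def put(self, row, column, value):
        # Step 1: record the change in the WAL before anything else.
        self.wal.append((row, column, value))
        # Step 2: apply the change to the region's in-memory store.
        self.region.memstore[(row, column)] = value

    def crash(self):
        # Memory is lost on a crash; the WAL is assumed to survive.
        self.region = Region()

    def recover(self):
        # Replay the WAL in order to rebuild the in-memory state.
        for row, column, value in self.wal:
            self.region.memstore[(row, column)] = value


server = RegionServer()
server.put("row1", "cf:name", "Alice")
server.put("row1", "cf:name", "Bob")   # a later write to the same cell
server.crash()
server.recover()
print(server.region.memstore)
```

Writing to the log before the in-memory store is what makes recovery possible: any change that reached memory is guaranteed to be in the log, so replaying it in order restores the same state.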
  6. HBase – Related Projects
     • Apache Flume – move large data sets to Hadoop
     • Apache Sqoop – command-line tool to move RDBMS data to Hadoop
     • Apache HBase – non-relational database
     • Apache Pig – analyse large data sets
     • Apache Oozie – workflow scheduler
     • Apache Mahout – machine learning and data mining
     • Hue – Hadoop user interface
     • Apache ZooKeeper – configuration and coordination service
  7. Contact Us
     • Feel free to contact us at – www.semtech-solutions.co.nz – [email protected]
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You pay only for the hours you need to solve your problems