An introduction to Apache Gora

A short introduction to Apache Gora: what is it and how does it work?
How does it provide data store abstraction and persistence for big data?

Mike Frampton

December 31, 2013

Transcript

  1. Apache Gora
     • What is it?
     • Gora – Nutch
     • Supports
     • Data Access
     • APIs
     www.semtech-solutions.co.nz [email protected]
  2. Apache Gora – What is it?
     • Provides for Big Data
       – In-memory data model
       – Persistence
       – Data store abstraction
     • Supports persisting to
       – Column stores
       – Key/value stores
       – Document stores
       – RDBMSs
     • Supports use of Hadoop
  3. Apache Gora – What is it?
     • Released under the Apache License 2.0
     • Written in Java
     • Offers a persistence framework
     • Designed for big data applications
     • Used by Nutch 2.x for web crawl data storage
     • Used for
       – Persistence
       – Indexing
       – Analytics
  4. Apache Gora – Nutch
     • Nutch 2.x now uses Gora
       – Abstracted storage
       – Data store independence
       – Handles object-to-persistent mappings
       – Can use various NoSQL solutions
  5. Apache Gora – Supports
     • Gora supports the following
       – Apache Accumulo
       – Apache Cassandra
       – Apache HBase
       – Amazon DynamoDB
       – Pig
       – Hive
       – Cascading
       – MapReduce
  6. Apache Gora – Data Access
     • Java API for data access
       – Independent of location
     • Core Gora APIs
       – Store
       – Persistency
       – Query
       – MapReduce
  7. Apache Gora – Store API
     • Java API
       – org.apache.gora.store.*
       – DataStore handles object persistence
       – DataStore methods process objects
         • Persist
         • Fetch
         • Query
         • Delete
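
The four operations on this slide can be illustrated without a Gora dependency. The following is a minimal sketch only: `SketchStore` and `InMemorySketchStore` are hypothetical names invented for this example, not Gora classes, and the in-memory `TreeMap` backend stands in for whatever store would sit behind the abstraction. Gora's own `DataStore` has a richer contract than the one shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a data store abstraction; NOT a Gora interface.
interface SketchStore<K extends Comparable<K>, V> {
    void put(K key, V value);             // persist
    V get(K key);                         // fetch; returns null if the key is absent
    List<V> query(K startKey, K endKey);  // values whose keys lie in [startKey, endKey]
    boolean delete(K key);                // true if an entry was removed
}

// One possible backend (an assumption for illustration): a sorted in-memory map.
class InMemorySketchStore<K extends Comparable<K>, V> implements SketchStore<K, V> {
    private final TreeMap<K, V> rows = new TreeMap<>();

    @Override
    public void put(K key, V value) {
        rows.put(key, value);
    }

    @Override
    public V get(K key) {
        return rows.get(key);
    }

    @Override
    public List<V> query(K startKey, K endKey) {
        // subMap with both bounds inclusive; requires startKey <= endKey
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }

    @Override
    public boolean delete(K key) {
        return rows.remove(key) != null;
    }
}

class StoreSketchDemo {
    public static void main(String[] args) {
        // Calling code depends only on the interface, not on the backend.
        SketchStore<String, String> store = new InMemorySketchStore<>();
        store.put("page:1", "http://example.org/a");
        store.put("page:2", "http://example.org/b");
        System.out.println(store.get("page:1"));
        System.out.println(store.query("page:1", "page:2"));
        System.out.println(store.delete("page:1"));
    }
}
```

Because the caller only sees `SketchStore`, a different backend could be swapped in without touching the calling code, which is the kind of data store independence the slides describe.
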
  8. Apache Gora – Persistency API
     • Java API
       – org.apache.gora.persistency.*
       – Core classes
         • BeanFactory
           – Construct keys
         • Persistent
           – Persist objects
         • State
           – State managed through StateManager
           – NEW, CLEAN (UNMODIFIED)
           – DIRTY (MODIFIED), DELETED
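
The four state names suggest a simple object lifecycle. The sketch below is one plausible reading of those names, not Gora's documented behaviour: `SketchState` and `TrackedPage` are hypothetical, and the transition rules (a setter turns CLEAN into DIRTY, a save returns the object to CLEAN) are assumptions made for illustration.

```java
// Hypothetical state names mirroring the slide; NOT Gora's State type.
enum SketchState { NEW, CLEAN, DIRTY, DELETED }

// A bean that tracks its own persistence state (illustrative only).
class TrackedPage {
    private String title;
    private SketchState state = SketchState.NEW; // never saved yet

    TrackedPage(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }

    SketchState getState() {
        return state;
    }

    void setTitle(String title) {
        if (state == SketchState.DELETED) {
            throw new IllegalStateException("cannot modify a deleted object");
        }
        this.title = title;
        // Assumption: only an already-saved object becomes DIRTY; a NEW one stays NEW.
        if (state == SketchState.CLEAN) {
            state = SketchState.DIRTY;
        }
    }

    // Called by the (hypothetical) store after a successful write.
    void markSaved() {
        if (state == SketchState.DELETED) {
            throw new IllegalStateException("cannot save a deleted object");
        }
        state = SketchState.CLEAN;
    }

    void markDeleted() {
        state = SketchState.DELETED;
    }

    // A store could use this to skip writes for unmodified objects.
    boolean needsWrite() {
        return state == SketchState.NEW || state == SketchState.DIRTY;
    }
}

class StateSketchDemo {
    public static void main(String[] args) {
        TrackedPage page = new TrackedPage("Home");
        System.out.println(page.getState());
        page.markSaved();
        System.out.println(page.getState());
        page.setTitle("Home (edited)");
        System.out.println(page.getState());
        page.markDeleted();
        System.out.println(page.getState());
    }
}
```

Tracking state on the object lets a persistence layer write only what has changed, which matters when the objects number in the millions.
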
  9. Apache Gora – Query API
     • Java API
       – org.apache.gora.query.*
       – Core classes
         • Query
           – Constructed via DataStore
         • PartitionQuery
           – Divide results of Query into partitions
           – Run queries on data nodes
           – Generate Hadoop InputSplits
         • Result
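
The idea of dividing one query into partitions that can run in parallel can be sketched as follows. This is a toy illustration under stated assumptions: `KeyRange` and `RangePartitioner` are hypothetical names, keys are plain `long` values, and the range is split evenly. A real store would more likely partition along its own data layout, and this sketch does not produce Hadoop InputSplits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical half-open key range [start, end); NOT a Gora class.
final class KeyRange {
    final long start; // inclusive
    final long end;   // exclusive

    KeyRange(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end must be >= start");
        }
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + ")";
    }
}

final class RangePartitioner {
    private RangePartitioner() { }

    // Splits a range into at most `parts` contiguous, non-overlapping sub-ranges
    // whose sizes differ by at most one. Empty sub-ranges are omitted.
    // Overflow for extremely wide ranges is ignored in this sketch.
    static List<KeyRange> split(KeyRange range, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        long size = range.end - range.start;
        long base = size / parts;
        long extra = size % parts; // the first `extra` partitions get one more key
        List<KeyRange> out = new ArrayList<>();
        long cursor = range.start;
        for (int i = 0; i < parts; i++) {
            long len = base + (i < extra ? 1 : 0);
            if (len == 0) {
                continue; // fewer keys than requested partitions
            }
            out.add(new KeyRange(cursor, cursor + len));
            cursor += len;
        }
        return out;
    }
}

class PartitionSketchDemo {
    public static void main(String[] args) {
        // Each sub-range could be handed to a separate worker.
        for (KeyRange part : RangePartitioner.split(new KeyRange(0, 10), 3)) {
            System.out.println(part);
        }
    }
}
```

Each sub-range is an independent unit of work, so a scheduler can place it close to the data it covers.
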
  10. Apache Gora – MapReduce API
      • Java API
        – org.apache.gora.mapreduce.*
        – GoraMapper
        – GoraReducer
        – ALL Record Counter
        – Reader
        – Writer
        – Hadoop / Avro
          • Serialise
          • De-serialise
          • Persistent
  11. Contact Us
      • Feel free to contact us at
        – www.semtech-solutions.co.nz
        – [email protected]
      • We offer IT project consultancy
      • We are happy to hear about your problems
      • You pay only for the hours you need to solve your problems