An introduction to Apache Cassandra

An introduction to Apache Cassandra: what is it and
how does it work? How can it be used with Hadoop,
and how does it perform?

Mike Frampton

August 13, 2013

Transcript

  1. Apache Cassandra
     • What is it?
     • How does it work?
     • Hadoop
     • Tools
     • Architecture
     www.semtech-solutions.co.nz [email protected]
  2. Cassandra – What is it?
     • Distributed database management system
     • Designed for big data
     • Scalable
     • Fault tolerant
     • No single point of failure
     • Has an SQL-like query language
     • NoSQL
  3. Cassandra – How does it work?
     • Organises data into tables
     • Uses Cassandra Query Language (CQL)
     • Does not allow subqueries or joins
     • Supports Hadoop MapReduce
     • Uses asynchronous masterless replication
       – Gives low latency
     • Allows indexing
     • Allows batch analysis via Hadoop
  4. Cassandra – Hadoop
     How does Cassandra integrate with Hadoop?
     • Support for MapReduce
     • Integration with
       – Apache Pig
       – Apache Hive
     • Can also act as a back end for Solr!
  5. Cassandra – Tools
     • User Interface (GUI)
       – Cassandra GUI
       – Toad for cloud DBs
     • Administration
       – OpsCenter
       – Cassandra Cluster Admin
     • Other
       – Client libraries: Java, Python, .NET, Perl etc.
  6. Cassandra – Architecture
     • A peer-to-peer cluster
     • No single point of failure
     • Tunable consistency
       – Is performance or accuracy more important?
     • Query by key or key range
     • Row-oriented data storage
     • Rows can hold up to 2 billion columns
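
The "tunable consistency" trade-off on slide 6 can be made concrete with a little arithmetic. The sketch below is not from the slides: it models three of Cassandra's consistency level names (ONE, QUORUM, ALL) in plain Python, with no driver or cluster involved. The function names and the replication factor used in the example are illustrative assumptions. It relies on the usual rule that a read is guaranteed to overlap the latest write when the replicas read plus the replicas written exceed the replication factor.

```python
from enum import Enum


class Level(Enum):
    """A subset of Cassandra's consistency level names (illustrative model only)."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level: Level, replication_factor: int) -> int:
    """Number of replicas that must acknowledge a request at this level."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level is Level.ONE:
        return 1
    if level is Level.QUORUM:
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    return replication_factor  # Level.ALL


def reads_see_latest_write(write: Level, read: Level, replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write, replication_factor)
    r = replicas_required(read, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # hypothetical replication factor for the example
    for write, read in [(Level.ONE, Level.ONE),
                        (Level.QUORUM, Level.QUORUM),
                        (Level.ONE, Level.ALL)]:
        overlap = reads_see_latest_write(write, read, rf)
        print(f"write={write.value:<6} read={read.value:<6} overlap guaranteed: {overlap}")
```

With a replication factor of 3, writing and reading at ONE touches the fewest replicas (favouring performance) but gives no overlap guarantee, while QUORUM for both waits on two replicas each way and does guarantee overlap (favouring accuracy).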