An introduction to Apache Storm

A short introduction to Apache Storm: what is it and how does it work?
How can it provide real-time processing for big data?

Mike Frampton

February 22, 2014

Transcript

  1. Apache Storm

    • What is it?
    • Architecture
    • Storm vs Hadoop
    • History
    • Terms
    www.semtech-solutions.co.nz [email protected]
  2. Apache Storm – What is it?

    • A real-time big data processing system
    • Stream based
    • Fault tolerant and distributed
    • Non-persistent
    • In the Apache Incubator
    • Written in Clojure and Java
    • Released under an Eclipse license
  3. Apache Storm – Storm vs Hadoop

    Hadoop
    • Distributed and fault tolerant
    • Batch / file based
    • Master/slave plus ZooKeeper
    • Persistent, uses HDFS
    • Big data analysis

    Storm
    • Distributed and fault tolerant
    • Real time / stream based
    • Master/slave plus ZooKeeper
    • Non-persistent
    • Big data analysis
  4. Apache Storm – Storm vs Hadoop

    Hadoop versus Storm
    • They are complementary technologies
    • Both might be used in a single system
    • Storm processes real-time streams of data
    • Hadoop and MapReduce (M/R) process batched data on HDFS
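
The batch versus stream distinction on the two slides above can be illustrated with a minimal sketch in plain Python. It uses neither Hadoop nor Storm; the function names `batch_total` and `streaming_totals` and the sample `readings` are hypothetical and chosen only for illustration. The batch function needs the complete stored data set before it produces a single answer, while the streaming function emits an updated answer as each value arrives.

```python
from typing import Iterable, Iterator, List


def batch_total(stored_values: List[int]) -> int:
    """Batch style: the whole data set already exists; one answer at the end."""
    return sum(stored_values)


def streaming_totals(values: Iterable[int]) -> Iterator[int]:
    """Stream style: yield an updated running total as each value arrives."""
    total = 0
    for value in values:
        total += value
        yield total


# Hypothetical sample data, used for both styles.
readings = [3, 5, 2]

print(batch_total(readings))             # a single result once all data is read
print(list(streaming_totals(readings)))  # one result per incoming value
```

In a combined system of the kind the slide describes, the streaming path would give an up-to-date view of new data while the batch path recomputes over everything stored.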
  5. Apache Storm – Architecture

    • Composed of streams of tuples, bolted together
    • Streams are sourced via spouts
  6. Apache Storm – History

    What is Apache Storm's history?
    • Developed by BackType
    • Acquired by Twitter
    • Open sourced by Twitter in Sept 2011
    • Added to the Apache Incubator in 2013
  7. Apache Storm – Terms

    • Tuple – an ordered list of elements
    • Stream – an unbounded feed of tuples
    • Spout – like a tap or faucet, a source of streams
    • Bolt – functions, filters etc. that process streams
    • Topology – an ETL-like architecture built from spouts, streams and bolts
    • Nimbus – the master node, like the Hadoop JobTracker
    • Supervisor – controls worker processes
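
To make the terms above concrete, here is a toy model in plain Python. It is not the Storm API: the names `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are hypothetical, and generators stand in for streams. A real stream is unbounded and a real topology runs distributed across worker processes; this sketch uses a short, finite source in a single process so that it terminates.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def sentence_spout() -> Iterator[Tuple[str]]:
    """Spout: the source of a stream (finite here so the sketch terminates)."""
    for sentence in ("storm processes streams", "hadoop processes batches"):
        yield (sentence,)  # a tuple: an ordered list of elements


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: turn each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Bolt: emit a (word, running count) tuple for every incoming word."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology() -> List[Tuple[str, int]]:
    """Topology: a spout wired to bolts by the streams between them."""
    return list(count_bolt(split_bolt(sentence_spout())))


for emitted in run_topology():
    print(emitted)
```

Each stage consumes tuples as they arrive rather than waiting for the whole input, which is the property the Stream and Bolt definitions describe.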
  8. Contact Us

    • Feel free to contact us at
      – www.semtech-solutions.co.nz
      – [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You pay only for the hours you need to solve your problems