• Real time big data processing system
• Stream based
• Fault tolerant and distributed
• Non persistent
• In the Apache Incubator
• Written in Clojure and Java
• Released under the Eclipse Public License
www.semtech-solutions.co.nz [email protected]
Hadoop
• Fault tolerant
• Batch / file based
• Master/slave plus ZooKeeper
• Persistent, uses HDFS
• Big Data analysis
Storm
• Distributed & fault tolerant
• Real time / stream based
• Master/slave plus ZooKeeper
• Non persistent
• Big Data analysis
They are complementary technologies
• They might both be used in a single system
• Storm to process real time streams of data
• Hadoop and M/R to process batched data on HDFS
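The difference between the two processing styles can be sketched in a few lines of plain Python. This is a toy illustration only and uses neither the Storm nor the Hadoop API; `batch_count`, `StreamCounter` and the sample events are hypothetical names invented for the example.

```python
from collections import Counter


def batch_count(stored_events):
    """Batch style: the whole data set already exists and is counted in one job."""
    return Counter(stored_events)


class StreamCounter:
    """Stream style: state is updated as each event arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        # Update the running total and return the current count for this event.
        self.counts[event] += 1
        return self.counts[event]


# Hypothetical sample data, standing in for a live feed and a stored file.
events = ["login", "click", "click", "logout"]

stream = StreamCounter()
live = [stream.on_event(e) for e in events]  # a result is available after every event
batch = batch_count(events)                  # a result is available once the job ends
```

Both paths end with the same totals; the stream path also has an up-to-date answer after every event, which is the role Storm would play next to a batch job in a combined system.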
• Developed at BackType
• BackType was acquired by Twitter
• Open sourced by Twitter in Sept 2011
• Added to the Apache Incubator in 2013
• Tuple – an ordered list of elements
• Stream – an unbounded feed of tuples
• Spout – like a tap or faucet, a source of streams
• Bolt – functions, filters etc. that process streams
• Topology – an ETL-like architecture built from spouts, streams and bolts
• Nimbus – master node, like the Hadoop JobTracker
• Supervisor – controls worker processes
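How spouts, bolts and streams fit together in a topology can be sketched with plain Python generators. This is a toy model and not the Storm API: the function names and the word-count task are assumptions made for illustration, the input is a finite list instead of an unbounded stream, and Nimbus and the supervisors are not modelled.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of a stream; emits one-field tuples."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream):
    """Bolt: keeps a running count and emits (word, count) tuples."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(sentences):
    """Topology: wires spout -> split bolt -> count bolt."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))


result = run_topology(["storm processes streams", "Storm is distributed"])
```

In a real cluster each spout and bolt would run as parallel tasks spread across worker processes; the generators here only show the direction of the data flow.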
www.semtech-solutions.co.nz – [email protected]
• We offer IT project consultancy
• We are happy to hear about your problems
• You can just pay for those hours that you need to solve your problems