Distributed Streaming
Albert Bifet
May 2012
COMP423A/COMP523A Data Stream Mining
Outline
1. Introduction
2. Stream Algorithmics
3. Concept drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern Mining
10. Distributed Streaming
Data Streams: Big Data & Real Time
Distributed Systems: Hadoop, S4 and Storm
Hadoop
Hadoop: Hadoop architecture
Apache Mahout: open source framework
Pig: Similar to SQL
    A = LOAD 'data' USING PigStorage() AS (f1:int, f2:int, f3:int);
    B = GROUP A BY f1;
    -- COUNT takes a bag, so count the tuples of A collected in each group
    C = FOREACH B GENERATE COUNT(A);
    DUMP C;
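For readers who do not know Pig Latin, the script above can be read roughly as the following Python sketch: group the rows by their first field and count the rows in each group. The sample rows and the function name `count_by_first_field` are made up for illustration; the real script reads its input from the file 'data' and prints only the counts.

```python
from collections import Counter

def count_by_first_field(rows):
    """Return {f1: number of rows with that f1}, i.e. GROUP BY f1 plus COUNT."""
    return Counter(f1 for f1, _f2, _f3 in rows)

# Hypothetical sample rows standing in for the contents of 'data'.
rows = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]
counts = count_by_first_field(rows)

for f1, n in sorted(counts.items()):
    print(f1, n)
```

Unlike Pig, this runs in a single process; Pig compiles the same logic into Hadoop jobs.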
Apache S4
Storm: Storm from Twitter
Storm: Stream, Spout, Bolt, Topology
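The four Storm concepts can be illustrated with a toy sketch in plain Python generators. This is not Storm's API, and it runs in one process with no distribution or parallelism; the sentences and all function names are invented for the example. A stream is a sequence of tuples, a spout is a source of a stream, a bolt consumes a stream and emits a new one, and a topology is the wiring between spouts and bolts.

```python
from collections import Counter

def sentence_spout():
    """Spout: source of the stream; here a fixed, made-up list of sentences."""
    for sentence in ["big data", "real time data", "data streams"]:
        yield (sentence,)

def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Bolt: keeps running counts and emits (word, count) after each word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])

def run_topology():
    """Topology: the wiring spout -> split bolt -> count bolt."""
    return list(count_bolt(split_bolt(sentence_spout())))

results = run_topology()
```

In a real Storm topology the stream is unbounded, and each spout and bolt runs as many parallel tasks spread across a cluster.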