Slide 1
Slide 1 text
Distributed Streaming
Albert Bifet
May 2012
Slide 2
Slide 2 text
COMP423A/COMP523A Data Stream Mining
Outline
1. Introduction
2. Stream Algorithmics
3. Concept drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern Mining
10. Distributed Streaming
Slide 3
Slide 3 text
Data Streams Big Data & Real Time
Slide 4
Slide 4 text
Distributed Systems Hadoop, S4 and Storm
Slide 5
Slide 5 text
Hadoop
Slide 6
Slide 6 text
Hadoop Hadoop architecture
Slide 7
Slide 7 text
Apache Mahout Mahout: open source framework
Slide 8
Slide 8 text
Pig Pig: Similar to SQL
Slide 9
Slide 9 text
Pig
A = LOAD 'data' USING PigStorage() AS (f1:int, f2:int, f3:int);
B = GROUP A BY f1;
C = FOREACH B GENERATE COUNT ($0);
DUMP C;
Pig: Similar to SQL
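For readers who do not know Pig Latin, the group-then-count pattern in the script above can be sketched in plain Python. This is only an illustration: it assumes the intent of the script is to count the tuples in each f1 group, and the function name count_by_first_field and the sample rows are made up for the example, not taken from the slides.

```python
from collections import Counter


def count_by_first_field(rows):
    """Group (f1, f2, f3) tuples by f1 and count the tuples in each group.

    Rough analogue of LOAD + GROUP BY f1 + COUNT, done in memory on one machine.
    """
    counts = Counter(row[0] for row in rows)
    # Sort by key so the output order is deterministic.
    return sorted(counts.items())


# Hypothetical sample data standing in for the 'data' file.
rows = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]

for f1, n in count_by_first_field(rows):
    print(f1, n)
```

The difference in practice is that Pig compiles such a script into MapReduce jobs that run on Hadoop, whereas this sketch holds all rows in memory on a single machine.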
Slide 10
Slide 10 text
Apache S4
Slide 11
Slide 11 text
Apache S4
Slide 12
Slide 12 text
Storm Storm from Twitter
Slide 13
Slide 13 text
Storm Stream, Spout, Bolt, Topology
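The four Storm concepts named on this slide can be illustrated with a toy, single-process sketch in plain Python. It does not use the Storm API: the names sentence_spout, split_bolt, count_bolt and run_topology are hypothetical, and Python generators stand in for streams. Roughly, a stream is an unbounded sequence of tuples, a spout is a source of a stream, a bolt consumes streams and may emit new ones, and a topology is the graph that wires spouts and bolts together.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of a stream; emits one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keeps a running count per word and emits (word, count)."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(sentences):
    """Topology: wires spout -> split bolt -> count bolt, returns final counts."""
    final = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        final[word] = count
    return final


print(run_topology(["storm stream", "stream bolt"]))
```

In a real Storm deployment the spouts and bolts run as parallel tasks across a cluster and the input never ends; the sketch only shows how the pieces connect.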