Slide 1

Distributed Adaptive Model Rules for Mining Big Data Streams
Anh Thu Vu, Gianmarco De Francisci Morales, Joao Gama, Albert Bifet

Slide 2

Motivation

- Regression: a fundamental machine learning task, e.g. predict how much rain tomorrow
- Applications: trend prediction, click-through rate prediction

Slide 3

Regression

- Input: training examples with a numeric label
- Output: a model that predicts the value of an unlabeled instance x: ŷ = f(x)
- Goal: minimize the error, MSE = Σ (y - ŷ)²
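As a minimal illustration of the error measure on the slide (the data below is made up), MSE can be computed as:

```python
# Mean squared error of a regression model (illustrative data).
def mse(y_true, y_pred):
    # Slide formula: MSE = sum((y - y_hat)^2); normalized here by n (the mean)
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 1.0, 4.0]   # observed labels (made up)
y_pred = [2.5, 1.5, 4.0]   # model predictions (made up)
print(mse(y_true, y_pred))
```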

Slide 4

Setting: Big Data Streams

- High velocity, large volume
- Large model
- Concept drift
- Need for a scalable solution

Slide 5

SAMOA

Data mining tools:

                  Batch             Stream
Distributed       Hadoop, Mahout    Storm, S4, Samza: SAMOA
Non-distributed   R, WEKA, ...      MOA

G. De Francisci Morales, A. Bifet: “SAMOA: Scalable Advanced Massive Online Analysis”. JMLR (2014)
http://samoa-project.net

Slide 6

Rules

Rules are self-contained, modular, and easy to interpret; they need not cover the whole instance space.

Each rule keeps sufficient statistics to:
- make predictions
- expand the rule
- detect changes and anomalies

Slide 7

AMRules: Adaptive Model Rules

- Ruleset: an ensemble of rules
- Rule prediction: target mean or a linear model
- Ruleset prediction: weighted average of the predictions of the rules covering instance x, e.g. x = [4, 1, 1, 2]:

  f̂(x) = Σ_{R_l ∈ S(x)} θ_l ŷ_l

  with weights θ_l inversely proportional to each rule's error
- A default rule covers the instances not covered by any other rule
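The ruleset prediction above can be sketched as follows (the rule representation and all names are hypothetical, not the AMRules/SAMOA API):

```python
# Sketch: predict with a rule set. The prediction is a weighted average of
# the rules covering x, with weights inversely proportional to each rule's
# error; a default rule handles uncovered instances.
def ruleset_predict(rules, default_rule, x, eps=1e-9):
    covering = [r for r in rules if r["covers"](x)]
    if not covering:
        return default_rule["predict"](x)       # default rule fallback
    weights = [1.0 / (r["error"] + eps) for r in covering]
    total = sum(weights)
    return sum(w / total * r["predict"](x) for w, r in zip(weights, covering))

rules = [
    {"covers": lambda x: x[0] > 2, "predict": lambda x: 10.0, "error": 1.0},
    {"covers": lambda x: x[1] < 5, "predict": lambda x: 20.0, "error": 3.0},
]
default = {"predict": lambda x: 0.0}
print(ruleset_predict(rules, default, [4, 1]))   # both rules cover x
print(ruleset_predict(rules, default, [1, 9]))   # no rule covers x -> default
```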

Slide 8

Ensembles of Adaptive Model Rules from High-Speed Data Streams

Algorithm 1: Training AMRules
  Input: S, a stream of examples
  begin
    R <- {}, D <- default rule
    foreach (x, y) in S do
      foreach rule r in S(x) do           // rules covering x
        if not IsAnomaly(x, r) then
          if PHTest(error_r, λ) then
            remove r from R               // drift: evict the rule
          else
            update sufficient statistics L_r
            ExpandRule(r)
      if S(x) = ∅ then                    // no rule covers x
        update L_D
        ExpandRule(D)
        if D expanded then
          R <- R ∪ {D}; D <- new default rule
    return (R, L_D)

Rule induction:
- Rule creation: expansion of the default rule
- Rule expansion: split on the attribute maximizing the reduction of σ
- Hoeffding bound: ε = sqrt(R² ln(1/δ) / (2n))
- Expand when σ_1st / σ_2nd < 1 - ε
- Evict a rule when drift is detected (Page-Hinckley test signals a large error)
- Detect and explain local anomalies
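The Hoeffding-bound expansion check above can be sketched as follows (the parameter values are made up for illustration):

```python
import math

# Hoeffding bound from the slide: epsilon = sqrt(R^2 * ln(1/delta) / (2n)),
# where R is the range of the observed variable, delta the confidence
# parameter and n the number of observations.
def hoeffding_bound(value_range, delta, n):
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Expand a rule only when the best split is clearly better than the
# runner-up: sigma_1st / sigma_2nd < 1 - epsilon.
def should_expand(sigma_best, sigma_second, value_range=1.0, delta=1e-7, n=1000):
    eps = hoeffding_bound(value_range, delta, n)
    return sigma_best / sigma_second < 1.0 - eps

print(should_expand(0.5, 0.9, n=1000))   # clear winner -> True (expand)
print(should_expand(0.89, 0.9, n=100))   # too close -> False (wait for data)
```

Note how ε shrinks as n grows, so close-together splits eventually become separable with more data.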

Slide 9

DSPEs (Distributed Stream Processing Engines)

Live streams (Stream 1, 2, 3) are consumed by a network of processing elements (PEs); events are routed between PEs, and results (Output 1, 2) are written out through an external persister.

Slide 10

Example

status.text: "Introducing #S4: a distributed #stream processing system"

- TopicExtractorPE (PE1) extracts hashtags from status.text and emits one Topic event per hashtag, e.g. (topic="S4", count=1) and (topic="stream", count=1)
- TopicCountAndReportPE (PE2, PE3) keeps counts for each topic across all tweets and regularly emits a report event if the topic count is above a configured threshold, e.g. (reportKey="1", topic="S4", count=4)
- TopicNTopicPE (PE4) keeps counts for the top topics and outputs the top-N topics to an external persister
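The extract-then-count part of this pipeline can be sketched in a few lines (illustrative code, not the S4 API; the threshold value is made up):

```python
# Sketch of the slide's pipeline: extract hashtags from status.text,
# count per topic (the keyed state of TopicCountAndReportPE), and
# report the topics above a threshold.
from collections import defaultdict

def extract_topics(text):
    # hashtag words, with '#' and trailing punctuation stripped
    return [w.lstrip("#").rstrip(":.,!?") for w in text.split() if w.startswith("#")]

counts = defaultdict(int)   # per-topic state, keyed by topic
THRESHOLD = 1               # report threshold (made-up value)

status = "Introducing #S4: a distributed #stream processing system"
for topic in extract_topics(status):    # role of TopicExtractorPE
    counts[topic] += 1

reports = {t: c for t, c in counts.items() if c >= THRESHOLD}
print(reports)   # -> {'S4': 1, 'stream': 1}
```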

Slide 11

Groupings

How events are routed from a PE to the instances (PEIs) of the next PE:
- Key Grouping (hashing): events with the same key go to the same PEI
- Shuffle Grouping (round-robin): events are spread evenly across PEIs
- All Grouping (broadcast): every event goes to every PEI
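The three groupings can be sketched as routing functions that map an event to the PEI indices that receive it (the routing logic is illustrative, not a DSPE API):

```python
# Sketch of the three stream groupings. Each factory returns a router
# from an event to the list of receiving PEI indices.
import itertools

def key_grouping(n):            # same key -> same PEI (hashing)
    return lambda event: [hash(event["key"]) % n]

def shuffle_grouping(n):        # round-robin across PEIs
    rr = itertools.cycle(range(n))
    return lambda event: [next(rr)]

def all_grouping(n):            # broadcast to every PEI
    return lambda event: list(range(n))

route = shuffle_grouping(3)
print([route({"key": "r1"})[0] for _ in range(4)])   # -> [0, 1, 2, 0]
```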

Slide 21

VAMR: Vertical AMRules

- The Model Aggregator holds the model: rule bodies + heads
- The target mean is updated continuously with the covered instances and used for predictions
- The default rule creates new rules
- Learners (1..p) receive instances and send new rules and rule updates back; the Model Aggregator emits predictions
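The continuously updated target mean in the rule head can be sketched as an O(1) online update (the class name is mine, not the SAMOA API):

```python
# Sketch: a rule head keeping a running target mean over the instances
# the rule covers, updated online in O(1) per instance.
class MeanRuleHead:
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n   # incremental mean update

    def predict(self):
        return self.mean

head = MeanRuleHead()
for y in [2.0, 4.0, 6.0]:
    head.update(y)
print(head.predict())   # -> 4.0
```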

Slide 22

VAMR: Learner statistics

- Vertical parallelism: each Learner tracks the statistics of an independent subset of the rules
- Each rule is tracked by exactly one Learner
- Model -> Learner: key grouping on the rule ID

Slide 23

HAMR: Hybrid AMRules (Vertical + Horizontal)

- In VAMR, the single model is a bottleneck
- HAMR shuffles instances among multiple Model Aggregators (1..r) for parallelism, each connected to the Learners

Slide 24

HAMR

- Problem: a distributed default rule decreases performance
- Solution: a separate, dedicated Learner for the default rule, which sends new rules to the Model Aggregators

Slide 25

Task Overview

- Components: Source, Model Aggregator, Learner, Default Rule Learner, Evaluator
- Streams: instances, rules, predictions (double line = broadcast)
- Source -> Model: shuffle grouping
- Model -> Learner: key grouping

Slide 26

Experiments

- 10-node Samza cluster + Kafka, 2 VCPUs, 4 GB RAM
- Measured: throughput, accuracy, memory usage
- Compared with the sequential algorithm in MOA

Dataset      # instances   # attributes
Airlines     5.8M          10
Electricity  2M            12
Waveform     1M            40

Slide 27

Throughput (Airlines)

[Fig. 6: Throughput of distributed AMRules with airlines. Throughput (thousands of instances/second) vs parallelism level (1, 2, 4, 8) for MAMR, VAMR, HAMR-1 and HAMR-2.]

Slide 28

Throughput (Electricity)

[Fig. 5: Throughput of distributed AMRules with electricity. Throughput (thousands of instances/second) vs parallelism level (1, 2, 4, 8) for MAMR, VAMR, HAMR-1 and HAMR-2.]

Slide 29

Throughput (Waveform)

[Fig. 7: Throughput of distributed AMRules with waveform. Throughput (thousands of instances/second) vs parallelism level (1, 2, 4, 8) for MAMR, VAMR, HAMR-1 and HAMR-2.]

Slide 30

Throughput / Message Size

[Fig. 8: Maximum throughput of HAMR vs result message size (500, 1000, 2000 B) for airlines, electricity and waveform, against a reference maximum throughput.]

Slide 31

Accuracy (Airlines)

[RMSE/(Max-Min) vs parallelism level (1, 2, 4, 8) for MAMR, VAMR, HAMR-1 and HAMR-2.]

Slide 32

Accuracy (Electricity)

[RMSE/(Max-Min) vs parallelism level (1, 2, 4, 8) for MAMR, VAMR, HAMR-1 and HAMR-2.]

Slide 33

Accuracy (Waveform)

[RMSE/(Max-Min) vs parallelism level (1, 2, 4, 8) for MAMR, VAMR, HAMR-1 and HAMR-2.]

Slide 34

Memory Usage

TABLE III: Memory consumption of VAMR for different datasets and parallelism levels.

Dataset      Parallelism   Model Aggregator (MB)    Learner (MB)
                           Avg.      Std. Dev.      Avg.    Std. Dev.
Electricity  1             266.5     5.6            40.1    4.3
             2             264.9     2.8            23.8    3.9
             4             267.4     6.6            20.1    3.2
             8             273.5     3.9            34.7    29
Airlines     1             337.6     2.8            83.6    4.1
             2             338.1     1.0            38.7    1.8
             4             337.3     1.0            38.8    7.1
             8             336.4     0.8            31.7    0.2
Waveform     1             286.3     5.0            171.7   2.5
             2             286.8     4.3            119.5   10.4
             4             289.1     5.9            46.5    12.1
             8             287.3     3.1            33.8    5.7

Slide 35

Memory Usage

TABLE II: Memory consumption of MAMR for different datasets.

Dataset      Avg. (MB)   Std. Dev. (MB)
Electricity  52.4        2.1
Airlines     120.7       51.1
Waveform     223.5       8

Slide 36

Memory Usage (Learner)

[Average memory usage (MB) of the Learner for airlines, electricity and waveform at parallelism P = 1, 2, 4, 8.]

Slide 37

Conclusions

- Distributed streaming algorithm for regression
- Runs on top of distributed stream processing engines
- Up to ~5x increase in throughput
- Accuracy comparable with the sequential algorithm
- Scalable memory usage