A Study of an Incremental Spectral Meta-Learner for Nonstationary Environments

Gregory Ditzler
The University of Arizona, Department of Electrical & Computer Engineering
[email protected]
http://www2.engr.arizona.edu/~ditzler
25 August 2016, IJCNN 2016

Overview: Plan of Attack
1. Overview of concept drift with multiple experts
2. Using SML to estimate voting weights
3. Simulations & real-world data streams
4. Conclusions & discussion
[Figure: Dow Jones Industrial Average index with the 30 companies associated with the index, 1993 to 2009.]

Learning in Nonstationary Environments
Topics shown: Learning Modalities, Online Learning, Incremental Learning, Supervised vs. Unsupervised, Drift Detection, Model Adaptation, Passive Approach, Active Approach, Knowledge Shifts, Transfer Learning, Domain Adaptation, Covariate Shift, Applications, Sensor Networks, Spam Prediction, Electrical Load Forecasting, Big Data, Time-series & Data Stream.
Reference: G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, “Adaptive strategies for learning in nonstationary environments: a survey,” IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12–25, 2015.

Concept Drift & Multiple Experts: Concept Drift
Concept drift can be modeled as a change in a probability distribution, $P(X, Y)$. The change can be in $P(X)$, $P(X|Y)$, $P(Y)$, or jointly in the posterior $P(Y|X)$, where
$$P(Y|X) = \frac{P(X|Y)\,P(Y)}{P(X)}$$
We generally reserve names for specific types of drift (e.g., real and virtual).
Drift types: sudden, gradual, incremental, and reoccurring.
General examples: electricity demand, financial, climate, epidemiological, and spam (to name a few).
[Figure: drift rate dα/dt versus time for constant, sinusoidal, and exponential drift.]
Incremental Learning
Incremental learning can be summarized as the preservation of old knowledge without access to old data.
A desirable concept drift algorithm should find a balance between prior knowledge (stability) and new knowledge (plasticity). This is the stability-plasticity dilemma.
Ensembles have been shown to provide a good balance between stability and plasticity.
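The distinction between drift in $P(X)$ and drift in $P(Y|X)$ can be made concrete on a small discrete example. The sketch below is illustrative only: the function names are hypothetical, and the labels it returns follow the common convention (assumed here, not spelled out on the slide) that real drift changes the posterior while virtual drift changes only $P(X)$. It also assumes both joint tables share the same support.

```python
def marginal_and_posterior(joint):
    """joint[(x, y)] = P(X=x, Y=y). Returns P(X) and P(Y|X) as dictionaries."""
    p_x = {}
    for (x, _), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
    # P(Y=y | X=x) = P(x, y) / P(x), which is Bayes' rule in joint form.
    posterior = {(x, y): p / p_x[x] for (x, y), p in joint.items()}
    return p_x, posterior


def make_joint(p_x, p_y_given_x):
    """Build P(X, Y) = P(Y|X) P(X) from a marginal and a posterior table."""
    return {(x, y): p_x[x] * p
            for x in p_x for y, p in p_y_given_x[x].items()}


def drift_kind(joint_old, joint_new, tol=1e-9):
    """Label the change between two joints (simplification: any posterior
    change is reported as "real", even if P(X) changed as well)."""
    px_old, post_old = marginal_and_posterior(joint_old)
    px_new, post_new = marginal_and_posterior(joint_new)
    if any(abs(post_old[k] - post_new[k]) > tol for k in post_old):
        return "real"
    if any(abs(px_old[x] - px_new[x]) > tol for x in px_old):
        return "virtual"
    return "none"
```

Shifting only the marginal over $X$ while keeping the posterior table fixed is reported as virtual drift; changing the posterior table is reported as real drift.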

Concept Drift & Multiple Experts: Incremental Learning Procedure
[Figure: learning loop over the batches $S_1, \ldots, S_t$: Train Expert $h_t$, Update Weights, Predict $S_{t+1}$, Measure $\ell(H, f_{t+1})$.]
$$H(x) = \sum_{k=1}^{t} w_{k,t}\, h_k(x) \;\Rightarrow\; H = \sum_{k=1}^{t} w_{k,t}\, h_k \quad\text{and}\quad \hat{y} = \mathrm{sign}(H)$$
Algorithms that follow this setting (more or less): Learn++.NSE, SERA, SEA, DWM, ...
We're going to look at the NSE learning scenario from a passive perspective. See Manuel's tutorial: http://www.wcci2016.org/document/tutorials/ijcnn4.pdf
The twist: Concept Drift
Concept drift is characterized by changes in $f_t$ (i.e., the oracle) or a change in the distribution on $x$.
Old experts/classifiers/hypotheses have varying degrees of relevancy at time $t$. How should we combine the experts?
$$\mathrm{Loss}_{t+1}(H) \le \text{some interpretable quantity}$$
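The weighted vote $\hat{y} = \mathrm{sign}\big(\sum_k w_{k,t} h_k(x)\big)$ can be sketched in a few lines. This is a minimal illustration, assuming each expert is a callable that returns a label in {-1, +1}; the function name and the tie-breaking rule (ties go to +1) are assumptions made for the example.

```python
def ensemble_predict(experts, weights, x):
    """Weighted vote: the sign of sum_k w_k * h_k(x).

    experts: callables mapping an instance to -1 or +1.
    weights: one voting weight per expert.
    Ties (a score of exactly zero) are assigned to +1 by assumption.
    """
    score = sum(w * h(x) for h, w in zip(experts, weights))
    return 1 if score >= 0 else -1
```

With one expert weighted far above the others, the ensemble follows that expert; with flatter weights, the majority wins.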

Preliminaries: How Are the MES Weights Determined?
A common weight for a classifier in an ensemble is typically of the form
$$w_k = \log\frac{1 - \epsilon_k}{\epsilon_k}$$
where $\epsilon_k$ is some expected measure of loss for the $k$th classifier.
How do the expert weights affect the loss when being tested on an unknown distribution?
Reference: G. Ditzler, G. Rosen, and R. Polikar, “Domain adaptation bounds for multiple expert systems under concept drift,” in International Joint Conference on Neural Networks, 2014. (Best Student Paper).
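As a small worked example of the log-odds weight, the sketch below maps an error estimate to a voting weight. The clipping constant `eps` is an assumption added here to keep the logarithm finite when the error is exactly 0 or 1; it is not part of the formula on the slide.

```python
import math


def log_odds_weight(error, eps=1e-6):
    """Voting weight log((1 - error) / error) for an error estimate in [0, 1].

    The error is clipped to [eps, 1 - eps] (an assumption of this sketch) so
    that a perfect or a completely wrong classifier still gets a finite weight.
    """
    error = min(max(error, eps), 1.0 - eps)
    return math.log((1.0 - error) / error)
```

An error of 0.5 (chance on a balanced binary problem) gives a weight of zero, errors below 0.5 give positive weights, and errors above 0.5 give negative weights.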

Using Domain Adaptation to Clarify the Bound
Combining the works by Ben-David et al. (2010) and Ditzler et al. (2014) gives us:
$$\mathbb{E}_T\big[\ell(H, f_T)\big] \le \sum_{k=1}^{t} w_{k,t}\left(\mathbb{E}_k\big[\ell(h_k, f_k)\big] + \lambda_{T,k} + \frac{1}{2}\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(U_T, U_k) + \mathcal{O}\!\left(\sqrt{\frac{\nu \log m}{m}}\right)\right)$$
where $\lambda_{T,k}$ is a measure of disagreement between $f_k$ and $f_T$ (a bit unfortunate).
The bound is a weighted sum of: training loss + disagreement of $f_k$ and $f_T$ + divergence of $D_k$ and $D_T$.
$\lambda_{T,k}$ encapsulates real drift, whereas $\hat{d}_{\mathcal{H}\Delta\mathcal{H}}$ is virtual drift. Moreover, existing algorithms that use only the loss on the most recent labeled distribution are missing out on the other changes that could occur.

How Do We Learn? Setting the Stage
Determining the appropriate weights is a very difficult problem if we do not assume something about the nature of the drift.

The limited drift assumption!
Recall the different drift types: sudden, gradual, incremental, or reoccurring.
Using the recent error could be problematic if there is an abrupt reoccurring change that has not been encountered recently.
[Figure: training concepts and testing concepts (Concept 1 through Concept 4) over time steps t1 to t5, showing a gradual change and a reoccurring concept.]

What is Spectral Meta-Learning (SML)?
Spectral Meta-Learning (SML) provides an approach to rank classifiers on data.
Reference: F. Parisi, F. Strino, B. Nadler, and Y. Kluger, “Ranking and combining multiple predictors without labeled data,” Proceedings of the National Academy of Sciences, vol. 111, no. 4, pp. 1253–1258, 2014.

SML assumes there are $k$ binary classifiers of unknown reliability (but better than chance), each providing predicted class labels on unlabeled data. SML can rank the classifiers based on their accuracy if:
- The evaluation set is sampled i.i.d. from $D_T$.
- The classifiers are conditionally independent.
Tough to prove in NSE :(
Let us assume the evaluation data are i.i.d.
A Couple of Notes
- SML does not estimate the 0-1 error; rather, it estimates a balanced error.
- The combinations are determined in an unsupervised fashion.
Evaluating SML in NSE
- Build diverse classifiers on a stream of data.
- Estimate the classifier voting weights solely from SML's error estimates on unlabeled data.
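A rough sketch of the spectral idea may help here. Under the two assumptions above, the off-diagonal entries of the covariance matrix of the classifier outputs have a rank-one structure whose leading eigenvector reflects each classifier's balanced accuracy (see Parisi et al., 2014). The code below is a simplified illustration and not the authors' implementation: it takes the leading eigenvector of the full sample covariance (diagonal included) as a shortcut, uses plain power iteration, and fixes the eigenvector's sign by assuming most classifiers are better than chance. All function names, including the `simulate_votes` helper, are hypothetical.

```python
import math
import random


def covariance(preds):
    """Sample covariance matrix of the classifier outputs.

    preds[i][j] is the label in {-1, +1} that classifier i gives instance j.
    """
    m, n = len(preds), len(preds[0])
    means = [sum(row) / n for row in preds]
    return [[sum((preds[a][j] - means[a]) * (preds[b][j] - means[b])
                 for j in range(n)) / (n - 1)
             for b in range(m)]
            for a in range(m)]


def leading_eigenvector(q, iters=200):
    """Power iteration for the top eigenvector of a symmetric PSD matrix."""
    m = len(q)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(q[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: every classifier is constant
            break
        v = [x / norm for x in w]
    # Resolve the sign ambiguity by assuming most classifiers beat chance.
    if sum(v) < 0:
        v = [-x for x in v]
    return v


def sml_combine(preds):
    """Return (labels, weights): a weighted vote using the eigenvector entries."""
    weights = leading_eigenvector(covariance(preds))
    labels = []
    for j in range(len(preds[0])):
        score = sum(w * row[j] for w, row in zip(weights, preds))
        labels.append(1 if score >= 0 else -1)
    return labels, weights


def simulate_votes(truth, accuracy, rng):
    """Hypothetical helper: keep each true label with probability `accuracy`."""
    return [y if rng.random() < accuracy else -y for y in truth]
```

On simulated, conditionally independent classifiers, the entries of `weights` tend to be larger for the more accurate classifiers, and no labels are used at any point. With only a handful of classifiers, including the diagonal can distort the ordering of classifiers with similar accuracy, so this shortcut should be treated as an approximation.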

Spectral Meta-Learning for Streaming Data
Let's skip to the highlight reel.

Spectral Meta-Learning for Streaming Data: Overview
The proposed study implements a wrapper around the SML weighting strategy. How well does it perform against Learn++.NSE, averaging, and FTL?
[Figure: learning loop over the batches $S_1, \ldots, S_t$: Train Expert $h_t$, Update Weights, Predict $S_{t+1}$, Measure $\ell(H, f_{t+1})$.]
Algorithm:
Input: labeled data $A_t = \{(x_i, y_i)\}_{i=1}^{m_t}$, unlabeled test data $B_t = \{x_j\}_{j=1}^{n_t}$, and base classification algorithm Learn
for $t = 1, 2, \ldots$ do
    Call Learn with $A_t$ and receive classifier $h_t$
    Predict class labels for $x_j \in B_t$ to construct $H_t \in \mathbb{R}^{t \times n_t}$
    $f_t \leftarrow \mathrm{SML}(H_t)$ // See Parisi et al. (2014)
end for
Output: predictions $f_t$ at each time $t$.
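The wrapper loop above can be sketched as follows. This is an illustration under stated assumptions, not the released Matlab code: `run_stream` takes the base learner and the combiner as arguments, `learn_stump` is a toy one-dimensional learner invented for the example, and `majority_vote` is a stand-in combiner that is explicitly not SML (any function that maps the prediction matrix $H_t$ to labels can be plugged in).

```python
def learn_stump(labeled):
    """Toy base learner (an assumption of this sketch): a 1-D threshold placed
    at the midpoint of the two class means. Needs both classes in the batch."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == -1]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2.0
    direction = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if direction * (x - threshold) >= 0 else -1


def majority_vote(H_t):
    """Stand-in combiner (NOT SML): unweighted vote over the rows of H_t."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*H_t)]


def run_stream(batches, learn, combine):
    """batches yields (A_t, B_t): labeled pairs (x, y) and unlabeled instances.

    At each step a new expert is trained on A_t, every expert so far labels
    B_t to form the t-by-n_t matrix H_t, and combine(H_t) gives the output f_t.
    """
    experts, outputs = [], []
    for labeled, unlabeled in batches:
        experts.append(learn(labeled))
        H_t = [[h(x) for x in unlabeled] for h in experts]
        outputs.append(combine(H_t))
    return outputs
```

Passing the combiner in as a function keeps the loop identical whether the weights come from an unsupervised method or from a supervised baseline, which is the comparison the study sets up.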

Evaluation Scenarios
Test-then-Train
Data are used for testing (unlabeled data) prior to training on them (once the labels are received).
The test-then-train strategy is not ideal for data streams if we want to evaluate the learning scenario with abrupt reoccurring concepts.
Test-on-Last
Consider cyclical environments such as weather classification where seasons change from winter → spring → summer → fall → winter.
Evaluating on one season would capture the abrupt reoccurring environment.
[Figure: two schedules over the sequence of data sets, marking which are used for training and which for testing at each time step.]

Results
Table: Error averaged over all time points in each of the experiments. The number in parentheses is the rank of the algorithm on that data set (lower is better). Errors are measured in the test-then-train scenario.
data set  Sense      AVG        NSE        FTL
poker     20.16 (3)  22.23 (4)  17.99 (2)  16.91 (1)
noaa      27.30 (3)  24.11 (1)  24.65 (2)  36.91 (4)
elec2     35.27 (3)  35.23 (2)  31.57 (1)  36.88 (4)
spam       9.54 (1)   9.84 (2)  12.15 (3)  20.51 (4)
sea        6.62 (2)   6.72 (3)   3.76 (1)   7.33 (4)
air       37.42 (3)  37.11 (2)  36.07 (1)  42.68 (4)
final      2.5        2.33       1.67       3.5
Table: κ-statistic averaged over all time points in each of the experiments. κ-statistics are measured in the test-then-train scenario.
data set  Sense      AVG        NSE        FTL
poker     60.08 (1)  55.30 (4)  59.73 (2)  57.61 (3)
noaa      36.03 (1)  35.93 (2)  35.83 (3)  17.80 (4)
elec2     28.83 (2)  23.13 (3)  30.23 (1)  21.76 (4)
spam      79.51 (1)  78.66 (2)  73.98 (3)  56.58 (4)
sea       85.61 (2)  85.38 (3)  91.73 (1)  83.75 (4)
air       20.06 (1)  18.91 (3)  19.71 (2)   9.54 (4)
final      1.33       2.83       2.00       3.83
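The final row of the test-then-train error table is the mean of the per-data-set ranks. A short sketch that recomputes it from the reported errors is given below; the variable and function names are chosen for this example, and the ranking helper assumes there are no ties within a row, which holds for these values.

```python
ALGORITHMS = ["Sense", "AVG", "NSE", "FTL"]

# Test-then-train errors copied from the table, in ALGORITHMS order.
ERRORS = {
    "poker": [20.16, 22.23, 17.99, 16.91],
    "noaa": [27.30, 24.11, 24.65, 36.91],
    "elec2": [35.27, 35.23, 31.57, 36.88],
    "spam": [9.54, 9.84, 12.15, 20.51],
    "sea": [6.62, 6.72, 3.76, 7.33],
    "air": [37.42, 37.11, 36.07, 42.68],
}


def ranks(values):
    """Rank 1 goes to the smallest value (assumes no ties)."""
    return [1 + sum(other < v for other in values) for v in values]


def mean_ranks(table):
    """Average each algorithm's rank over all data sets."""
    per_set = [ranks(row) for row in table.values()]
    return [sum(column) / len(per_set) for column in zip(*per_set)]
```

Rounding `mean_ranks(ERRORS)` to two decimals reproduces the final row of the error table. For the κ-statistic the same idea applies with the comparison reversed, since a larger κ is better.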

Results
Table: Error averaged over all time points in each of the experiments. Errors are measured in the test-on-hold-out scenario.
data set  Sense      AVG        NSE        FTL
poker     16.34 (2)  16.43 (3)  15.29 (1)  19.05 (4)
noaa      44.58 (1)  46.61 (2)  47.14 (3)  50.60 (4)
elec2     18.08 (1)  23.14 (2)  29.99 (3)  40.32 (4)
spam       9.00 (1)   9.28 (2)  10.84 (3)  18.84 (4)
sea        9.50 (2)   9.85 (3)   8.79 (1)  10.78 (4)
air       42.12 (1)  44.23 (3)  42.47 (2)  46.73 (4)
final      1.3333     2.5        2.1667     4
Table: κ-statistic averaged over all time points in each of the experiments. κ-statistics are measured in the test-on-hold-out scenario.
data set  Sense      AVG        NSE        FTL
poker     67.72 (2)  67.48 (3)  69.00 (1)  59.80 (4)
noaa      12.34 (3)  15.82 (1)  14.14 (2)   5.18 (4)
elec2     63.82 (1)  54.69 (2)  42.18 (3)  20.91 (4)
spam      78.93 (1)  77.94 (2)  74.60 (3)  57.10 (4)
sea       80.07 (2)  79.32 (3)  81.52 (1)  77.37 (4)
air       21.40 (1)  18.43 (3)  19.27 (2)   7.48 (4)
final      1.6667     2.3333     2          4

Evaluation Time
Figure: Cumulative evaluation time of the benchmark algorithms (sense, avgc, nse, ftl) plotted against time stamp. The measurements are represented in seconds. Panels: (a) elec2, (b) noaa, (c) spam, (d) sea, (e) air, (f) poker.

Conclusions & Future Work: Final Talking Points
The SML-based learning algorithm worked well and provided promising results, yet more diverse data sets will be required to fully benchmark its efficacy. Furthermore, combining SML with a supervised strategy could be quite beneficial.
Are we violating some of the assumptions in the original SML? Yes!
Do the preliminary results for SML look promising for passive strategies in NSE? Yes!
Future Work: Data Dependent Regularizers
Idea: use labeled and unlabeled data to optimize the weights:
$$\theta^{*} = \arg\min_{\theta \in \Theta} \; \mathrm{Loss}(H, \theta, S_t) + \Omega(H, \theta, U_{t+1})$$
What does $\Omega$ look like if we do not have labeled data from $\hat{t} = t + 1$?
$H$ should generalize on $S_t$ and $S_{t+1}$.
Let SML estimate $\Omega$ and let the Loss come from traditional passive NSE approaches.

Getting the Code
The Matlab code can be found at: https://github.com/gditzler/SML-NSE
