MOA and WEKA - And the case of evolving data streams

Luqman Abdul Mushawwir

Track List
Side A: Explanation
- What and why?
- Data mining and machine learning using WEKA
- Data stream environment and the analysis using MOA
- What’s the difference?
Side B: Showcase
- WEKA and MOA as a library
- WEKA GUI
- MOA GUI

Side A: Explanation

Why I chose this topic: Introduction
- I like clustering (unsupervised learning)
- Clustering has a problem with streaming data
- Not really a problem, but it needs extra care
- Learned some stream clustering algorithms (BIRCH, StreamLS, StreamKM++)
- Was about to implement the algorithms when I suddenly remembered WEKA
- A little bit of reading, then found MOA
- More at http://www.cs.waikato.ac.nz/ml/weka/ and http://moa.cms.waikato.ac.nz/

WEKA
“Found only on the islands of New Zealand, the Weka is a flightless bird with an inquisitive nature.”
“Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.”

Tasks in WEKA
- Preprocess
- Classify
- Cluster
- Associate
- Select Attributes
- Visualize
- Experiment
- Simple CLI

MOA (Bifet et al., 2010)
“Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams.”
“It contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation.”

Tasks in MOA
- Data Generators
- Classifiers
- Stream Clustering
- Outlier Detection
- Recommender Systems

What is the difference?
- WEKA is designed for learning from a fixed dataset
- More suitable for experimenting around a dataset
- MOA is designed for handling data streams
- Any dataset is treated as a stream
- It is not very good at handling small datasets
- The techniques and algorithms are different
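To make the batch versus stream difference concrete, here is a minimal sketch of the test-then-train (prequential) style of evaluation that stream tools such as MOA rely on: every instance is used once for testing and then once for training, and the full dataset is never held in memory. The class PrequentialSketch and its toy MajorityClass learner are hypothetical names made up for this illustration; they are not the MOA or WEKA API.

```java
/** Hypothetical illustration of stream-style learning; not the MOA or WEKA API. */
class PrequentialSketch {

    /** A tiny online learner: predicts the class it has seen most often so far. */
    static final class MajorityClass {
        private final int[] counts;

        MajorityClass(int numClasses) {
            counts = new int[numClasses];
        }

        int predict() {
            int best = 0;
            for (int c = 1; c < counts.length; c++) {
                if (counts[c] > counts[best]) best = c;
            }
            return best;
        }

        void train(int label) {
            counts[label]++;
        }
    }

    /** Test-then-train: each label is first used for testing, then for training. */
    static double prequentialAccuracy(int[] labelStream, int numClasses) {
        MajorityClass learner = new MajorityClass(numClasses);
        int correct = 0;
        for (int label : labelStream) {
            if (learner.predict() == label) correct++; // test first
            learner.train(label);                       // then train
        }
        return labelStream.length == 0 ? 0.0 : (double) correct / labelStream.length;
    }

    public static void main(String[] args) {
        int[] stream = {1, 1, 0, 1, 1, 1, 0, 1};
        System.out.println(prequentialAccuracy(stream, 2)); // 5 of 8 correct: 0.625
    }
}
```

A batch tool would instead load all eight labels first, build a model, and evaluate it afterwards (for example with cross-validation).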

Evolving Data Streams
- In the real world, more and more data is organized as data streams
- Data comes from complex environments and evolves over time
- Concept drift = the underlying distribution of the data is changing
- Attention to handling concept drift is increasing, since predictions become less accurate over time
- To prevent that, learning models need to adapt to changes quickly and accurately
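A toy example of why drift hurts: a model that was correct for the old concept is frozen, the concept moves, and accuracy drops. The thresholds and the class name ConceptDriftSketch are assumptions chosen only for this sketch.

```java
/** Hypothetical illustration of concept drift; not part of MOA or WEKA. */
class ConceptDriftSketch {

    /**
     * Accuracy of a frozen model "x >= modelThreshold" on the inputs 0..9
     * when the true concept is "x >= conceptThreshold".
     */
    static double accuracy(int modelThreshold, int conceptThreshold) {
        int correct = 0;
        for (int x = 0; x < 10; x++) {
            boolean predicted = x >= modelThreshold;
            boolean actual = x >= conceptThreshold;
            if (predicted == actual) correct++;
        }
        return correct / 10.0;
    }

    public static void main(String[] args) {
        System.out.println("before drift: " + accuracy(5, 5)); // 1.0
        System.out.println("after drift:  " + accuracy(5, 8)); // 0.7
    }
}
```

The model itself did not change; only the data did. The adaptation strategies in this deck are different ways to recover from that.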

Evolving Data Streams

Adapting Learning Strategies (diagram): Detectors, Forgetting, Contextual, Dynamic ensemble; arranged by Single Classifier vs Ensemble and Triggering vs Evolving

Adapting Learning Strategies: Forgetting
- Forget old data and retrain at a fixed rate
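A minimal sketch of the forgetting idea, assuming binary labels (0 or 1) and a deliberately trivial "model" (the majority label in the window). ForgettingSketch, windowSize and retrainEvery are hypothetical names for this illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window learner; labels are assumed to be 0 or 1. */
class ForgettingSketch {
    private final int windowSize;
    private final int retrainEvery;
    private final Deque<Integer> window = new ArrayDeque<>();
    private int seen = 0;
    private int model = 0; // the class currently predicted

    ForgettingSketch(int windowSize, int retrainEvery) {
        this.windowSize = windowSize;
        this.retrainEvery = retrainEvery;
    }

    int predict() {
        return model;
    }

    void observe(int label) {
        window.addLast(label);
        if (window.size() > windowSize) window.removeFirst(); // forget the oldest label
        seen++;
        if (seen % retrainEvery == 0) retrain();               // retrain at a fixed rate
    }

    /** Rebuilds the model from the window only: majority label, ties go to 0. */
    private void retrain() {
        int ones = 0;
        for (int label : window) ones += label;
        model = (2 * ones > window.size()) ? 1 : 0;
    }

    public static void main(String[] args) {
        ForgettingSketch learner = new ForgettingSketch(4, 2);
        int[] stream = {0, 0, 0, 0, 1, 1, 1, 1};
        for (int label : stream) {
            learner.observe(label);
            System.out.println("label=" + label + " model=" + learner.predict());
        }
    }
}
```

The window size is the trade-off: a small window adapts quickly but learns from little data, a large window is more stable but reacts slowly.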

Adapting Learning Strategies: Detectors
- Detect a change and cut
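A simplified sketch of "detect a change and cut": compare the error rate of the most recent predictions with an older reference window, and drop the old data when they differ too much. This fixed two-window rule is an assumption made for illustration; real detectors available in MOA, such as ADWIN or DDM, use statistically grounded tests instead. The names DriftDetectorSketch, n and threshold are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical two-window drift detector over a stream of errors (1 = wrong, 0 = right). */
class DriftDetectorSketch {
    private final int n;
    private final double threshold;
    private final Deque<Integer> reference = new ArrayDeque<>(); // older errors
    private final Deque<Integer> recent = new ArrayDeque<>();    // newest errors

    DriftDetectorSketch(int n, double threshold) {
        this.n = n;
        this.threshold = threshold;
    }

    /** Adds one error indicator; returns true when a change is detected. */
    boolean add(int error) {
        recent.addLast(error);
        if (recent.size() > n) {
            reference.addLast(recent.removeFirst());
            if (reference.size() > n) reference.removeFirst();
        }
        if (reference.size() == n && Math.abs(mean(recent) - mean(reference)) > threshold) {
            reference.clear(); // cut: discard everything from before the change
            return true;
        }
        return false;
    }

    private static double mean(Deque<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return (double) sum / values.size();
    }

    /** Index of the first detection in the given error stream, or -1 if none. */
    static int firstDetection(int[] errors, int n, double threshold) {
        DriftDetectorSketch detector = new DriftDetectorSketch(n, threshold);
        for (int i = 0; i < errors.length; i++) {
            if (detector.add(errors[i])) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] errors = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
        System.out.println(firstDetection(errors, 5, 0.5)); // 12
    }
}
```

In a full system the detection signal would trigger retraining of the classifier on the data seen after the cut.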

Adapting Learning Strategies: Dynamic Ensemble
- Build many models, dynamically combine
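One possible way to "dynamically combine" models is a weighted vote in which every member that makes a mistake has its weight multiplied by a factor beta, in the spirit of the classic weighted majority scheme. This is a sketch under that assumption, not the method of any particular MOA ensemble. The members here are fixed, pre-built models; DynamicEnsembleSketch and beta are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical weighted-vote ensemble over fixed binary models. */
class DynamicEnsembleSketch {
    private final List<IntPredicate> members;
    private final double[] weights;
    private final double beta; // penalty factor in (0, 1) applied after a mistake

    DynamicEnsembleSketch(List<IntPredicate> members, double beta) {
        this.members = members;
        this.beta = beta;
        this.weights = new double[members.size()];
        Arrays.fill(weights, 1.0);
    }

    /** Weighted vote; ties are resolved as true. */
    boolean predict(int x) {
        double yes = 0.0;
        double no = 0.0;
        for (int i = 0; i < members.size(); i++) {
            if (members.get(i).test(x)) yes += weights[i];
            else no += weights[i];
        }
        return yes >= no;
    }

    /** Shrinks the weight of every member that got this instance wrong. */
    void update(int x, boolean label) {
        for (int i = 0; i < members.size(); i++) {
            if (members.get(i).test(x) != label) weights[i] *= beta;
        }
    }

    public static void main(String[] args) {
        List<IntPredicate> models = List.of(v -> v >= 5, v -> v >= 8);
        DynamicEnsembleSketch ensemble = new DynamicEnsembleSketch(models, 0.5);
        System.out.println(ensemble.predict(6)); // tie between the two members
        ensemble.update(6, false);               // the "v >= 5" member was wrong
        System.out.println(ensemble.predict(6)); // now the "v >= 8" member dominates
    }
}
```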

Adapting Learning Strategies: Contextual
- Build many models, switch models according to the observed incoming data
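A sketch of model switching, assuming that "according to the observed incoming data" is approximated by recent accuracy: several models are kept, and the one with the most correct predictions over the last few instances becomes the active model. Unlike the dynamic ensemble, only one model answers at a time. ContextualSwitchSketch and window are hypothetical names for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical model switcher: the model with the best recent record is active. */
class ContextualSwitchSketch {
    private final List<IntPredicate> models;
    private final List<Deque<Boolean>> recentHits = new ArrayList<>();
    private final int window;

    ContextualSwitchSketch(List<IntPredicate> models, int window) {
        this.models = models;
        this.window = window;
        for (int i = 0; i < models.size(); i++) recentHits.add(new ArrayDeque<>());
    }

    /** Index of the model with the most hits in its window; ties go to the lowest index. */
    int activeModel() {
        int best = 0;
        int bestHits = -1;
        for (int i = 0; i < models.size(); i++) {
            int hits = 0;
            for (boolean hit : recentHits.get(i)) {
                if (hit) hits++;
            }
            if (hits > bestHits) {
                best = i;
                bestHits = hits;
            }
        }
        return best;
    }

    boolean predict(int x) {
        return models.get(activeModel()).test(x);
    }

    /** Records, for every model, whether it was right on this labeled instance. */
    void update(int x, boolean label) {
        for (int i = 0; i < models.size(); i++) {
            Deque<Boolean> hits = recentHits.get(i);
            hits.addLast(models.get(i).test(x) == label);
            if (hits.size() > window) hits.removeFirst();
        }
    }

    public static void main(String[] args) {
        List<IntPredicate> models = List.of(v -> v >= 5, v -> v >= 8);
        ContextualSwitchSketch switcher = new ContextualSwitchSketch(models, 3);
        switcher.update(6, false);
        switcher.update(6, false);
        System.out.println("active model: " + switcher.activeModel());
    }
}
```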

Side B: Showcase

Notes
- WEKA is more suitable for exploring a dataset (the GUI helps a lot and the library is easy to use)
- But it is slow when handling a lot of data
- It needs some tweaking when used against big data
- WEKA can be used in Pentaho
- MOA, on the other hand, is designed specifically for large, streaming data
- It should have no problem handling it
- The data generator is good for demonstrations
- MOA is used in Apache SAMOA and ADAMS
- Albert Bifet contributed to the development of all of this software

The ARFF File Format

MOA and WEKA as a library

Add to Maven

<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.1</version>
</dependency>
<dependency>
    <groupId>nz.ac.waikato.cms.moa</groupId>
    <artifactId>moa</artifactId>
    <version>2016.04</version>
</dependency>

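Applying Naive-Bayes Classifier from Weka

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

try {
    // Load the dataset from an ARFF file
    DataSource source = new DataSource("D:\\weka_data\\waveform-5000.arff");
    Instances data = source.getDataSet();
    // Use the last attribute as the class if the file does not set one
    if (data.classIndex() == -1)
        data.setClassIndex(data.numAttributes() - 1);
    // Build a model on the full dataset
    NaiveBayes nb = new NaiveBayes();
    nb.buildClassifier(data);
    // Estimate accuracy with 10-fold cross-validation (fixed seed for repeatability)
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(nb, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
} catch (Exception e) {
    System.out.println("Exception " + e.getMessage());
}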

MOA GUI

WEKA GUI


KMKLabs
