MOA and WEKA - And the case of evolving data streams

KMKLabs
March 09, 2017

WEKA and MOA are tools for data mining. They are available both as Java libraries and as desktop applications. This tech talk covers MOA, WEKA, the differences between them, and a brief explanation of data stream handling, which is the motivation behind MOA.

Transcript

  1. MOA and WEKA - And the case of evolving data streams

    Luqman Abdul Mushawwir
  2. Track List

    Side A: Explanation
    - What and why?
    - Data mining and machine learning using WEKA
    - Data stream environment and its analysis using MOA
    - What's the difference?

    Side B: Showcase
    - WEKA and MOA as a library
    - WEKA GUI
    - MOA GUI
  3. Side A: Explanation

  4. Introduction: why I chose this topic

    - I like clustering (unsupervised learning)
    - It has a problem with streaming data
    - Not really a problem, but it needs extra care
    - Learned some stream clustering algorithms (BIRCH, StreamLS, StreamKM++)
    - About to implement them, I suddenly remembered WEKA
    - A little bit of reading later, I found MOA
    - More at http://www.cs.waikato.ac.nz/ml/weka/ and http://moa.cms.waikato.ac.nz/
  5. WEKA

    “Found only on the islands of New Zealand, the Weka is a flightless bird with an inquisitive nature.”

    “Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.”
  6. Tasks in WEKA

    - Preprocess
    - Classify
    - Cluster
    - Associate
    - Select Attributes
    - Visualize
    - Experiment
    - Simple CLI
  7. MOA (Bifet et al., 2010)

    “Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams.”

    “It contains [a] collection of offline and online [methods] for both classification and clustering as well as tools for evaluation.”
  8. Tasks in MOA

    - Data Generators
    - Classifiers
    - Stream Clustering
    - Outlier Detection
    - Recommender Systems
  9. What is the difference?

    - WEKA is designed for learning from a fixed dataset
    - It is more suitable for experimenting around a dataset
    - MOA is designed for handling streaming datasets
    - Any dataset is treated as a stream
    - It is not very good at handling small datasets
    - The techniques and algorithms are different
  10. Evolving Data Streams

    - In the real world, more and more data is organized as data streams
    - Data comes from complex environments and evolves over time
    - Concept drift: the underlying distribution of the data changes
    - Attention to handling concept drift is increasing, since predictions become less accurate over time
    - To prevent that, learning models need to adapt to changes quickly and accurately
  11. Evolving Data Streams

  12. Evolving Data Streams

  13. Adapting Learning Strategies

    [Diagram: strategy grid. Single classifier: Detectors (triggering), Forgetting (evolving). Ensemble: Contextual (triggering), Dynamic ensemble (evolving).]
  14. Adapting Learning Strategies

    Forgetting: forget old data and retrain at a fixed rate
  15. Adapting Learning Strategies: Forgetting
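
The forgetting strategy can be illustrated with a minimal sketch, which is not taken from the talk or from MOA. It assumes binary labels and a trivial majority-class model; the class name `SlidingWindowMajority` and the parameters `windowSize` and `retrainEvery` are hypothetical. Old labels fall out of a fixed-size window, and the model is retrained on the window at a fixed rate.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Majority-class predictor that forgets old labels through a fixed-size sliding window. */
class SlidingWindowMajority {
    private final int windowSize;    // how many recent labels are remembered (illustrative)
    private final int retrainEvery;  // retrain after this many new labels (illustrative)
    private final Deque<Integer> window = new ArrayDeque<>();
    private int sinceRetrain = 0;
    private int model = 0;           // current prediction: class 0 or 1

    SlidingWindowMajority(int windowSize, int retrainEvery) {
        this.windowSize = windowSize;
        this.retrainEvery = retrainEvery;
    }

    int predict() {
        return model;
    }

    void observe(int label) {
        window.addLast(label);
        if (window.size() > windowSize) {
            window.removeFirst();    // forget the oldest label
        }
        sinceRetrain++;
        if (sinceRetrain >= retrainEvery) {
            retrain();
            sinceRetrain = 0;
        }
    }

    // Retraining only looks at the labels still inside the window.
    private void retrain() {
        int ones = 0;
        for (int y : window) {
            ones += y;
        }
        model = (2 * ones > window.size()) ? 1 : 0;
    }
}

class ForgettingDemo {
    public static void main(String[] args) {
        SlidingWindowMajority learner = new SlidingWindowMajority(20, 5);
        for (int i = 0; i < 100; i++) learner.observe(0);   // old concept: label 0
        System.out.println("before drift: " + learner.predict());
        for (int i = 0; i < 100; i++) learner.observe(1);   // new concept: label 1
        System.out.println("after drift: " + learner.predict());
    }
}
```

A smaller window adapts faster but sees less data per retraining, which is the usual trade-off with this strategy.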

  16. Adapting Learning Strategies

    Detectors: detect a change and cut
  17. Adapting Learning Strategies: Detectors
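
As a rough illustration of the detector idea, the sketch below monitors a classifier's running error rate and signals a change when it rises well above the best level seen so far. It loosely follows the idea behind error-rate detectors such as DDM, but it is not MOA's implementation; the class name `ErrorRateDriftDetector`, the warm-up of 30 instances, and the factor 3 are assumptions chosen for the example.

```java
/** Signals drift when the running error rate rises well above its best observed level. */
class ErrorRateDriftDetector {
    private static final int MIN_INSTANCES = 30;  // warm-up before testing (assumed value)
    private int n = 0;
    private double errorRate = 0.0;
    private double minRate = Double.POSITIVE_INFINITY;
    private double minStd = Double.POSITIVE_INFINITY;

    /** Feed one prediction outcome; returns true when drift is signalled (state is then reset). */
    boolean update(boolean wasError) {
        n++;
        errorRate += ((wasError ? 1.0 : 0.0) - errorRate) / n;       // running mean of errors
        double std = Math.sqrt(errorRate * (1.0 - errorRate) / n);   // binomial standard deviation
        if (n < MIN_INSTANCES) {
            return false;
        }
        if (errorRate + std < minRate + minStd) {                    // remember the best level so far
            minRate = errorRate;
            minStd = std;
        }
        if (errorRate + std > minRate + 3.0 * minStd) {              // far above the best level
            reset();                                                 // "cut": start over after the change
            return true;
        }
        return false;
    }

    private void reset() {
        n = 0;
        errorRate = 0.0;
        minRate = Double.POSITIVE_INFINITY;
        minStd = Double.POSITIVE_INFINITY;
    }
}

class DetectorDemo {
    public static void main(String[] args) {
        ErrorRateDriftDetector detector = new ErrorRateDriftDetector();
        for (int i = 0; i < 400; i++) {
            // Synthetic outcomes: 10% errors before index 200, 50% errors afterwards.
            boolean error = (i < 200) ? (i % 10 == 0) : (i % 2 == 0);
            if (detector.update(error)) {
                System.out.println("drift signalled at index " + i);
            }
        }
    }
}
```

In a full learner, the signal would be used to discard the old model and retrain on data seen after the change.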

  18. Adapting Learning Strategies

    Dynamic ensemble: build many models, dynamically combine
  19. Adapting Learning Strategies: Dynamic Ensemble
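
One simple way to combine models dynamically is a weighted vote whose weights shrink whenever a member makes a mistake. The sketch below is a weighted-majority-style illustration under stated assumptions, not the method shown in the talk: the members are fixed functions from an integer input to a label, and the names `WeightedMajorityEnsemble`, `beta`, and `minWeight` are hypothetical. The weight floor is an added design choice so that a demoted member can recover after drift.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Weighted-majority-style ensemble over experts that map an input x to a label (0 or 1). */
class WeightedMajorityEnsemble {
    private final List<IntUnaryOperator> experts;
    private final double[] weights;
    private final double beta;       // penalty factor in (0, 1) applied on a mistake
    private final double minWeight;  // floor so a demoted expert can recover after drift

    WeightedMajorityEnsemble(List<IntUnaryOperator> experts, double beta, double minWeight) {
        this.experts = experts;
        this.weights = new double[experts.size()];
        Arrays.fill(this.weights, 1.0);
        this.beta = beta;
        this.minWeight = minWeight;
    }

    /** Weighted vote of all experts; ties go to class 0. */
    int predict(int x) {
        double votesFor1 = 0.0;
        double votesFor0 = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (experts.get(i).applyAsInt(x) == 1) {
                votesFor1 += weights[i];
            } else {
                votesFor0 += weights[i];
            }
        }
        return votesFor1 > votesFor0 ? 1 : 0;
    }

    /** Penalize experts that were wrong, then rescale so the best weight is 1. */
    void update(int x, int label) {
        double max = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (experts.get(i).applyAsInt(x) != label) {
                weights[i] *= beta;
            }
            max = Math.max(max, weights[i]);
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] = Math.max(weights[i] / max, minWeight);
        }
    }
}

class EnsembleDemo {
    public static void main(String[] args) {
        IntUnaryOperator oldConcept = x -> x > 5 ? 1 : 0;
        IntUnaryOperator newConcept = x -> x > 5 ? 0 : 1;
        WeightedMajorityEnsemble ensemble =
                new WeightedMajorityEnsemble(Arrays.asList(oldConcept, newConcept), 0.5, 0.01);
        for (int i = 0; i < 50; i++) ensemble.update(i % 10, oldConcept.applyAsInt(i % 10));
        System.out.println("before drift, x=7: " + ensemble.predict(7));
        for (int i = 0; i < 20; i++) ensemble.update(i % 10, newConcept.applyAsInt(i % 10));
        System.out.println("after drift, x=7: " + ensemble.predict(7));
    }
}
```

Without the floor, a member that was wrong for a long time would need roughly as many correct rounds to regain influence, which slows adaptation.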

  20. Adapting Learning Strategies

    Contextual: build many models, switch models according to the observed incoming data
  21. Adapting Learning Strategies: Contextual

  22. Side B: Showcase

  23. Notes

    - WEKA is more suitable for exploring a dataset (the GUI helps a lot and the library is easy to use)
    - But it is slow when handling a lot of data
    - It needs some tweaking when used on big data
    - WEKA can be used in Pentaho
    - MOA, on the other hand, is designed specifically for large, streaming data
    - It should have no problem handling it
    - The data generator is good for demonstrations
    - MOA is used in Apache SAMOA and ADAMS
    - Albert Bifet contributed to the development of all of this software
  24. The ARFF File Format

  25. MOA and WEKA as a library

  26. Add to Maven

    <dependency>
      <groupId>nz.ac.waikato.cms.weka</groupId>
      <artifactId>weka-stable</artifactId>
      <version>3.8.1</version>
    </dependency>
    <dependency>
      <groupId>nz.ac.waikato.cms.moa</groupId>
      <artifactId>moa</artifactId>
      <version>2016.04</version>
    </dependency>
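  27. Applying Naive-Bayes Classifier from Weka

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    try {
        // Load the ARFF file into memory.
        DataSource source = new DataSource("D:\\weka_data\\waveform-5000.arff");
        Instances data = source.getDataSet();
        // Use the last attribute as the class if none is set.
        if (data.classIndex() == -1)
            data.setClassIndex(data.numAttributes() - 1);
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);
        // 10-fold cross-validation with a fixed random seed.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    } catch (Exception e) {
        System.out.println("Exception " + e.getMessage());
    }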
  28. MOA GUI

  29. WEKA GUI