
MOA and WEKA - And the case of evolving data streams

KMKLabs
March 09, 2017

WEKA and MOA are tools for data mining. They are available both as Java libraries and as desktop applications. This tech talk covers MOA, WEKA, the differences between the two, and a brief explanation of data stream handling, which was the motivation for creating MOA.



Transcript

  1. Track List
    Side A: Explanation - What and why?
    - Data mining and machine learning using WEKA
    - Datastream environment and the analysis using MOA
    - What’s the difference?
    Side B: Showcase
    - WEKA and MOA as a library
    - WEKA GUI
    - MOA GUI
  2. Introduction: why I chose this topic
    - I like clustering (unsupervised learning)
    - It has a problem with streaming data
    - Not really a problem, but it needs extra care
    - Learned some stream clustering algorithms (BIRCH, StreamLS, StreamKM++)
    - About to implement the algorithms, I suddenly remembered WEKA
    - After a little reading, I found MOA
    - More at http://www.cs.waikato.ac.nz/ml/weka/ and http://moa.cms.waikato.ac.nz/
  3. WEKA
    “Found only on the islands of New Zealand, the Weka is a flightless bird with an inquisitive nature.”
    “Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.”
  4. Tasks in WEKA
    - Preprocess
    - Classify
    - Cluster
    - Associate
    - Select Attributes
    - Visualize
    - Experiment
    - Simple CLI
  5. MOA (Bifet et al., 2010)
    “Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams.”
    “It contains a collection of offline and online methods for both classification and clustering, as well as tools for evaluation.”
  6. Tasks in MOA
    - Data Generators
    - Classifiers
    - Stream Clustering
    - Outlier Detection
    - Recommender Systems
  7. What is the difference?
    - WEKA is designed for learning from a fixed dataset
    - It is more suitable for experimenting with a dataset
    - MOA is designed for handling streaming data
    - Any dataset is treated as a stream
    - It does not handle small datasets very well
    - The techniques and algorithms are different
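Note on slide 7: the contrast between a fixed dataset and a stream can be illustrated with a minimal Java sketch. It is not taken from the talk and uses neither WEKA nor MOA; the class `RunningMean` and its methods are hypothetical names chosen for illustration. The batch method needs the whole dataset in memory, while the streaming update sees each value once and keeps only constant state.

```java
/** Hypothetical illustration: the same statistic computed in batch and stream style. */
final class RunningMean {
    private long count = 0;
    private double mean = 0.0;

    /** Stream style: consume one value, keep O(1) state, never revisit old data. */
    void update(double x) {
        count++;
        mean += (x - mean) / count;
    }

    double mean() {
        return mean;
    }

    long count() {
        return count;
    }

    /** Batch style: requires the full dataset to be available at once. */
    static double batchMean(double[] data) {
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        return data.length == 0 ? 0.0 : sum / data.length;
    }
}
```

Both paths give the same mean on a finite dataset; the difference is that `update` can keep going on an unbounded stream.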
  8. Evolving Data Streams
    - In the real world, more and more data is organized as data streams
    - Data comes from complex environments and evolves over time
    - Concept drift = the underlying distribution of the data is changing
    - Attention to handling concept drift is increasing, since a model's predictions become less accurate over time as the data drifts
    - To prevent that, learning models need to adapt to changes quickly and accurately
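Note on slide 8: as a rough illustration of what "the underlying distribution is changing" can look like in code, here is a minimal Java sketch that compares the mean of a fixed reference window with the mean of the most recent values. This is an assumption-laden toy, not the method of any detector shipped with MOA; `WindowDriftCheck`, the window size, and the threshold are all hypothetical.

```java
import java.util.ArrayDeque;

/** Hypothetical toy drift check: compares a reference window with a recent window. */
final class WindowDriftCheck {
    private final int size;
    private final double threshold;
    private final ArrayDeque<Double> reference = new ArrayDeque<>();
    private final ArrayDeque<Double> recent = new ArrayDeque<>();

    WindowDriftCheck(int size, double threshold) {
        this.size = size;
        this.threshold = threshold;
    }

    /** Feeds one value; returns true when the recent mean has moved away from the reference mean. */
    boolean add(double x) {
        if (reference.size() < size) {
            reference.addLast(x); // still filling the reference window
            return false;
        }
        recent.addLast(x);
        if (recent.size() > size) {
            recent.removeFirst(); // keep only the latest `size` values
        }
        if (recent.size() < size) {
            return false; // not enough recent data to compare yet
        }
        return Math.abs(mean(recent) - mean(reference)) > threshold;
    }

    private static double mean(ArrayDeque<Double> window) {
        double sum = 0.0;
        for (double v : window) {
            sum += v;
        }
        return sum / window.size();
    }
}
```

A real detector would use a statistical test rather than a fixed threshold; the sketch only shows the idea of watching the stream for a shift.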
  9. Adapting Learning Strategies
    [Diagram: the strategies Detectors, Forgetting, Contextual and Dynamic ensemble, arranged along two axes: Single Classifier / Ensemble and Triggering / Evolving]
    - Forget old data and retrain the model at a fixed rate
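Note on slide 9: "forget old data and retrain at a fixed rate" could be sketched as a sliding window plus a periodic rebuild. The Java below is a minimal illustration under stated assumptions: the "model" is just the majority label in the window, and `ForgettingMajority`, `windowSize`, and `retrainEvery` are hypothetical names, not part of WEKA or MOA.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch: sliding-window forgetting with retraining at a fixed rate. */
final class ForgettingMajority {
    private final int windowSize;
    private final int retrainEvery;
    private final ArrayDeque<Boolean> window = new ArrayDeque<>();
    private long seen = 0;
    private boolean model = false; // current "model": the majority label at the last retrain

    ForgettingMajority(int windowSize, int retrainEvery) {
        this.windowSize = windowSize;
        this.retrainEvery = retrainEvery;
    }

    /** Observes one labelled instance from the stream. */
    void observe(boolean label) {
        window.addLast(label);
        if (window.size() > windowSize) {
            window.removeFirst(); // forget the oldest instance
        }
        seen++;
        if (seen % retrainEvery == 0) {
            retrain(); // fixed-rate retraining, independent of any drift signal
        }
    }

    private void retrain() {
        int positives = 0;
        for (boolean b : window) {
            if (b) {
                positives++;
            }
        }
        model = 2 * positives > window.size();
    }

    boolean predict() {
        return model;
    }
}
```

With a real learner, `retrain` would rebuild the classifier from the instances still in the window.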
  10. Adapting Learning Strategies
    [Diagram: the strategies Detectors, Forgetting, Contextual and Dynamic ensemble, arranged along two axes: Single Classifier / Ensemble and Triggering / Evolving]
    - Build many models, dynamically combine
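Note on slide 10: one classic way to "build many models, dynamically combine" is the Weighted Majority scheme, where each member's vote weight shrinks whenever it is wrong. The Java sketch below illustrates that scheme only; it is not claimed to be what the talk or MOA uses, and `WeightedMajority` and `beta` are hypothetical names. Member predictions are passed in as booleans so the sketch stays independent of any learner.

```java
import java.util.Arrays;

/** Hypothetical sketch of Weighted Majority voting over binary predictions. */
final class WeightedMajority {
    private final double[] weights;
    private final double beta; // penalty factor in (0, 1) applied to wrong members

    WeightedMajority(int members, double beta) {
        this.weights = new double[members];
        Arrays.fill(this.weights, 1.0);
        this.beta = beta;
    }

    /** Combines member votes using the current weights. */
    boolean combine(boolean[] votes) {
        double yes = 0.0;
        double no = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (votes[i]) {
                yes += weights[i];
            } else {
                no += weights[i];
            }
        }
        return yes >= no;
    }

    /** After the true label arrives, down-weights every member that voted wrongly. */
    void update(boolean[] votes, boolean truth) {
        for (int i = 0; i < weights.length; i++) {
            if (votes[i] != truth) {
                weights[i] *= beta;
            }
        }
    }

    double weight(int i) {
        return weights[i];
    }
}
```

Members that keep up with the current concept retain their weight, so the combined prediction follows them as the stream evolves.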
  11. Adapting Learning Strategies
    [Diagram: the strategies Detectors, Forgetting, Contextual and Dynamic ensemble, arranged along two axes: Single Classifier / Ensemble and Triggering / Evolving]
    - Build many models, switch models according to the observed incoming data
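Note on slide 11: "switch models according to the observed incoming data" might look like the following Java sketch, which is only one possible reading of the slide. It assumes each stored model is tagged with a one-dimensional context centroid (the mean of the data it was built on) and picks the model whose centroid is closest to the recent data. `ContextSwitcher` and the centroid representation are hypothetical.

```java
/** Hypothetical sketch: pick the stored model whose context best matches recent data. */
final class ContextSwitcher {
    private final double[] centroids; // assumption: one context centroid per stored model

    ContextSwitcher(double[] centroids) {
        this.centroids = centroids.clone();
    }

    /**
     * Returns the index of the model to activate for the given recent values.
     * Assumes `recent` is non-empty.
     */
    int select(double[] recent) {
        double mean = 0.0;
        for (double x : recent) {
            mean += x;
        }
        mean /= recent.length;

        int best = 0;
        for (int i = 1; i < centroids.length; i++) {
            if (Math.abs(centroids[i] - mean) < Math.abs(centroids[best] - mean)) {
                best = i;
            }
        }
        return best;
    }
}
```

Unlike the dynamic ensemble, only one model answers at a time; the others are kept in case their context recurs.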
  12. Notes
    - WEKA is more suitable for exploring a dataset (the GUI helps a lot and the library is easy to use)
    - But it is slow when handling a lot of data
    - It needs some tweaking when used against big data
    - WEKA can be used in Pentaho
    - MOA, on the other hand, is designed specifically for large, streaming data
    - It should have no problem handling it
    - The data generator is good for demonstrations
    - MOA is used in Apache SAMOA and ADAMS
    - Albert Bifet contributed to the development of all of these software projects
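  13. Applying Naive-Bayes Classifier from Weka

        import java.util.Random;
        import weka.classifiers.Evaluation;
        import weka.classifiers.bayes.NaiveBayes;
        import weka.core.Instances;
        import weka.core.converters.ConverterUtils.DataSource;

        // Class name is illustrative; the slide shows only the try/catch body.
        public class NaiveBayesExample {
            public static void main(String[] args) {
                try {
                    DataSource source = new DataSource("D:\\weka_data\\waveform-5000.arff");
                    Instances data = source.getDataSet();
                    // Use the last attribute as the class if none is set
                    if (data.classIndex() == -1) data.setClassIndex(data.numAttributes() - 1);
                    NaiveBayes nb = new NaiveBayes();
                    nb.buildClassifier(data);
                    // 10-fold cross-validation with a fixed random seed
                    Evaluation eval = new Evaluation(data);
                    eval.crossValidateModel(nb, data, 10, new Random(1));
                } catch (Exception e) {
                    System.out.println("Exception " + e.getMessage());
                }
            }
        }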