
Big Data Intelligence & Analytics Part 1: Massive Online Analysis

Big Data Intelligence & Analytics Part 1:
- introduction to data stream mining
- introduction to MOA: Massive Online Analysis
- data streams classification
- data streams clustering
- data stream frequent pattern mining
- concept drift

Mário Cordeiro

October 18, 2019

Transcript

  1. Big Data Intelligence & Analytics Who am I • Hometown:

    • Miranda do Douro, Bragança, Portugal
  3. Big Data Intelligence & Analytics Who am I • Education:

    • 2000: Degree in Electrical and Computer Engineering, FEUP • Digital Television and Digital Broadcast • 2008: Master in Sciences in Informatics Engineering, FEUP • Information Retrieval • Since 2011: PhD student in Doctoral Program in Informatics Engineering, FEUP • Event detection on social network data streams • Dynamic Complex Networks • Supervision: João Gama, LIAAD
  4. Big Data Intelligence & Analytics Work and research interests •

    Professional Work • Senior Engineer at Critical Manufacturing (CM was acquired by ASM Pacific Technology) | 5 Source: https://www.forbes.com/sites/bernardmarr/2018/09/02/what-is-industry-4-0-heres-a-super-easy-explanation-for-anyone/
  5. Big Data Intelligence & Analytics Work and research interests Real-time

    enterprise-wide visualization and monitoring is crucial for high tech industries: | 6 Critical Manufacturing FabLive: https://www.criticalmanufacturing.com/en/critical-manufacturing-mes/fablive
  6. Big Data Intelligence & Analytics Work and research interests •

    Lecturer: • Invited assistant at ISEP-DEI, since 2010 • Computer Networks (RCOMP), Data Structures (ESINF), Algorithmic and Programming (APROG) • Former invited assistant at FEUP-DEI, 2012-2016 • Computing Theory (TCOM), Computers Laboratory (LCOM) | 7
  7. Big Data Intelligence & Analytics Work and research interests •

    Selected papers: • Cordeiro, M., Sarmento, R. P., Brazdil, P., Kimura, M., Gama, J., Identifying, Ranking and Tracking Community Leaders in Evolving Social Networks. International Conference on Complex Networks and their Applications (COMPLEX NETWORKS), 2019 • Cordeiro, M., Sarmento, R. P., Brazdil, P., Gama, J., Evolving Networks and Social Network Analysis: Methods and Techniques. Journalism and Social Media - New Trends and Cultural Implications (IntechOpen Book Chapter), 2018 • Sarmento, R. P., Cordeiro, M., Brazdil, P., Gama, J., Efficient Incremental Laplace Centrality Algorithm for Dynamic Networks. International Conference on Complex Networks and their Applications (COMPLEX NETWORKS), 2017 • Cordeiro, M., Sarmento, R. P., Gama, J., Evolving Networks Dynamic Community Detection using Locality Modularity Optimization. Social Network Analysis and Mining (SNAM), 2016 • M. Cordeiro, Twitter event detection: combining wavelet analysis and topic inference summarization, DSIE’12, the Doctoral Symposium on Informatics Engineering, 2012 | 8
  8. Big Data Intelligence & Analytics Work and research interests Identifying,

    Ranking and Tracking Community Leaders in Evolving Social Networks: | 9 Bali Bombing 2002/Jemaah Islamiyah: classical vs hierarchical community detection
  9. Big Data Intelligence & Analytics Work and research interests Identifying,

    Ranking and Tracking Community Leaders in Evolving Social Networks: | 10 Bali Bombing 2002/Jemaah Islamiyah: classical vs hierarchical community detection Samudra was the attack strategist, Idris was the logistics commander and Imron was the team's gofer. Node sizes reflect centralities: [('IMRON', 1.1584839492851362), ('SAMUDRA', 0.3603221704111912), ('IDRIS', 0.3062931134147751)]
  10. Big Data Intelligence & Analytics Work and research interests Identifying,

    Ranking and Tracking Community Leaders in Evolving Social Networks: | 11 Temporal collaboration network of Jure Leskovec and Andrew Ng Temporal Zachary karate club
  11. Big Data Intelligence & Analytics Road Map Session 1: •

    Big data science • Issues with (small or big) data quality (examples in healthcare data) • Streaming data sources (examples in energy providers data) • Approximate vs exact computations (practical examples) Session 2: • From streaming to ubiquitous data sources • Distributed streaming versions of state-of-the-art data mining algorithms • Real-world application examples of such algorithms Session 3: • MOA: Massive Online Analysis Session 4: • SAMOA: Scalable Advanced Massive Online Analysis | 13
  12. Big Data Intelligence & Analytics The journey Data Data Streams

    Big Data Big Data Stream | 14 Single Node Multi Node Real-time Analytics Batch Analytics
  13. Big Data Intelligence & Analytics The journey Data Data Streams

    Big Data Big Data Stream | 15 Single Node Multi Node Real-time Analytics Batch Analytics
  14. Big Data Intelligence & Analytics About this presentation • Adapted

    from 2018 Albert Bifet Big Data Intelligence & Analytics slides | 16
  15. Big Data Intelligence & Analytics Data Science Data Science is

    an interdisciplinary field focused on extracting knowledge or insights from large volumes of data. | 18 Data Science In 5 Minutes: https://www.youtube.com/watch?v=X3paOmcrTjQ
  16. Big Data Intelligence & Analytics Data Scientist | 19 Figures:

    http://www.marketingdistillery.com/2014/11/29/is-data-science-a-buzzword-modern-data-scientist-defined/
  17. Big Data Intelligence & Analytics Data Science Drew Conway’s Venn

    diagram | 20 The Data Science Venn Diagram: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  18. Big Data Intelligence & Analytics Data Science | 21 Figure:

    https://blog.fastforwardlabs.com/2017/01/10/five-trends-we-expect-to-come-to-fruition-in-2017.html
  19. Big Data Intelligence & Analytics “Many of the ideas of

    deep learning (neural networks) have been around for decades. Why are these ideas taking off now? Two of the biggest drivers of recent progress have been: • Data availability. People are now spending more time on digital devices (laptops, mobile devices). Their digital activities generate huge amounts of data that we can feed to our learning algorithms. • Computational scale. We started just a few years ago to be able to train neural networks that are big enough to take advantage of the huge datasets we now have.” Scale drives machine learning progress | 22 Source: https://www.mlyearning.org/
  20. Big Data Intelligence & Analytics • Why Deep Learning? •

    Artificial Neural Networks (ANN) again? • 50’s and 60’s • Today’s Quality of Data + Quantity of Data + Computation Power • Better training + faster training • Allow us to build complex models Scale drives machine learning progress | 24 Mário Cordeiro, Dive deep into machine learning with TensorFlow, 2018: https://speakerdeck.com/mmfcordeiro/dive-deep-into-machine-learning-with-tensorflow-at-checkmarx-braga-2018 + + Penn Treebank WebTreebank
  21. Big Data Intelligence & Analytics • Why Deep Learning? •

    Artificial Neural Networks (ANN) again? • 50’s and 60’s • Today’s Quality of Data + Quantity of Data + Computation Power • Better training + faster training • Allow us to build complex models • Inception-v3: Scale drives machine learning progress | 25 “trained from scratch on a desktop with 8 NVIDIA Tesla K40s in about 2 weeks” https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html 150,000 photographs 1000 object categories
  22. Big Data Intelligence & Analytics Data Science Process Tips (Kirill

    Eremenko) | 27 Source: https://confidentdataskills.com
  23. Big Data Intelligence & Analytics The Data Science Process (Joe

    Blitzstein) | 28 Harvard CS109 Data Science: http://cs109.github.io/2015/
  24. Big Data Intelligence & Analytics Challenges | 29 Hal Varian

    on how the Web challenges managers: https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/hal-varian-on-how-the-web-challenges-managers “The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data.” – Hal Varian (Google), January 2009
  25. Big Data Intelligence & Analytics How the Web challenges managers

    | 30 Artificial Intelligence, Economics, and Industrial Organization: https://www.nber.org/chapters/c14017.pdf “In my experience, the problem is not lack of resources, but is lack of skills. A company that has data but no one to analyze it is in a poor position to take advantage of that data. If there is no existing expertise internally, it is hard to make intelligent choices about what skills are needed and how to find and hire people with those skills. Hiring good people has always been a critical issue for competitive advantage. But since the widespread availability of data is comparatively recent, this problem is particularly acute.” – Hal Varian (Google), June 2018
  26. Big Data Intelligence & Analytics Data Science Success Formula Passion

    • doing what you love Brand • service Networking • finding a mentor + meetups and conferences | 32 How to Network and Build a Personal Brand in Data Science: https://www.kdnuggets.com/2016/05/how-network-build-personal-brand-data-science.html
  27. Big Data Intelligence & Analytics Real time analytics | 34

    Richard Gerver book cover: https://www.richardgerver.com/change
  28. Big Data Intelligence & Analytics Real time analytics | 35

    “To be successful, you have to be able to adapt to change” – Sir Alex Ferguson
  29. Big Data Intelligence & Analytics Real time analytics | 36

    Chaplin Modern Times Factory Scene: https://www.youtube.com/watch?v=ANXGJe6i3G8
  30. Big Data Intelligence & Analytics Data Streams Books | 38

    Machine Learning for Data Streams with Practical Examples in MOA By Albert Bifet, Ricard Gavaldà, Geoff Holmes and Bernhard Pfahringer Online: https://moa.cms.waikato.ac.nz/book/ Mining of Massive Datasets, 2nd Edition By Jure Leskovec, Anand Rajaraman, Jeff Ullman Online: http://www.mmds.org/#ver21v Knowledge Discovery from Data Streams By João Gama Online: http://www.liaad.up.pt/area/jgama/DataStreamsCRC.pdf
  31. Big Data Intelligence & Analytics Data Streams • Characteristics of

    Data Streams • Sequence is potentially infinite • High amount of data: sublinear space • High speed of arrival: sublinear time per example • Once an element from a data stream has been processed it is discarded or archived | 39
  32. Big Data Intelligence & Analytics Data Streams • Example: Finding

    Missing Numbers: • Let π be a permutation of {1, …, n}. • Let π′ be π with one element missing. • π′ arrives in increasing order. • Task: Determine the missing number | 40 S. Muthukrishnan, Data Streams: Algorithms and Applications: http://ce.sharif.edu/courses/90-91/1/ce797-1/resources/root/Data_Streams_-_Algorithms_and_Applications.pdf [figure: the numbers 1–9 arriving as a stream]
  33.–42. Big Data Intelligence & Analytics Data Streams • Example: Finding

    Missing Numbers (animation, one frame per arriving number): • Let π be a permutation of {1, …, n}. • Let π′ be π with one element missing. • π′ arrives in increasing order. • Task: Determine the missing number • Using an n-bit array to memorize all numbers: O(n) space | 41–50 S. Muthukrishnan, Data Streams: Algorithms and Applications: http://ce.sharif.edu/courses/90-91/1/ce797-1/resources/root/Data_Streams_-_Algorithms_and_Applications.pdf [figure: the n-bit array fills with 1s as each number arrives; the bit for the missing number 3 stays 0]
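The n-bit-array approach on these slides is easy to sketch (an illustrative Python snippet, not from the slides; the function name is mine):

```python
def find_missing_bitmap(stream, n):
    """O(n)-space approach: mark each arriving number in an n-bit array,
    then scan for the slot that was never set."""
    seen = [False] * n                  # n bits of state
    for x in stream:                    # numbers from 1..n, one missing
        seen[x - 1] = True
    for i, bit in enumerate(seen):
        if not bit:
            return i + 1                # the unmarked slot is the missing number

print(find_missing_bitmap([1, 2, 4, 5, 6, 7, 8, 9], 9))  # -> 3
```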
  43.–46. Big Data Intelligence & Analytics Data Streams • Example: Finding

    Missing Numbers: • Let π be a permutation of {1, …, n}. • Let π′ be π with one element missing. • π′ arrives in increasing order. • Task: Determine the missing number • Using Data Streams: O(log(n)) space | 51–54 S. Muthukrishnan, Data Streams: Algorithms and Applications: http://ce.sharif.edu/courses/90-91/1/ce797-1/resources/root/Data_Streams_-_Algorithms_and_Applications.pdf [figure: a running sum replaces the bit array as each number arrives]
  47. Big Data Intelligence & Analytics Data Streams • Example: Finding

    Missing Numbers: • Let π be a permutation of {1, …, n}. • Let π′ be π with one element missing. • π′ arrives in increasing order. • Task: Determine the missing number • Using Data Streams: O(log(n)) space • Uses O(log n) bits to store the partial sum • Performs one sum operation (+) each time a number arrives: O(log n) time per number • At the end, computes the missing number as n(n+1)/2 minus the partial sum, with one subtraction: O(log n) time for the final computation | 55 S. Muthukrishnan, Data Streams: Algorithms and Applications: http://ce.sharif.edu/courses/90-91/1/ce797-1/resources/root/Data_Streams_-_Algorithms_and_Applications.pdf
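The O(log n) trick above can be written out directly (my sketch, not the survey's code): since 1 + 2 + … + n = n(n+1)/2, a single running sum suffices.

```python
def find_missing_stream(stream, n):
    """O(log n)-space approach: keep only a running sum; the missing
    number is n(n+1)/2 minus the sum of what actually arrived."""
    partial = 0
    for x in stream:                    # one + operation per arriving number
        partial += x
    return n * (n + 1) // 2 - partial   # one subtraction at the end

print(find_missing_stream([1, 2, 4, 5, 6, 7, 8, 9], 9))  # -> 3
```

Only the counter `partial` is stored, so memory no longer grows with n's bit-array but with the number of bits needed to hold the sum.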
  48.–50. Big Data Intelligence & Analytics Data Streams • Example: Finding

    Missing Numbers: • Let π be a permutation of {1, …, n}. • Let π′ be π with two elements missing. • π′ arrives in increasing order. • Task: Determine the two missing numbers • Using Data Streams: maintain both the partial sum and the partial sum of squares (two equations in the two unknowns) | 56–58 S. Muthukrishnan, Data Streams: Algorithms and Applications: http://ce.sharif.edu/courses/90-91/1/ce797-1/resources/root/Data_Streams_-_Algorithms_and_Applications.pdf
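For two missing numbers, the classic extension of the same idea (my own sketch) keeps two running aggregates, the sum and the sum of squares, and solves the resulting two equations in two unknowns:

```python
import math

def find_two_missing(stream, n):
    """Track running sum and sum of squares; the missing numbers a, b
    satisfy a + b = s and a^2 + b^2 = q, so a*b = (s^2 - q) / 2 and
    (a - b)^2 = s^2 - 4ab, which pins down both values."""
    s = n * (n + 1) // 2                 # expected sum of 1..n
    q = n * (n + 1) * (2 * n + 1) // 6   # expected sum of squares
    for x in stream:
        s -= x
        q -= x * x
    # now s = a + b and q = a^2 + b^2
    p = (s * s - q) // 2                 # a * b
    d = math.isqrt(s * s - 4 * p)        # |a - b|
    return (s - d) // 2, (s + d) // 2

print(find_two_missing([1, 2, 4, 5, 6, 8, 9], 9))  # -> (3, 7)
```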
  51. Big Data Intelligence & Analytics Data Streams • Characteristics of

    Data Streams • Sequence is potentially infinite • High amount of data: sublinear space • High speed of arrival: sublinear time per example • Once an element from a data stream has been processed it is discarded or archived • Tools: • approximation • randomization, sampling • sketching | 59
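Of the tools listed, sampling is the simplest to illustrate. Reservoir sampling (a standard technique; this sketch is mine, not from the slides) keeps a uniform random sample of fixed size k using O(k) memory, no matter how long the stream runs:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """After t items, every item seen so far is in the reservoir with
    probability k/t -- using O(k) memory regardless of stream length."""
    reservoir = []
    for t, item in enumerate(stream, start=1):
        if t <= k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = rng.randrange(t)         # uniform index in [0, t)
            if j < k:
                reservoir[j] = item      # replace a random slot
    return reservoir
```

Randomization here buys constant space at the cost of an exact answer, which is exactly the trade-off the slide's toolbox describes.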
  52. Big Data Intelligence & Analytics Data Streams: approximation algorithms •

    Small error rate with high probability • An algorithm is (ε, δ)-approximate if it outputs a value X̂ for which Pr[|X̂ − X| > εX] ≤ δ | 60 M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002: http://www-cs-students.stanford.edu/~datar/papers/sicomp_streams.pdf For example, to say that the output of an algorithm is an absolute (0.1, 0.01)-approximation of a desired function f, we require that, for every x, the output is within ±0.1 of f(x) at least 99% of the times we run the algorithm. The value 1 − δ is usually called the confidence.
  53.–58. Big Data Intelligence & Analytics Data Streams: approximation algorithms

    (animation, the window sliding one position per frame) • Small error rate with high probability • An algorithm is (ε, δ)-approximate if it outputs a value X̂ for which Pr[|X̂ − X| > εX] ≤ δ • Sliding Window: • We can maintain simple statistics over sliding windows, using O((1/ε)·log² N) space, where N is the length of the sliding window and ε is the accuracy parameter. | 61–66 M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002: http://www-cs-students.stanford.edu/~datar/papers/sicomp_streams.pdf [figure: a bit stream with the sliding window highlighted]
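The sliding-window statistics of Datar et al. rest on exponential histograms. The sketch below is my own simplified rendering of the idea for counting 1s among the last N bits: buckets have power-of-two sizes with at most two buckets per size, so memory stays polylogarithmic in N instead of the O(N) an exact count needs.

```python
from collections import deque

class DgimCounter:
    """Approximate count of 1s among the last `window` bits of a stream
    (simplified exponential-histogram / DGIM-style sketch)."""

    def __init__(self, window):
        self.window = window
        self.time = 0
        self.buckets = deque()  # [timestamp_of_newest_1, size], newest first

    def update(self, bit):
        self.time += 1
        # expire the oldest bucket once its newest 1 leaves the window
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if not bit:
            return
        self.buckets.appendleft([self.time, 1])
        # whenever three buckets share a size, merge the two oldest of them
        i = 0
        while i + 2 < len(self.buckets):
            if self.buckets[i][1] == self.buckets[i + 2][1]:
                self.buckets[i + 1][1] *= 2   # merged size doubles
                del self.buckets[i + 2]       # keep the newer timestamp
            else:
                i += 1

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] // 2  # count half the oldest bucket
```

The only uncertainty is how much of the oldest bucket still lies inside the window, so the estimate is off by at most half that bucket's size; tightening the "at most two buckets per size" rule to "at most 1/ε + 1" is what yields the (ε, δ)-style accuracy/space trade-off.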
  59. Big Data Intelligence & Analytics Data Streams: requirements for

    stream mining The most significant requirements for a stream mining algorithm are the same for predictors, clusterers, and frequent pattern miners: Requirement 1: Process an instance at a time, and inspect it (at most) once. Requirement 2: Use a limited amount of time to process each instance. Requirement 3: Use a limited amount of memory. Requirement 4: Be ready to give an answer (prediction, clustering, patterns) at any time. Requirement 5: Adapt to temporal changes. | 67 Machine Learning for Data Streams with Practical Examples in MOA Bifet et al.: https://www.cms.waikato.ac.nz/~abifet/book/chapter_2.html
  60. Big Data Intelligence & Analytics What is MOA? • Massive

    Online Analysis is a framework for online learning from data streams. • It is closely related to WEKA • It includes a collection of offline and online algorithms as well as tools for evaluation: • classification • clustering • Easy to extend • Easy to design and run experiments | 69 Albert Bifet et al. (2010); MOA: Massive Online Analysis; Journal of Machine Learning Research 11: 1601-1604: http://www.jmlr.org/papers/volume11/bifet10a/bifet10a.pdf
  61. Big Data Intelligence & Analytics What is MOA? • History

    – timeline | 70 WEKA : project starts (Ian Witten) WEKA 3 (100% Java) released MOA: Concept Drift MOA Outliers, and Recommender System MOA Graph Mining, Multi-label classification, Twitter Reader, Active Learning MOA: Clustering MOA: Concept Drift Classification First public release of MOA: Richard Kirkby, Geoff Holmes and Bernhard Pfahringer 1993 Mid 1999 Nov 2007 2009 2010 2011 2014 2014
  62. Big Data Intelligence & Analytics WEKA • Waikato Environment for

    Knowledge Analysis • Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java • Released under the GPL • Support for the whole process of experimental data mining • Preparation of input data • Statistical evaluation of learning schemes • Visualization of input data and the result of learning • Used for education, research and applications • Complements “Data Mining” by Witten & Frank & Hall | 71 Data Mining: Practical Machine Learning Tools and Techniques: https://www.cs.waikato.ac.nz/~ml/weka/book.html
  63. Big Data Intelligence & Analytics WEKA MOA the bird |

    73 The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
  64. Big Data Intelligence & Analytics MOA User Interface • WEKA

    compared to MOA • Classification Experimental Setting | 74 Weka GUI MOA GUI
  65. Big Data Intelligence & Analytics MOA User Interface • WEKA

    compared to MOA • Classification Experimental Setting | 75 Weka GUI MOA GUI Weka evaluation outputs one value; MOA evaluation outputs a series of values Advanced Data Mining with Weka (2.3: The MOA interface): https://www.youtube.com/watch?v=84UtlBzdasU
  66. Big Data Intelligence & Analytics MOA • Evaluation procedures for

    Data Streams • Holdout • Interleaved Test-Then-Train or Prequential • Data sources: • Random Tree Generator • Random RBF Generator • LED Generator • Waveform Generator • Hyperplane • SEA Generator • STAGGER Generator | 77
  67. Big Data Intelligence & Analytics MOA • Classifiers • Naive

    Bayes • Decision stumps • Hoeffding Tree • Hoeffding Option Tree • Bagging and Boosting • ADWIN Bagging and Leveraging Bagging | 78
  68. Big Data Intelligence & Analytics RAM-Hours • RAM-Hour • Every

    GB of RAM deployed for 1 hour • Cloud Computing Rental Cost Options | 79
  69. Big Data Intelligence & Analytics MOA • Clustering Experimental Setting

    | 81 Internal measures: Gamma, C Index, Point-Biserial, Log Likelihood, Dunn’s Index, Tau, Tau A, Tau C, Somer’s Gamma, Ratio of Repetition, Modified Ratio of Repetition, Adjusted Ratio of Clustering, Fagan’s Index, Deviation Index, Z-Score Index, D Index, Silhouette coefficient. External measures: Rand statistic, Jaccard coefficient, Folkes and Mallow Index, Hubert Γ statistics, Minkowski score, Purity, van Dongen criterion, V-measure, Completeness, Homogeneity, Variation of information, Mutual information, Class-based entropy, Cluster-based entropy, Precision, Recall, F-measure.
  70. Big Data Intelligence & Analytics MOA • Clustering Experimental Setting

    • StreamKM++ • CluStream • ClusTree • Den-Stream • D-Stream • CobWeb | 82
  71. Big Data Intelligence & Analytics MOA • Easy Design of

    a MOA classifier • void resetLearningImpl () • void trainOnInstanceImpl (Instance inst) • double[] getVotesForInstance (Instance i) • Easy Design of a MOA clusterer • void resetLearningImpl () • void trainOnInstanceImpl (Instance inst) • Clustering getClusteringResult() • Extensions of MOA • Multi-label Classification • Active Learning • Regression • Closed Frequent Graph Mining • Twitter Sentiment Analysis | 84
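MOA itself is written in Java; to show what the three extension points amount to, here is a toy majority-class learner that mirrors them in Python (purely illustrative, my own naming of the Python methods):

```python
class MajorityClassLearner:
    """Toy stream classifier mirroring MOA's extension points
    (resetLearningImpl / trainOnInstanceImpl / getVotesForInstance)."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.reset_learning_impl()

    def reset_learning_impl(self):
        self.counts = [0] * self.n_classes       # O(1) state per class

    def train_on_instance_impl(self, instance, label):
        self.counts[label] += 1                  # inspect each instance once

    def get_votes_for_instance(self, instance):
        total = sum(self.counts) or 1
        return [c / total for c in self.counts]  # ready to answer at any time
```

Even this trivial learner satisfies the five stream-mining requirements: one pass, constant time and memory per instance, and an anytime prediction.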
  72. Big Data Intelligence & Analytics MOA: Classification • Definition •

    Given different classes, a classifier algorithm builds a model that predicts, for every unlabelled instance d, the class c to which it belongs with accuracy. • Examples: • A spam filter: • Classes: Spam/Not Spam • Twitter Sentiment analysis: analyze tweets with positive or negative feelings • Classes: positive feeling / negative feeling / neutral • Payment fraud detection: • Classes: valid/not valid transaction | 86
  73. Big Data Intelligence & Analytics MOA: Classification • Data stream

    classification cycle 1. Process an example at a time, and inspect it only once (at most) 2. Use a limited amount of memory 3. Work in a limited amount of time 4. Be ready to predict at any point | 87
  74. Big Data Intelligence & Analytics MOA: Classification • Data set

    that describes e-mail features for deciding if it is spam. • Assume we have to classify the following new instance: | 88

    Contains "Money" | Domain Type | Has attachment | Time Received | Spam
    yes | com | yes | night | yes
    yes | edu | no  | night | yes
    no  | com | yes | night | yes
    no  | edu | no  | day   | no
    no  | com | no  | day   | no
    yes | cat | no  | day   | yes

    New instance:
    Contains "Money" | Domain Type | Has attachment | Time Received | Spam
    yes | edu | yes | day | ?
  75. Big Data Intelligence & Analytics MOA: Classification • Bayes Classifiers

    • Naïve Bayes • Based on Bayes’ Theorem: P(c|d) = P(d|c)·P(c) / P(d) (posterior = likelihood × prior / evidence) • Estimates the probability P(a|c) of observing attribute a in class c, and the prior probability P(c) • Probability of class c given an instance d: P(c|d) = P(c) · ∏_{a∈d} P(a|c) / P(d) | 89
  76. Big Data Intelligence & Analytics MOA: Classification • Is the

    new instance spam? Pr(spam | new instance) ∝ Pr(spam) · Pr(Money=yes | spam) · Pr(Domain=edu | spam) · Pr(attachment=yes | spam) · Pr(Time=day | spam) | 90 New instance: Contains "Money"=yes, Domain Type=edu, Has attachment=yes, Time Received=day, Spam=?
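Working the spam example through Naïve Bayes numerically (my own worked example over the slide's table; no Laplace smoothing, so an unseen attribute value zeroes a class out):

```python
# rows: (money, domain, attachment, time, spam) from the slide's table
data = [
    ("yes", "com", "yes", "night", "yes"),
    ("yes", "edu", "no",  "night", "yes"),
    ("no",  "com", "yes", "night", "yes"),
    ("no",  "edu", "no",  "day",   "no"),
    ("no",  "com", "no",  "day",   "no"),
    ("yes", "cat", "no",  "day",   "yes"),
]
new = ("yes", "edu", "yes", "day")   # the instance to classify

def nb_score(cls):
    rows = [r for r in data if r[4] == cls]
    score = len(rows) / len(data)                    # prior P(c)
    for i, value in enumerate(new):                  # times each P(a|c)
        score *= sum(1 for r in rows if r[i] == value) / len(rows)
    return score

# spam gets (4/6)(3/4)(1/4)(2/4)(1/4) > 0; not-spam gets 0 because no
# not-spam row contains "Money" -- so the new instance is classified spam
print(nb_score("yes"), nb_score("no"))
```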
  77. Big Data Intelligence & Analytics MOA: Classification • Bayes Classifiers

• Multinomial Naïve Bayes • Considers a document as a bag-of-words • Estimates the probability P(w | c) of observing word w in class c, and the prior probability P(c) • Probability of class c given a document d: P(c | d) = P(c) ∏_{w ∈ d} P(w | c)^(n_wd) / P(d), where n_wd is the number of times w occurs in d | 91
78. Big Data Intelligence & Analytics MOA: Classification • Perceptron: output ŷ = h_w⃗(x⃗)

    • Data stream: ⟨x⃗_i, y_i⟩ • Classical perceptron: h_w⃗(x⃗) = sgn(w⃗^T x⃗) • Minimize mean-square error: J(w⃗) = ½ Σ_i (y_i − h_w⃗(x⃗_i))² | 92
79. Big Data Intelligence & Analytics MOA: Classification • Perceptron: output ŷ = h_w⃗(x⃗)

    • We use the sigmoid function: h_w⃗(x⃗) = σ(w⃗^T x⃗) where: σ(x) = 1 / (1 + e^(−x)) and σ′(x) = σ(x)(1 − σ(x)) | 93
  80. Big Data Intelligence & Analytics MOA: Classification • Perceptron •

Minimize mean-square error: J(w⃗) = ½ Σ_i (y_i − h_w⃗(x⃗_i))² • Stochastic Gradient Descent: w⃗ = w⃗ − η ∇J(x⃗) • Gradient of the error function: ∇J = −Σ_i (y_i − h_w⃗(x⃗_i)) ∇h_w⃗(x⃗_i), with ∇h_w⃗(x⃗_i) = h_w⃗(x⃗_i)(1 − h_w⃗(x⃗_i)) x⃗_i • Weight update rule: w⃗ = w⃗ + η Σ_i (y_i − h_w⃗(x⃗_i)) h_w⃗(x⃗_i)(1 − h_w⃗(x⃗_i)) x⃗_i | 94
  83. Big Data Intelligence & Analytics MOA: Classification • Perceptron PERCEPTRONLEARNING(Stream,

η)
1. for each class
2. do PERCEPTRONLEARNING(Stream, class, η)
PERCEPTRONLEARNING(Stream, class, η)
1. ⪧ Let w_0 and w⃗ be randomly initialized
2. for each example (x⃗, y) in Stream:
3. do if class = y
4. then δ = (1 − h_w⃗(x⃗)) · h_w⃗(x⃗) · (1 − h_w⃗(x⃗))
5. else δ = (0 − h_w⃗(x⃗)) · h_w⃗(x⃗) · (1 − h_w⃗(x⃗))
6. w⃗ = w⃗ + η · δ · x⃗
PERCEPTRONPREDICTION(x⃗)
1. return arg max_class h_{w⃗_class}(x⃗) | 97
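The per-example sigmoid update can be sketched as follows (an illustrative Python toy, not MOA's Java implementation; the stream, learning rate, and the repeated passes over the tiny data set are all assumptions made for the demo):

```python
# Streaming sigmoid perceptron: one weight update per arriving example,
# using the squared-error gradient delta = (y - h) * h * (1 - h).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_train(stream, n_features, eta=0.5, epochs=20):
    w = [0.0] * n_features
    for _ in range(epochs):              # several passes only for this tiny demo
        for x, y in stream:
            h = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            delta = (y - h) * h * (1.0 - h)
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0

# Linearly separable toy stream: label 1 iff x1 > x2 (third input is a bias).
stream = [((1.0, 0.0, 1.0), 1), ((0.0, 1.0, 1.0), 0),
          ((0.9, 0.2, 1.0), 1), ((0.1, 0.8, 1.0), 0)]
w = perceptron_train(stream, 3)
print([predict(w, x) for x, _ in stream])
```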
  84. Big Data Intelligence & Analytics MOA: Classification • Data set

that describes e-mail features for deciding if it is spam. • Assume we have to classify the following new instance: | 98
Contains "Money" | Domain Type | Has attachment | Time Received | Spam
yes | com | yes | night | yes
yes | edu | no | night | yes
no | com | yes | night | yes
no | edu | no | day | no
no | com | no | day | no
yes | cat | no | day | yes
New instance: yes | edu | yes | day | ?
  85. Big Data Intelligence & Analytics MOA: Classification • Assume we

have to classify the following new instance: | 99
New instance: Contains "Money": yes | Domain Type: edu | Has attachment: yes | Time Received: day | Spam: ?
[Figure: decision tree. Contains "Money": Yes → SPAM: YES; No → split on Time: Night → SPAM: YES, Day → SPAM: NO]
  86. Big Data Intelligence & Analytics MOA: Classification • Decision Trees

• Basic induction strategy: • A ⟵ the "best" decision attribute for the next node • Assign A as the decision attribute for the node • For each value of A, create a new descendant of the node • Sort training examples to leaf nodes • If training examples are perfectly classified, then STOP, else iterate over the new leaf nodes | 100
[Figure: the decision tree built by splitting first on Contains "Money" (Yes → SPAM: YES) and then on Time (Night → SPAM: YES, Day → SPAM: NO)]
  87. Big Data Intelligence & Analytics MOA: Classification • Hoeffding Tree:

VFDT • Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000 • With high probability, constructs a model identical to the one a traditional (greedy) batch method would learn • With theoretical guarantees on the error rate • Hoeffding Bound Inequality: bounds the probability that a random variable deviates from its expected value. | 101 Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000: https://homes.cs.washington.edu/~pedrod/papers/kdd00.pdf
  88. Big Data Intelligence & Analytics MOA: Classification • Hoeffding Bound

Inequality: • Let X = Σ_i X_i, where X_1, …, X_n are independent and identically distributed in [0, 1]. Then:
1. Chernoff: for each ε < 1, Pr[X > (1 + ε) E[X]] ≤ exp(−ε² E[X] / 3)
2. Hoeffding: for each t > 0, Pr[X > E[X] + t] ≤ exp(−2t² / n)
3. Bernstein: let σ² = Σ_i σ_i² be the variance of X. If X_i − E[X_i] ≤ b for each i ∈ [n], then for each t > 0, Pr[X > E[X] + t] ≤ exp(−t² / (2σ² + (2/3) b t)) | 102
  89. Big Data Intelligence & Analytics MOA: Classification • Hoeffding Tree:

VFDT HT(Stream, δ)
1. ⪧ Let HT be a tree with a single leaf (root)
2. ⪧ Init counts n_ijk at root
3. for each example (x, y) in Stream:
4. do HTGROW((x, y), HT, δ)
HTGROW((x, y), HT, δ)
1. ⪧ Sort (x, y) to leaf l using HT
2. ⪧ Update counts n_ijk at leaf l
3. if examples seen so far at l are not all of the same class
4. then ⪧ Compute G for each attribute
5. if G(best attribute) − G(second best) > sqrt(R² ln(1/δ) / (2n))
6. then ⪧ Split leaf on best attribute
7. for each branch
8. do ⪧ Start new leaf and initialize counts | 103
R is the range of the random variable; δ is the desired probability of the estimate not being within ε of its expected value
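The split test on line 5 is just the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)); a minimal sketch (the values of R, δ, and n below are illustrative assumptions, with R = log2(2) = 1 for information gain on a two-class problem):

```python
# Hoeffding-bound split test as used by VFDT: split only when the observed
# gain gap between the two best attributes exceeds epsilon.
import math

def hoeffding_epsilon(r, delta, n):
    """Deviation bound for a random variable with range r after n observations."""
    return math.sqrt(r * r * math.log(1.0 / delta) / (2.0 * n))

def should_split(g_best, g_second, r, delta, n):
    return (g_best - g_second) > hoeffding_epsilon(r, delta, n)

# The bound tightens as more examples are seen at the leaf.
eps_small = hoeffding_epsilon(1.0, 1e-7, 1000)     # few examples: wide bound
eps_large = hoeffding_epsilon(1.0, 1e-7, 100000)   # many examples: tight bound
print(eps_small, eps_large)
```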
  90. Big Data Intelligence & Analytics MOA: Classification • Hoeffding Tree

Features • With high probability, constructs a model identical to the one a traditional (greedy) method would learn • Ties: when two attributes have similar G, split if ε = sqrt(R² ln(1/δ) / (2n)) < τ • Compute G every n_min instances • Memory: deactivate least promising nodes, those with lower p_l × e_l • p_l is the probability to reach leaf l • e_l is the error in the node | 104
  91. Big Data Intelligence & Analytics MOA: Classification • Hoeffding Tree

    real-time visualization: progress of tree learning | 105 Matúš Cimerman, Hoeffding adaptive tree real-time visualization: https://www.youtube.com/watch?v=Fk3tJRkh9Ag Visualization built using MOA and D3.js
  92. Big Data Intelligence & Analytics MOA: Classification • Hoeffding Naive

Bayes Tree: • Majority Class learner at the leaves • G. Holmes, R. Kirkby, and B. Pfahringer. Stress-testing Hoeffding trees, 2005. • monitors accuracy of a Majority Class learner • monitors accuracy of a Naive Bayes learner • predicts using the most accurate method | 106 G. Holmes, R. Kirkby, and B. Pfahringer. Stress-testing Hoeffding trees, 2005: https://link.springer.com/chapter/10.1007/11564126_50
  93. Big Data Intelligence & Analytics MOA: Ensemble (Bagging) • Bagging

    builds a set of M base models, with a bootstrap sample created by drawing random samples with replacement. | 107 Bootstrap aggregating bagging, Udacity course "Machine Learning for Trading": https://www.youtube.com/watch?v=2Mg8QD0F1dQ
  94. Big Data Intelligence & Analytics MOA: Bagging • Example: •

    Dataset of 4 Instances : A, B, C, D Classifier 1: B, A, C, B Classifier 2: D, B, A, D Classifier 3: B, A, C, B Classifier 4: B, C, B, B Classifier 5: D, C, A, C | 108
  95. Big Data Intelligence & Analytics MOA: Bagging • Example: •

    Dataset of 4 Instances : A, B, C, D Classifier 1: A, B, B, C Classifier 2: A, B, D, D Classifier 3: A, B, B, C Classifier 4: B, B, B, C Classifier 5: A, C, C, D | 109 A, B, B, C A, B, B, C A, B, D, D B, B, B, C A, C, C, D
  96. Big Data Intelligence & Analytics MOA: Bagging • Example: •

Dataset of 4 Instances: A, B, C, D Classifier 1: A, B, B, C: A(1) B(2) C(1) D(0) Classifier 2: A, B, D, D: A(1) B(1) C(0) D(2) Classifier 3: A, B, B, C: A(1) B(2) C(1) D(0) Classifier 4: B, B, B, C: A(0) B(3) C(1) D(0) Classifier 5: A, C, C, D: A(1) B(0) C(2) D(1) • Each base model's training set contains each of the original training examples K times, where P(K = k) follows a binomial distribution. • Poisson(λ = 1) distribution • For large values of N, the binomial distribution tends to a Poisson(1) distribution | 110 Poisson Distribution: k is the number of occurrences, λ is the expected number of occurrences, P(X = k) is the probability of k occurrences given λ.
  97. Big Data Intelligence & Analytics MOA: Bagging • Oza and

Russell’s Online Bagging for M models
1. Initialize base models h_m for all m ∈ {1, 2, ..., M}
2. for all training examples do
3. for m = 1, 2, …, M do
4. Set w = Poisson(1)
5. Update h_m with the current example with weight w
anytime output:
return hypothesis: h_fin(x) = arg max_{y ∈ Y} Σ_{t=1}^{T} I(h_t(x) = y) | 111 Oza et al. Online Bagging and Boosting, 2001: https://ti.arc.nasa.gov/m/profile/oza/files/ozru01a.pdf
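The algorithm above can be sketched as follows (illustrative Python, not MOA's OzaBag; the trivial majority-class base learner, the Knuth-style Poisson sampler, and the toy stream are all assumptions made to keep the example self-contained):

```python
# Oza & Russell's online bagging: each of the M base models sees the current
# example k times, with k ~ Poisson(1), instead of bootstrap resampling.
import math
import random
from collections import Counter

class MajorityClass:
    """Trivial base learner: predicts the most frequent class seen so far."""
    def __init__(self):
        self.counts = Counter()
    def train(self, x, y, weight=1):
        self.counts[y] += weight
    def predict(self, x):
        return self.counts.most_common(1)[0][0]

def poisson1(rng):
    """Draw k ~ Poisson(lambda=1) by inversion (Knuth's method)."""
    threshold, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def online_bagging(stream, n_models=10, seed=42):
    rng = random.Random(seed)
    models = [MajorityClass() for _ in range(n_models)]
    for x, y in stream:
        for m in models:
            k = poisson1(rng)
            if k > 0:                 # the model sees this example k times
                m.train(x, y, weight=k)
    return models

def predict(models, x):
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]

stream = [(None, "a")] * 30 + [(None, "b")] * 10
models = online_bagging(stream)
print(predict(models, None))
```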
  98. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

    Four types of concept drift according to severity and speed of changes, and noisy blips: | 112 Krawczyk et al. Online ensemble learning with abstaining classifiers for drifting and noisy data streams, Applied Soft Computing, 2018: https://www.sciencedirect.com/science/article/pii/S1568494617307238
  99. Big Data Intelligence & Analytics MOA: Evolving Stream Classification Four

types of concept drift according to severity and speed of changes, and noisy blips: • Sudden change occurs when the distribution has remained unchanged for a long time, then changes in a few steps to a significantly different one. It is often called shift. • Gradual or incremental change occurs when, for a long time, the distribution experiences at each time step a tiny, barely noticeable change, but these accumulated changes become significant over time. • Recurrent concepts occur when distributions that have appeared in the past tend to reappear later. An example is seasonality, where summer distributions are similar among themselves and different from winter distributions. A different example is the distortions in city traffic and public transportation due to mass events or accidents, which happen at irregular, unpredictable times. • Change may be global or partial depending on whether it affects all of the item space or just a part of it. In ML terminology, partial change might affect only instances of certain forms, or only some of the instance attributes. | 113 Krawczyk et al. Online ensemble learning with abstaining classifiers for drifting and noisy data streams, Applied Soft Computing, 2018: https://www.sciencedirect.com/science/article/pii/S1568494617307238
  100. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

    Requirements on data stream algorithms that build models (e.g., predictors, clusterers, or pattern miners): 1. Detect change in the stream (and adapt the models, if needed) as soon as possible. 2. At the same time, be robust to noise and outliers. 3. Operate in less than instance arrival time and sublinear memory (ideally, some fixed, preset amount of memory). • Management strategies can be roughly grouped into three families: | 114 adaptive estimators explicit change detectors for model revision model ensembles Dealing with change: https://www.cms.waikato.ac.nz/~abifet/book/chapter_5.html
  101. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

    Optimal Change Detector and Predictor • High accuracy • Low false positives and false negatives ratios • Theoretical guarantees • Fast detection of change • Low computational cost: minimum space and time needed • No parameters needed | 115
  102. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Algorithm ADaptive Sliding WINdow The idea behind the ADWIN method is simple: whenever two large enough sub-windows of W exhibit distinct enough averages, one can conclude that the corresponding expected values are different, and the older portion of the window is dropped. The meaning of "large enough" and "distinct enough" can be made precise, again by using the Hoeffding bound. Example: W = 101010110111111, split into W0 = 1010101 and W1 = 10111111; if |μ̂_W0 − μ̂_W1| ≥ ε_cut: Change detected! | 116 Bifet et al., Learning from Time-Changing Data with Adaptive Windowing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2279&rep=rep1&type=pdf μ̂_W0 is the (observed) average of the elements in W0; μ̂_W1 is the (observed) average of the elements in W1
  103. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Algorithm ADaptive Sliding WINdow Example: W = 101010110111111
• ADWIN: ADAPTIVE WINDOWING ALGORITHM
1. Initialize window W
2. for each t > 0:
3. do W ⟵ W ∪ {x_t} ⪧ i.e.: add x_t to the head of W
4. repeat drop elements from the tail of W
5. until |μ̂_W0 − μ̂_W1| < ε_cut holds
6. for every split of W into W = W0 · W1
7. output μ̂_W | 117 Bifet et al., Learning from Time-Changing Data with Adaptive Windowing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2279&rep=rep1&type=pdf μ̂_W is the (observed) average of the elements in W; μ_W is the (unknown) average of μ_t for t ∈ W
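The window-shrinking loop can be sketched as follows (a naive O(|W|²) illustration, not the paper's bucket-based O(log W) implementation; the ε_cut formula below is a simplified Hoeffding-style threshold with a fixed δ, an assumption made for brevity):

```python
# Simplified ADWIN-style drift check: keep a window, and whenever some split
# into two sub-windows shows "distinct enough" averages, drop the oldest
# element and re-check, until no split triggers.
import math

def eps_cut(n0, n1, delta=0.01):
    """Hoeffding-style threshold for comparing two sub-window averages."""
    m = 1.0 / (1.0 / n0 + 1.0 / n1)        # harmonic mean of the sizes
    return math.sqrt((1.0 / (2.0 * m)) * math.log(4.0 / delta))

def adwin_step(window, x, delta=0.01):
    window.append(x)
    while True:
        dropped = False
        for split in range(1, len(window)):
            w0, w1 = window[:split], window[split:]
            mu0 = sum(w0) / len(w0)
            mu1 = sum(w1) / len(w1)
            if abs(mu0 - mu1) >= eps_cut(len(w0), len(w1), delta):
                del window[0]              # drop the oldest element (the tail)
                dropped = True
                break
        if not dropped:
            return window

window = []
for x in [0.0] * 100 + [1.0] * 100:        # abrupt change halfway through
    adwin_step(window, x)
print(len(window))                          # most of the old regime was dropped
```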
112. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

    Algorithm ADaptive Sliding WINdow Example: W = 101010110111111; for the split W0 = 1010101, W1 = 10111111 we get |μ̂_W0 − μ̂_W1| ≥ ε_cut: Change detected! | 126 Bifet et al., Learning from Time-Changing Data with Adaptive Windowing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2279&rep=rep1&type=pdf
113. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

    Algorithm ADaptive Sliding WINdow Example: after the change is detected, drop elements from the tail of W | 127 Bifet et al., Learning from Time-Changing Data with Adaptive Windowing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2279&rep=rep1&type=pdf
114. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

    Algorithm ADaptive Sliding WINdow Example: W = 01010110111111 (oldest element dropped); dropping continues until no split has |μ̂_W0 − μ̂_W1| ≥ ε_cut | 128 Bifet et al., Learning from Time-Changing Data with Adaptive Windowing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2279&rep=rep1&type=pdf
  115. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Algorithm ADaptive Sliding WINdow Theorem: at every time step we have: 1. (False positive rate bound). If μ_t remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ. 2. (False negative rate bound). Suppose that for some partition of W into two parts W0 W1 (where W1 contains the most recent items) we have |μ_W0 − μ_W1| > 2ε_cut. Then with probability 1 − δ, ADWIN shrinks W to W1, or shorter. | 129 Bifet et al., Learning from Time-Changing Data with Adaptive Windowing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2279&rep=rep1&type=pdf
  116. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Algorithm ADaptive Sliding WINdow • ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters. • ADWIN, using a data-stream sliding-window model, • can provide the exact counts of 1's in O(1) time per point • tries O(log W) cutpoints • uses O(log W) memory words • the processing time per example is O(log W) (amortized and worst-case) • Sliding window model (exponential histogram): buckets 1010101 | 101 | 11 | 1 | 1, Content (number of 1's): 4 2 2 1 1, Capacity: 7 3 2 1 1 | 130 Bifet et al., Learning from Time-Changing Data with Adaptive Windowing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2279&rep=rep1&type=pdf
  117. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Concept-adapting Very Fast Decision Trees: CVFDT G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001 • It keeps its model consistent with a sliding window of examples • Constructs "alternative branches" as preparation for changes • If the alternative branch becomes more accurate, a switch of tree branches occurs | 131 G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001: https://homes.cs.washington.edu/~pedrod/papers/kdd01b.pdf
[Figure: example decision tree splitting on Contains "Money" and Time]
  118. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Concept-adapting Very Fast Decision Trees: CVFDT No theoretical guarantees on the error rate of CVFDT. CVFDT parameters: 1. W: the example window size. 2. T0: number of examples used to check at each node if the splitting attribute is still the best. 3. T1: number of examples used to build the alternate tree. 4. T2: number of examples used to test the accuracy of the alternate tree. | 132 G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001: https://homes.cs.washington.edu/~pedrod/papers/kdd01b.pdf
  119. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Decision Trees: Hoeffding Adaptive Tree • replace frequency-statistics counters by estimators • no need for a window to store examples, since the statistics needed are maintained by the estimators • change the way the substitution of alternate subtrees is checked, using a change detector with theoretical guarantees • Advantages over CVFDT: 1. Theoretical guarantees 2. No parameters | 133
  120. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

    ADWIN Bagging (KDD’09) • ADWIN • An adaptive sliding window whose size is recomputed online according to the rate of change observed. • ADWIN has rigorous guarantees (theorems) • On ratio of false positives and negatives • On the relation of the size of the current window and change rates • ADWIN Bagging • When a change is detected, the worst classifier is removed and a new classifier is added. | 134
  121. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

    Leveraging Bagging for Evolving Data Streams (ECML-PKDD’10) • Randomization as a powerful tool to increase accuracy and diversity • There are three ways of using randomization: • Manipulating the input data • Manipulating the classifier algorithms • Manipulating the output targets | 135
  122. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Leveraging Bagging for Evolving Data Streams (ECML-PKDD’10) • Leveraging Bagging • Using Poisson(λ) • Leveraging Bagging MC • Using Poisson(λ) and Random Output Codes • Leveraging Bagging ME • if an instance is misclassified: weight = 1 • if not: weight = e_T / (1 − e_T), where e_T is the error estimate | 136
  123. Big Data Intelligence & Analytics MOA: Evolving Stream Classification •

Empirical evaluation | 137
Classifier | Accuracy | RAM-Hours
Hoeffding Tree | 74.03% | 0.01
Online Bagging | 77.15% | 2.98
ADWIN Bagging | 79.24% | 1.48
Leveraging Bagging | 85.54% | 20.17
Leveraging Bagging MC | 85.37% | 22.04
Leveraging Bagging ME | 80.77% | 0.87
  124. Big Data Intelligence & Analytics Tutorial 1. Introduction to MOA

    https://moa.cms.waikato.ac.nz/tutorial-1-introduction-to-moa/ Exercise 1 Compare the accuracy of the Hoeffding Tree with the Naive Bayes classifier, for a RandomTreeGenerator stream of 1,000,000 instances using Interleaved Test-Then-Train evaluation. Use for all exercises a sample frequency of 10,000. 1. Classification tab 2. Configure Hoeffding Tree Task: a) Task: Interleaved Test-Then-Train evaluation b) Learner: Hoeffding Tree c) Streamer: RandomTreeGenerator d) Instance Limit: 1 000 000 e) Sample Frequency: 10 000 | 139
  125. Big Data Intelligence & Analytics Tutorial 1. Introduction to MOA

https://moa.cms.waikato.ac.nz/tutorial-1-introduction-to-moa/ Exercise 1 Compare the accuracy of the Hoeffding Tree with the Naive Bayes classifier, for a RandomTreeGenerator stream of 1,000,000 instances using Interleaved Test-Then-Train evaluation. Use for all exercises a sample frequency of 10,000. 3. Configure Naive Bayes Task: a) Task: Interleaved Test-Then-Train evaluation b) Learner: Naive Bayes c) Streamer: RandomTreeGenerator d) Instance Limit: 1 000 000 e) Sample Frequency: 10 000 | 140
  126. Big Data Intelligence & Analytics Tutorial 1. Introduction to MOA

    https://moa.cms.waikato.ac.nz/tutorial-1-introduction-to-moa/ Exercise 2 Compare and discuss the accuracy for the same stream of the previous exercise using three different evaluations with a Hoeffding Tree: • Periodic Held Out with 1,000 instances for testing • Interleaved Test Then Train • Prequential with a sliding window of 1,000 instances. | 142
  127. Big Data Intelligence & Analytics Tutorial 1. Introduction to MOA

https://moa.cms.waikato.ac.nz/tutorial-1-introduction-to-moa/ Exercise 3 Compare the accuracy of the Hoeffding Tree with the Naive Bayes classifier, for a RandomRBFGeneratorDrift stream of 1,000,000 instances with speed of change 0.001, using Interleaved Test-Then-Train evaluation. | 143
  128. Big Data Intelligence & Analytics Tutorial 1. Introduction to MOA

    https://moa.cms.waikato.ac.nz/tutorial-1-introduction-to-moa/ Exercise 4 Compare the accuracy for the same stream of the previous exercise using three different classifiers: • Hoeffding Tree with Majority Class at the leaves • Hoeffding Adaptive Tree • OzaBagAdwin with 10 HoeffdingTree | 144
  129. Big Data Intelligence & Analytics MOA: Clustering • Definition: Clustering

is the partition of a set of instances into previously unknown groups according to some common relations or affinities. Examples: • Market segmentation of customers • Social network communities | 146
  130. Big Data Intelligence & Analytics MOA: Clustering • Definition: Given

• a set of instances I • a number of clusters k • an objective function cost(I), a clustering algorithm computes an assignment of a cluster to each instance: f : I → {1, …, k} that minimizes the objective function cost(I) | 147
  131. Big Data Intelligence & Analytics MOA: Clustering • Definition: Given

• a set of instances I • a number of clusters k • an objective function cost(C, I), a clustering algorithm computes a set C of centers with |C| = k that minimizes the objective function: cost(C, I) = Σ_{x ∈ I} d²(x, C) where • d(x, c) is the distance between x and c • d(x, C) = min_{c ∈ C} d(x, c) is the distance from x to the nearest point in C | 148
  132. Big Data Intelligence & Analytics MOA: Clustering • K-means: 1.

1. choose k initial centers C = {c_1, …, c_k}
2. while stopping criterion has not been met
3. for i = 1, ..., N
4. find the closest center c_j ∈ C to each instance x_i
5. assign instance x_i to cluster C_j
6. for j = 1, …, k
7. set c_j to be the center of mass of all points in C_j | 149
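The loop above (Lloyd's iteration) can be sketched in a few lines (illustrative Python; the 2-D toy points and fixed iteration count are assumptions made for the demo):

```python
# Lloyd's k-means: assign each point to its closest center, then move each
# center to the mean of its assigned points, and repeat.
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                  + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        # Recompute each center as the mean of its cluster (keep it if empty).
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(points, [(0, 0), (10, 10)])
print(centers)
```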
  133. Big Data Intelligence & Analytics MOA: Clustering • K-means: •

    ” % 3 | 150 Source: https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn/
  134. Big Data Intelligence & Analytics MOA: Clustering • K-means++: 1.

1. choose an initial center c_1 uniformly at random
2. for j = 2, …, k
3. select c_j = x′ ∈ X with probability d²(x′, C) / cost(C, X)
4. while stopping criterion has not been met
5. for i = 1, ..., N
6. find the closest center c_j ∈ C to each instance x_i
7. assign instance x_i to cluster C_j
8. for j = 1, …, k
9. set c_j to be the center of mass of all points in C_j
The intuition behind this approach is that spreading out the k initial cluster centers is a good thing: the first cluster center is chosen uniformly at random from the data points being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to its squared distance to the point's closest existing cluster center. | 151
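The seeding step (lines 1 to 3) can be sketched as follows (illustrative Python; the toy points and seed are assumptions, and with an isolated far-away point the squared-distance weighting makes it an almost certain pick as the second center):

```python
# k-means++ seeding: each next center is drawn with probability proportional
# to the squared distance to the nearest center chosen so far.
import random

def d2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_seed(points, k, rng):
    centers = [rng.choice(points)]               # first center: uniform
    while len(centers) < k:
        weights = [min(d2(p, c) for c in centers) for p in points]
        r = rng.random() * sum(weights)
        acc = 0.0
        for p, w in zip(points, weights):        # roulette-wheel selection
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (100.0, 100.0)]
centers = kmeanspp_seed(points, 2, random.Random(0))
print(centers)
```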
  135. Big Data Intelligence & Analytics MOA: Clustering • Performance Measures

• Internal Measures • Sum of squared distances • Dunn index D = d_min / d_max • C-Index C = (S − S_min) / (S_max − S_min) • External Measures • Rand Measure • F Measure • Jaccard • Purity | 152
  136. Big Data Intelligence & Analytics MOA: Clustering BIRCH • Balanced

Iterative Reducing And Clustering Using Hierarchies • Clustering Feature tuples CF = (N, LS, SS) • N: number of data points • LS: linear sum of the data points • SS: square sum of the data points • Properties: • Additivity: CF_1 + CF_2 = (N_1 + N_2, LS_1 + LS_2, SS_1 + SS_2) • Easy to compute: average inter-cluster distance and average intra-cluster distance • Uses a CF tree • Height-balanced tree with two parameters • B: branching factor • T: radius leaf threshold | 153
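The CF tuple and its additivity property can be sketched directly (illustrative Python; the toy 2-D points are an assumption for the demo):

```python
# BIRCH clustering feature CF = (N, LS, SS): merging two clusters is just
# component-wise addition, and the centroid is recoverable as LS / N.
def make_cf(points):
    n = len(points)
    ls = [sum(p[i] for p in points) for i in range(len(points[0]))]
    ss = sum(x * x for p in points for x in p)
    return (n, ls, ss)

def merge_cf(cf1, cf2):
    n1, ls1, ss1 = cf1
    n2, ls2, ss2 = cf2
    return (n1 + n2, [a + b for a, b in zip(ls1, ls2)], ss1 + ss2)

def centroid(cf):
    n, ls, _ = cf
    return [x / n for x in ls]

a = [(1.0, 2.0), (3.0, 4.0)]
b = [(5.0, 6.0)]
merged = merge_cf(make_cf(a), make_cf(b))
print(merged == make_cf(a + b))   # additivity holds
```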
  137. Big Data Intelligence & Analytics MOA: Clustering BIRCH • Balanced

    Iterative Reducing And Clustering Using Hierarchies • Phase 1: Scan all data and build an initial in-memory CF tree • Phase 2: Condense into desirable range by building a smaller CF tree (optional) • Phase 3: Global clustering • Phase 4: Cluster refining (optional and off line, as requires more passes) | 154
  138. Big Data Intelligence & Analytics MOA: Clustering Clu-Stream • Uses

micro-clusters to store statistics on-line • Clustering Features CF = (N, LS, SS, LST, SST) • N: number of data points • LS: linear sum of the data points • SS: square sum of the data points • LST: linear sum of the time stamps • SST: square sum of the time stamps • Uses a pyramidal time frame | 155
  139. Big Data Intelligence & Analytics MOA: Clustering Clu-Stream • On-line

Phase • For each new point that arrives: • either the point is absorbed by an existing micro-cluster, • or the point starts a new micro-cluster of its own; to free space, • delete the oldest micro-cluster, or • merge two of the oldest micro-clusters • Off-line Phase • Apply k-means using the micro-clusters as points | 156
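The online absorb-or-create decision can be sketched as follows (an illustrative simplification, not MOA's CluStream implementation: the fixed radius threshold and micro-cluster budget are assumptions, and only the delete-oldest eviction is shown, not merging):

```python
# CluStream-style online phase: absorb the point into the nearest micro-cluster
# if it fits within a radius threshold, otherwise start a new micro-cluster
# (evicting the oldest one when a fixed budget is exceeded).
class MicroCluster:
    def __init__(self, p, t):
        self.n, self.ls, self.t_last = 1, list(p), t
    def centroid(self):
        return [x / self.n for x in self.ls]
    def absorb(self, p, t):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.t_last = t

def d2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def process(stream, radius2=1.0, budget=10):
    mcs = []
    for t, p in enumerate(stream):
        if mcs:
            mc = min(mcs, key=lambda m: d2(p, m.centroid()))
            if d2(p, mc.centroid()) <= radius2:
                mc.absorb(p, t)
                continue
        if len(mcs) >= budget:                 # evict the oldest micro-cluster
            mcs.remove(min(mcs, key=lambda m: m.t_last))
        mcs.append(MicroCluster(p, t))
    return mcs

stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.2)]
mcs = process(stream)
print(len(mcs))   # two micro-clusters: one near the origin, one near (5, 5)
```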
  140. Big Data Intelligence & Analytics MOA: Clustering StreamKM++: Coresets Coreset

of a set P with respect to some problem: a small subset that approximates the original set P. • Solving the problem for the coreset provides an approximate solution for the problem on P. • (k, ε)-coreset: a (k, ε)-coreset S of P is a subset of P such that for each C of size k: (1 − ε) · cost(P, C) ≤ cost_w(S, C) ≤ (1 + ε) · cost(P, C) | 157 Ackermann, Lammersen et al. StreamKM++: A Clustering Algorithm for Data Streams. (ALENEX 2010): https://dl.acm.org/citation.cfm?id=2184450
  141. Big Data Intelligence & Analytics MOA: Clustering Coreset Tree •

    Choose a leaf node ℓ at random • Choose a new sample point qℓ from Pℓ according to d² • Based on Pℓ and qℓ, split Pℓ into two subclusters and create two child nodes StreamKM++ • Maintains L = ⌈log2(n/m)⌉ + 2 buckets B0, B1, …, BL | 158 Ackermann, Lammersen et al. StreamKM++: A Clustering Algorithm for Data Streams. (ALENEX 2010): https://dl.acm.org/citation.cfm?id=2184450
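The bucket scheme can be sketched with a merge-and-reduce loop. Assumption to keep this self-contained: the "reduce" step below just merges sorted neighbouring points into weighted means, a deliberately naive stand-in for the actual coreset-tree construction, used only to show the bucket mechanics:

```python
M = 4  # bucket / coreset size m

def reduce_to_m(points):
    """Reduce 2m weighted points to m by merging sorted neighbours into
    their weighted means (stand-in for the real coreset tree)."""
    points = sorted(points)
    out = []
    for i in range(0, len(points), 2):
        (v1, w1), (v2, w2) = points[i], points[i + 1]
        out.append(((v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2))
    return out

def insert(buckets, x):
    """B0 collects raw points; a full bucket cascades upwards, merging
    with the bucket at the next level whenever that one is full too."""
    buckets[0].append((x, 1))
    if len(buckets[0]) < M:
        return
    carry, buckets[0] = buckets[0], []
    level = 1
    while True:
        if len(buckets) <= level:
            buckets.append([])
        if not buckets[level]:
            buckets[level] = carry
            return
        carry = reduce_to_m(buckets[level] + carry)  # 2m -> m points
        buckets[level] = []
        level += 1
```

After n insertions only O(log(n/m)) buckets of m weighted points each are kept, which is where the ⌈log2(n/m)⌉ + 2 bound comes from.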
  142. Big Data Intelligence & Analytics Tutorial 3: Introduction to MOA

    Clustering https://moa.cms.waikato.ac.nz/tutorial-3-introduction-to-moa-clustering/ Exercise 1 Try to figure out how the different aspects of the cluster algorithm are visualised: • Briefly note down how micro-clusters are visualised • Briefly note down how actual clusters are visualised • Briefly note down how the ground truth is visualised 1. Clustering tab 2. Algorithm 1: select Clustream 3. Algorithm 2: select StreamKM 4. Evaluation Measures: SSQ | 160
  143. Big Data Intelligence & Analytics Tutorial 3: Introduction to MOA

    Clustering https://moa.cms.waikato.ac.nz/tutorial-3-introduction-to-moa-clustering/ • Exercise 2 Double the speed to 1000 and set speedRange to 10. Also increase the noise level to 0.333, which means that about every third data item is randomly generated. • Now try to find settings that keep the SSQ metric low. Note down your observations, i.e. what changes do you observe when you increase the number of micro-clusters, or when you change the radius. | 161
  144. Big Data Intelligence & Analytics Tutorial 3: Introduction to MOA

    Clustering https://moa.cms.waikato.ac.nz/tutorial-3-introduction-to-moa-clustering/ • Exercise 3 Now use the same settings for the data stream as you used in Task 2, and set CluStream as Algorithm 1 with the same settings that have achieved the best result in Task 2. But this time you use for “Algorithm 2″ Clustree. This is a hierarchical clustering algorithm and allows adjusting two parameters, the horizon and the height of the hierarchy. Try to optimise the settings for Clustree. | 162
  145. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Suppose D is a dataset of patterns, t ∈ D, and min_sup is a constant. Definition: Support(p): number of patterns in D that are super-patterns of p. Definition: Pattern p is frequent if Support(p) ≥ min_sup. Frequent sub-pattern problem: Given D and min_sup, find all frequent sub-patterns of patterns in D. | 164
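These definitions can be stated directly in code, using the example dataset of the next slides (patterns modelled as Python sets; function names are illustrative):

```python
# Example dataset d1..d6 from the following slides.
D = [set("abce"), set("cde"), set("abce"),
     set("acde"), set("abcde"), set("bcd")]
MIN_SUP = 3

def support(pattern):
    # Number of patterns in D that are super-patterns of `pattern`.
    p = set(pattern)
    return sum(1 for t in D if p <= t)

def is_frequent(pattern):
    return support(pattern) >= MIN_SUP
```

For instance, support("ce") counts d1–d5 (all contain both c and e) but not d6, so ce is frequent for min_sup = 3.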
  146. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Dataset example (min_sup = 3): | 165

    Doc.  Patterns
    d1    abce
    d2    cde
    d3    abce
    d4    acde
    d5    abcde
    d6    bcd

    Support                Frequent
    d1,d2,d3,d4,d5,d6      c
    d1,d2,d3,d4,d5         e, ce
    d1,d3,d4,d5            a, ac, ae, ace
    d1,d3,d5,d6            b, bc
    d2,d4,d5,d6            d, cd
    d1,d3,d5               ab, abc, abe, be, bce, abce
    d2,d4,d5               de, cde
  147. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Dataset example (min_sup = 3): | 166

    Doc.  Patterns
    d1    abce
    d2    cde
    d3    abce
    d4    acde
    d5    abcde
    d6    bcd

    Support  Frequent
    6        c
    5        e, ce
    4        a, ac, ae, ace
    4        b, bc
    4        d, cd
    3        ab, abc, abe, be, bce, abce
    3        de, cde
  148. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Closed and Maximal Patterns • The monotonicity property of support suggests a compressed representation of the set of frequent itemsets: • Maximal frequent itemsets: an itemset is maximal if it is frequent, but none of its proper supersets is frequent. • Closed frequent itemsets: a frequent itemset is called closed if it has no frequent superset with the same frequency. • The following relationship holds between these sets: Maximal ⊆ Closed ⊆ Frequent The maximal itemsets are a subset of the closed itemsets. From the maximal itemsets it is possible to derive all frequent itemsets (but not their support), since every frequent itemset is a subset of some maximal itemset. | 167
  149. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Dataset example (min_sup = 3): | 168

    Doc.  Patterns
    d1    abce
    d2    cde
    d3    abce
    d4    acde
    d5    abcde
    d6    bcd

    Support                Frequent                         Gen  Closed
    d1,d2,d3,d4,d5,d6      c                                c    c
    d1,d2,d3,d4,d5         e, ce                            e    ce
    d1,d3,d4,d5            a, ac, ae, ace                   a    ace
    d1,d3,d5,d6            b, bc                            b    bc
    d2,d4,d5,d6            d, cd                            d    cd
    d1,d3,d5               ab, abc, abe, be, bce, abce      ab   abce
    d2,d4,d5               de, cde                          de   cde
  150. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Dataset example (min_sup = 3): | 169

    Doc.  Patterns
    d1    abce
    d2    cde
    d3    abce
    d4    acde
    d5    abcde
    d6    bcd

    Support                Frequent                         Gen  Closed  Max
    d1,d2,d3,d4,d5,d6      c                                c    c
    d1,d2,d3,d4,d5         e, ce                            e    ce
    d1,d3,d4,d5            a, ac, ae, ace                   a    ace
    d1,d3,d5,d6            b, bc                            b    bc
    d2,d4,d5,d6            d, cd                            d    cd
    d1,d3,d5               ab, abc, abe, be, bce, abce      ab   abce    abce
    d2,d4,d5               de, cde                          de   cde     cde
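The Closed and Max columns can be reproduced by brute force over this small example (fine for 5 items; real stream miners avoid enumerating the whole powerset):

```python
from itertools import combinations

# Example dataset d1..d6 and min_sup = 3, as in the table above.
D = [set("abce"), set("cde"), set("abce"),
     set("acde"), set("abcde"), set("bcd")]
MIN_SUP = 3
ITEMS = "abcde"

def support(p):
    return sum(1 for t in D if set(p) <= t)

# All frequent itemsets: brute force over the 2^5 candidate itemsets.
frequent = {frozenset(c)
            for r in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, r)
            if support(c) >= MIN_SUP}

# Closed: no frequent proper superset with the same support.
closed = {p for p in frequent
          if not any(p < q and support(q) == support(p) for q in frequent)}

# Maximal: no frequent proper superset at all.
maximal = {p for p in frequent if not any(p < q for q in frequent)}
```

Running this recovers the table: 19 frequent itemsets compress into 7 closed ones, of which only abce and cde are maximal.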
  151. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Closed Patterns: Usually, there are too many frequent patterns. We can compute a smaller set, while keeping the same information. • Example: a set of 1000 items has 2^1000 ≈ 10^301 subsets, which is more than the number of atoms in the universe (≈ 10^80) | 170
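The blow-up on this slide is easy to check directly with arbitrary-precision integers:

```python
# A set of 1000 items has 2**1000 subsets; counting its decimal digits
# confirms the slide's order-of-magnitude claim of about 10**301.
n_subsets = 2 ** 1000
digits = len(str(n_subsets))   # 302 digits, i.e. roughly 1.07 * 10**301
```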
  152. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Closed Patterns: • A priori property: if p′ is a sub-pattern of p, then Support(p) ≤ Support(p′) • Definition: a frequent pattern p is closed if none of its proper super-patterns has the same support as it has. Frequent sub-patterns and their supports can be generated from closed patterns. | 171
  153. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining •

    Maximal Patterns: • Definition: a frequent pattern p is maximal if none of its proper super-patterns is frequent. Frequent sub-patterns can be generated from maximal patterns, but not with their support. All maximal patterns are closed, but not all closed patterns are maximal. | 172
  154. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining Non

    streaming frequent itemset miners: • Representation: • Horizontal layout:
    T1: a, b, c
    T2: b, c, e
    T3: b, d, e
    • Vertical layout:
    a: 1 0 0
    b: 1 1 1
    c: 1 1 0
    d: 0 0 1
    e: 0 1 1
    • Search: • Breadth-first (levelwise): Apriori • Depth-first: Eclat, FP-Growth | 173
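The two layouts are straightforward to convert between; a short sketch of turning the horizontal transactions from this slide into the vertical bit-list form (which lets depth-first miners like Eclat intersect item columns instead of re-scanning transactions):

```python
horizontal = [["a", "b", "c"],   # T1
              ["b", "c", "e"],   # T2
              ["b", "d", "e"]]   # T3

def to_vertical(transactions):
    # One bit-list per item: bit i is 1 iff the item occurs in transaction i.
    items = sorted({i for t in transactions for i in t})
    return {i: [1 if i in t else 0 for t in transactions] for i in items}

vertical = to_vertical(horizontal)
```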
  155. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining Mining

    Patterns over Data Streams: • Requirements: be fast, use a small amount of memory, and be adaptive • Type: • Exact • Approximate • Per batch, per transaction • Incremental, Sliding Window, Adaptive • Frequent, Closed, Maximal patterns | 174
  156. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining Moment:

    • Computes closed frequent itemsets in a sliding window • Uses a Closed Enumeration Tree • Uses 4 types of nodes: • Closed Nodes • Intermediate Nodes • Unpromising Gateway Nodes • Infrequent Gateway Nodes • Adding transactions: closed itemsets remain closed • Removing transactions: infrequent itemsets remain infrequent | 175
  157. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining FP-Stream:

    • Mining Frequent Itemsets at Multiple Time Granularities • Based on FP-Growth • Maintains: • a pattern tree • a tilted-time window • Allows answering time-sensitive queries • Keeps finer-grained counts for recent data • Drawback: time and memory complexity | 176
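The tilted-time idea can be sketched with a logarithmic window: per-batch counts stay at full resolution while recent, and are merged pairwise as they age, so memory grows only logarithmically with stream length. This is a simplified stand-in for FP-Stream's actual tilted-time frames (the cap of two entries per level is an assumption for brevity):

```python
def tilt_insert(frames, count):
    """frames: dict level -> list of counts (newest first); the count at
    level i covers 2**i batches.  At most two entries are kept per level;
    overflow merges the two oldest counts into one at the next level."""
    frames.setdefault(0, []).insert(0, count)
    level = 0
    while len(frames.get(level, [])) > 2:
        b = frames[level].pop()          # oldest count at this level
        a = frames[level].pop()          # second oldest
        frames.setdefault(level + 1, []).insert(0, a + b)
        level += 1
    return frames
```

Recent batches keep exact counts while older history is progressively coarsened, which is what lets FP-Stream answer time-sensitive queries with more detail for recent data.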
  158. Big Data Intelligence & Analytics MOA: Frequent Pattern Mining Tree

    and Graph Mining: Dealing with time changes: • Keep a window on recent stream elements • Actually, just its lattice of closed sets! • Keep track of the number of closed patterns in the lattice, N • Use some change detector on N • When change is detected: • Drop the stale part of the window • Update the lattice to reflect this deletion, using the deletion rule • Alternatively, use a sliding window of some fixed size | 177