
Master's Thesis Defense Slides

I defended my MSc thesis on April 20, 2011.

Gregory Ditzler

July 12, 2013

Transcript

  1. Incremental Learning of Concept
    Drift from Imbalanced Data
    Master’s Thesis Defense
    Gregory Ditzler
    Thesis Committee
    Robi Polikar, Ph.D.
    Shreekanth Mandayam, Ph.D.
    Nancy Tinkham, Ph.D.
    Dept. of Electrical & Computer Engineering


  2. Contents
    • Introduction
    • Approach
    • Experiments
    • Conclusions

  3. Contents
    • Introduction
    • Approach
    • Experiments
    • Conclusions

  4. Issues Addressed in this Work
    • Incremental Learning: learning data over time without access to old
    datasets
    o OCR classifier trained on the English language applied to learning different languages
    • Identify characters not in the English language: ç, â, ê, î, ô, û, ë, ï, ü, ÿ, æ
    • Concept Drift: the underlying data distribution changes over time
    o Consumer ad relevance, spam detection, weather prediction
    • Class Imbalance: one or more classes are under-represented in the
    training data
    o Credit card fraud detection, cancer detection, financial data
    • Incremental Learning + Concept Drift + Class Imbalance
    o Many concept drift scenarios contain class imbalance
    • weather prediction, credit card fraud detection …

  5. Definitions
    Concept Drift: the joint probability distribution P(x, ω) changes over
    time, i.e., P_t(x, ω) ≠ P_{t+1}(x, ω).
    • Drift can be caused by changes in P(ω), P(x|ω), or P(ω|x)
    • Real vs. Virtual (perceived) drift
    • Drift severity
    o Slow, fast, abrupt, random, …
    o We would like an algorithm robust to any change regardless of the severity
    Bayes Theorem
    P(ω|x) = P(x|ω) P(ω) / P(x)
    (posterior = likelihood × prior / evidence)


  6. Definitions
    • Types of concept drift (S_1 and S_2 are sources that generate data)
    o Sudden Drift (Concept Change): Occurs at a point in time when the source changes from S_1
    to S_2.
    o Gradual Drift: Data are sampled from multiple sources within a single time stamp.
    Generally, as time passes the probability of sampling from S_1 decreases as the probability
    of sampling from S_2 increases.
    o Incremental Drift: Data are sampled from a single source at each time stamp, and the
    sources can be slightly different between multiple time stamps. Drift can be observed
    globally.
    o Reoccurring Drift: Reoccurring concepts appear when several different sources are used
    to generate data over time (similar to incremental and gradual drift)
    • Concept drift is the combination of a few different research areas
    (Diagram: concept drift shown at the intersection of learning from time-series
    data (time-dependent), knowledge transfer / transfer learning, and model adaptation.)

  7. Definitions
    Class Imbalance: one (or more) classes are severely under-represented
    in the training data
    • Minority class is typically of more importance
    Incremental Learning: learn new knowledge and preserve old knowledge
    without access to old data
    • Desired algorithm should find a balance between prior knowledge
    (stability) and new knowledge (plasticity) [2]
    • Ensembles have been shown to provide a good balance between
    stability and plasticity
    (Scatter plot: benign vs. malignant instances on feature 1 vs. feature 2,
    illustrating a severely under-represented malignant class.)

  8. Challenges in Machine Learning
    • Traditional Machine Learning Algorithms
    o Assume data are drawn from a fixed yet unknown distribution and a balanced dataset is
    available in its entirety
    • Concept Drift
    o Old knowledge can become irrelevant at a future point in time
    o Learners must dynamically adapt to the environment to remain strong predictors on new
    and future environments
    • Class Imbalance
    o Learners tend to bias themselves towards the majority class
    • Minority class is typically of great importance
    o Many concept drift algorithms tend to use error or a figure of merit derived from error to
    adapt to a nonstationary environment
    • Incremental Learning
    o If old data become irrelevant, how will the ensemble adapt to new data (environments)?
    o Existing approaches do not adhere to the incremental learning assumption
    • Combined Problem
    o Individual components have been addressed, but the combination of incremental
    learning, concept drift and class imbalance has been sparsely researched

  9. Contents
    • Introduction
    • Approach
    • Experiments
    • Conclusions

  10. Prior Work
    • Learn++.NSE: incremental learning algorithm for concept drift [3,4]
    o Generate a classifier with each new batch of data, compute a pseudo error for each
    classifier, apply time-adjusted weighting mechanism, and call a weighted majority vote
    for an ensemble decision
    • Recent pseudo errors are weighted heavier than old errors
    o Works very well on a broad range of concept drift problems
    o Shortcomings of Learn++.NSE
    • No mechanism to learn a minority class
    • Uncorrelated Bagging (UCB): bagging inspired approach for
    learning concept drift from unbalanced data [5]
    o Accumulate old minority data and train classifiers using all the old minority data with a
    subset of the newest majority class data. Call a majority vote for the ensemble decision.
    o Shortcomings of UCB
    • What happens when the accumulated minority data begin to form a
    “majority” class?
    • Explicit assumption that the minority class does not drift
    • Violates the one-pass learning requirement of incremental learning [9]

  11. Prior Work
    • Selectively Recursive Approaches: select old minority data that are
    most “similar” to the population of minority data [6-8]
    o Like UCB, selectively recursive approaches accumulate old minority class data.
    Accumulated minority instances are placed into a training set by selecting the instances
    that are most similar to the newest minority data. Classifiers are trained and
    combined using a combination rule set forth by the specific approach.
    • The Mahalanobis distance is used to quantify similarity (see the sketch following this list)
    o Shortcomings of SERA
    • What happens if the mean of the minority data does not change over time?
    • Mahalanobis distance works well for a Gaussian distribution, but what about
    non-Gaussian data?
    • Violates the one-pass learning requirement of incremental learning [9]
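A minimal sketch of the Mahalanobis-based selection step described above, assuming NumPy; the function name, the regularization constant, and the array layout are illustrative and are not taken from the SERA reference implementation:

```python
import numpy as np

def select_similar_minority(old_minority, new_minority, n_select):
    """Keep the accumulated minority instances that are closest, in
    Mahalanobis distance, to the newest minority-class batch (rows are
    instances, columns are features). Illustrative sketch only."""
    mu = new_minority.mean(axis=0)
    # Regularize the covariance so its inverse exists for small batches.
    cov = np.cov(new_minority, rowvar=False)
    cov += 1e-6 * np.eye(new_minority.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = old_minority - mu
    # Squared Mahalanobis distance of every accumulated instance.
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    keep = np.argsort(d2)[:n_select]   # most similar instances first
    return old_minority[keep]
```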

  12. Learn++ Solution
    • Batch-based incremental learning approaches for learning new
    knowledge and preserving old knowledge
    o Retaining classifiers increases stability without requiring old data to be
    accumulated
    • Learn++.CDS[10] {Concept Drift with SMOTE}
    o Apply SMOTE to Learn++.NSE
    • Learn++.NSE works well on problems involving concept drift
    • SMOTE works well at increasing the recall of a minority class
    • Learn++.NIE[11] {Nonstationary and Imbalanced Environments}
    o Classifiers are replaced with sub-ensembles
    • Sub-ensemble is applied to learn a minority class
    o Voting weights are assigned based on figures of merit besides a class independent error
    • All Learn++ based approaches use weighted majority voting

  13. Learn++.CDS
    (Block diagram of Learn++.CDS: evaluate the previous ensemble on the new batch
    D(t) to form a penalty distribution, call SMOTE, call BaseClassifier, compute a
    pseudo error for each classifier k = 1, 2, …, t, determine time-adjusted weights,
    and combine the classifiers with a weighted majority vote (WMV).)

  14. Learn++.CDS
    • Evaluate the ensemble when new labeled
    data are presented
    o Determine instances that have not been learned
    from past experience
    o Maintain a penalty distribution over the new data
    • Call SMOTE with the minority data in D(t)
    o SMOTE percentage and number of nearest
    neighbors
    o SMOTE reduces imbalance and can provide more
    robust predictors on the minority class
    • SMOTE can also increase other figures
    of merit like F-measure or AUC
    • Train a new classifier using D(t) and
    the synthetic data generated with
    SMOTE
    Input: Training data D(t) = { (x_i, y_i) : x_i ∈ X, y_i ∈ Ω } for i = 1, 2, …, m(t)
    Supervised learning algorithm BaseClassifier
    Sigmoid parameters: a & b
    for t = 1, 2, … do
    1. Compute error of the existing ensemble
       E(t) = (1/m(t)) Σ_{i=1..m(t)} ⟦H(t−1)(x_i) ≠ y_i⟧   (1)
    2. Update and normalize instance weights
       w_i(t) = (1/m(t)) · E(t) if H(t−1)(x_i) = y_i, else 1/m(t)   (2)
       D(t)(i) = w_i(t) / Σ_{j=1..m(t)} w_j(t)   (3)
    3. Call SMOTE on the D(t) minority instances to obtain synthetic data S(t)
    4. Call BaseClassifier with D(t) and S(t) to obtain h_t : X → Ω
    5. Evaluate the existing classifiers on D(t) and obtain the pseudo error
       ε_k(t) = Σ_{i=1..m(t)} D(t)(i) · ⟦h_k(x_i) ≠ y_i⟧   (4)
       if ε_{k=t}(t) > 1/2, generate a new h_t, end if
       if ε_{k<t}(t) > 1/2, set ε_{k<t}(t) = 1/2, end if
       β_k(t) = ε_k(t) / (1 − ε_k(t))   (5)
    6. Compute the weighted sum of normalized errors for k = 1, 2, …, t
       ω_k(t) = 1 / (1 + exp(−a(t − k − b)))   (6)
       ω_k(t) = ω_k(t) / Σ_{j=0..t−k} ω_k(t−j)   (7)
       β̄_k(t) = Σ_{j=0..t−k} ω_k(t−j) β_k(t−j)   (8)
    7. Calculate voting weights
       W_k(t) = log(1 / β̄_k(t))   (9)
    8. Compute the ensemble decision
       H(t)(x_i) = arg max_{c ∈ Ω} Σ_{k=1..t} W_k(t) ⟦h_k(x_i) = c⟧   (10)
    end for
    Output: Call WMV to compute H(t)(x_i)
    (⟦·⟧ denotes the indicator function.)

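A compact sketch of steps 5-7 of the listing above (pseudo error, sigmoid time-adjusted averaging, voting weight) for a single classifier h_k, assuming NumPy; the function name, the default a and b, and the error clipping are illustrative rather than the thesis code:

```python
import numpy as np

def cds_voting_weight(errors_k, k, a=0.5, b=10):
    """errors_k[j] is the pseudo error of classifier h_k at time step k + j,
    so the last entry is the error on the newest batch. Returns W_k."""
    eps = np.clip(np.asarray(errors_k, dtype=float), 1e-12, 0.5)  # errors > 1/2 are capped
    beta = eps / (1.0 - eps)                                      # eq. (5)
    steps = k + np.arange(len(eps))                               # time steps k .. t
    omega = 1.0 / (1.0 + np.exp(-a * (steps - k - b)))            # eq. (6)
    omega = omega / omega.sum()                                   # eq. (7)
    beta_bar = np.dot(omega, beta)                                # eq. (8)
    return np.log(1.0 / beta_bar)                                 # eq. (9)
```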

  15. Learn++.CDS
    (Repeats the prose of the previous slide, with the listing zoomed on steps 1-4:
    ensemble error (1), instance-weight update (2)-(3), SMOTE, and BaseClassifier
    training.)


  16. SMOTE
    • Synthetic Minority Over-sampling
    TEchnique [19]
    o Generate “synthetic” instances on the line
    segment connecting two neighboring minority
    class instances
    o Avoids issues commonly encountered with
    random under/over-sampling of the
    majority/minority class
    • Select one of the k-nearest neighbors
    of a minority class instance
    o Generate a synthetic instance given by
    x_i + λ(x̂ − x_i), where x̂ is the selected nearest neighbor
    of x_i and λ is the “gap” parameter
    o The gap controls where the synthetic instance lies
    on the segment between the instance and its neighbor
    • Synthetic samples lie within the
    convex hull of the original minority
    class sample
    Input: Minority data P = { x_i ∈ X } for i = 1, 2, …, N
    Number of minority instances (N), SMOTE percentage (S), number of nearest neighbors (k)
    for i = 1, 2, …, N do
    1. Find the k nearest (minority class) neighbors of x_i
    2. Ŝ = S / 100
    while Ŝ ≠ 0 do
       1. Select one of the k nearest neighbors, call it x̂
       2. Select a random number λ ∈ [0, 1]
       3. x_s = x_i + λ (x̂ − x_i)
       4. Append x_s to the synthetic set S_syn
       5. Ŝ = Ŝ − 1
    end while
    end for
    Output: Return the synthetic data S_syn

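A minimal NumPy sketch of the SMOTE listing above; the brute-force neighbor search and the defaults are illustrative and not the reference implementation of [19]:

```python
import numpy as np

def smote(minority, smote_pct=200, k=5, rng=None):
    """Generate smote_pct/100 synthetic points per minority instance along
    segments to its k nearest minority neighbors (assumes > k instances)."""
    rng = np.random.default_rng() if rng is None else rng
    n_new = smote_pct // 100
    # Pairwise distances between minority instances (small-N brute force).
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                      # exclude the point itself
    synthetic = []
    for i, x in enumerate(minority):
        neighbors = np.argsort(d[i])[:k]             # k nearest minority neighbors
        for _ in range(n_new):
            x_hat = minority[rng.choice(neighbors)]  # pick one neighbor
            lam = rng.random()                       # the "gap" parameter in [0, 1]
            synthetic.append(x + lam * (x_hat - x))  # point on the segment
    if not synthetic:
        return np.empty((0, minority.shape[1]))
    return np.array(synthetic)
```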

  17. Learn++.CDS
    • Evaluate all classifiers on the new data
    and compute a pseudo error
    o Apply the penalty distribution D(t) to compute the
    pseudo error
    • Some instances incur more of a
    misclassification penalty than others
    o If a new classifier’s error is greater than ½,
    generate a new classifier
    o If an old classifier’s error is greater than ½,
    set it to ½
    o Normalize the pseudo error
    • Compute an age-adjusted weighted sum of each
    classifier’s errors
    o Apply a normalized logistic sigmoid
    o Recent weighted errors are weighted more heavily
    o The voting weight is proportional to the weighted
    sum
    • The final hypothesis is made with WMV
    (Listing: steps 5-8 of the Learn++.CDS algorithm, equations (4)-(10), as given
    on slide 14.)


  18. Learn++.CDS
    (Repeats the prose of the previous slide alongside the full Learn++.CDS listing
    from slide 14.)


  19. (Image-only slide.)

  20. Learn++.NIE
    • Ensembles have been popular for learning unbalanced data
    o Ensemble approaches can increase the recall and several other figures of merit when
    facing an unbalanced data problem
    o BEV[12], SMOTEBoost[13], DataBoost-IM[14], and RAMOBoost[15]
    • Like Learn++.CDS, Learn++.NIE uses many of the same fundamental
    principles for learning in nonstationary environments
    o Ensemble classifier approach
    o Time-adjusted weighting mechanism
    o Classifiers are combined with a weighted majority vote
    • Unlike Learn++.CDS, Learn++.NIE uses several new components to
    learn concept drift from unbalanced data
    o Multiple classifiers are generated at each time stamp
    o New figures of merit are applied to determine a sub-ensemble voting weight
    • Strategy: track concept drift using figures of merit other than class
    independent error to combine sub-ensembles using a time-adjusted
    weighting scheme

  21. Learn++.NIE
    (Block diagram of Learn++.NIE: train a sub-ensemble of classifiers on D(t),
    combine each sub-ensemble with a simple majority vote (SMV), compute a figure
    of merit δ_k(t) for each sub-ensemble k = 1, 2, …, t, determine time-adjusted
    weights, and combine the sub-ensembles with a weighted majority vote (WMV).)

  22. Learn++.NIE
    • An ensemble of classifiers is created at
    each time step
    o Train classifiers on all minority data +
    randomly sampled subsets of the newest
    majority data
    o The sub-ensemble combination rule is a
    majority vote
    • Compute δ_k(t) as a figure of merit for
    each sub-ensemble on D(t)
    o Replacement for the pseudo error
    o δ_k(t) should reflect the performance on all
    classes
    • Learn++.NIE follows Learn++.CDS
    from this point
    Input: Training data D(t) = { (x_i, y_i) : x_i ∈ X, y_i ∈ Ω } for i = 1, 2, …, m(t)
    Supervised learning algorithm BaseClassifier
    Sigmoid parameters: a & b
    Sub-ensemble size: K
    for t = 1, 2, … do
    1. Call h_t = SubEnsemble(BaseClassifier, D(t), K)
    2. Evaluate all existing sub-ensembles on D(t) to produce instance labels
       h_k(x_i) for k = 1, 2, …, t. Determine the sub-ensemble weight measure
       δ_k(t) using (17), (18), or (19).
       if δ_{k=t}(t) > 1/2, generate a new sub-ensemble, end if
       if δ_{k<t}(t) > 1/2, set δ_{k<t}(t) = 1/2, end if
       β_k(t) = δ_k(t) / (1 − δ_k(t))   (11)
    3. Compute the weighted sum of normalized errors for k = 1, 2, …, t
       ω_k(t) = 1 / (1 + exp(−a(t − k − b)))   (12)
       ω_k(t) = ω_k(t) / Σ_{j=0..t−k} ω_k(t−j)   (13)
       β̄_k(t) = Σ_{j=0..t−k} ω_k(t−j) β_k(t−j)   (14)
    4. Calculate voting weights
       W_k(t) = log(1 / β̄_k(t))   (15)
    5. Compute the ensemble decision
       H(t)(x_i) = arg max_{c ∈ Ω} Σ_{k=1..t} W_k(t) ⟦h_k(x_i) = c⟧   (16)
    end for
    Output: Call WMV to compute H(t)

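A sketch of step 1 above and the sub-ensemble's simple majority vote, using scikit-learn's CART-style decision tree (the deck's base classifier); the balanced subset size, the function names, and the integer-label assumption are mine, not the thesis implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_sub_ensemble(X, y, minority_label, K=3, rng=None):
    """Each of the K classifiers is trained on every minority instance plus a
    random subset of the newest majority data (subset size chosen here to
    balance the classes, which is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    min_idx = np.where(y == minority_label)[0]
    maj_idx = np.where(y != minority_label)[0]
    ensemble = []
    for _ in range(K):
        sub = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, sub])
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def sub_ensemble_predict(ensemble, X):
    """Simple majority vote (SMV) over the sub-ensemble; assumes the class
    labels are non-negative integers."""
    votes = np.stack([clf.predict(X) for clf in ensemble]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```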

  23. Computing δ_k(t)
    • F-measure {Learn++.NIE (fm)}
    o Combination of precision and recall
    • Precision: fraction of retrieved documents that are relevant to the search
    • Recall: fraction of relevant documents that were successfully retrieved
    o The F_1-measure is implied whenever “F-measure” is used
      δ_k(t) = 1 − 2 · (precision × recall) / (precision + recall) = 1 − F_1   (17)
    • Weighted Recall Measure {Learn++.NIE (wavg)}
    o Convex combination of the majority class error, ε_{k,maj}, and the minority class error, ε_{k,min}
      δ_k(t) = α · ε_{k,maj} + (1 − α) · ε_{k,min}   (18)
    o α ∈ [0, 1] controls the weight given to the majority and minority classes
    • Geometric Mean {Learn++.NIE (gm)}
    o Classifiers performing poorly on one or more classes will have a low G-mean to reflect this
    performance
      δ_k(t) = 1 − ( ∏_{c=1..C} (1 − ε_{k,c}) )^{1/C}   (19)

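The three weight measures (17)-(19) written as small functions, assuming the per-class errors (or precision and recall) of a sub-ensemble on D(t) are already available; the α default and function names are assumptions:

```python
import numpy as np

def delta_fm(precision, recall):
    """Eq. (17): one minus the F1-measure."""
    if precision + recall == 0:
        return 1.0
    return 1.0 - 2.0 * precision * recall / (precision + recall)

def delta_wavg(err_majority, err_minority, alpha=0.5):
    """Eq. (18): convex combination of the class-wise errors."""
    return alpha * err_majority + (1.0 - alpha) * err_minority

def delta_gm(class_errors):
    """Eq. (19): one minus the geometric mean of the class-wise accuracies."""
    acc = 1.0 - np.asarray(class_errors, dtype=float)
    return 1.0 - acc.prod() ** (1.0 / len(acc))
```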

  24. Contents
    • Introduction
    • Approach
    • Experiments
    • Conclusions

  25. Figures of Merit
    • Raw Classification Accuracy
      RCA = (1/n) Σ_{i=1..n} ⟦y_i = ŷ_i⟧ = (TP + TN) / (TP + TN + FP + FN)
    • Precision
      precision = TP / (TP + FP)
    • Recall
      recall = TP / (TP + FN)
    • Geometric Mean
      G-mean = ( ∏_{c=1..C} recall_c )^{1/C}
    • F-measure
      F_1 = 2 · (precision × recall) / (precision + recall)

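The two-class figures of merit above computed from a confusion matrix; a sketch assuming NumPy label arrays, with the two-class G-mean taken as sqrt(recall × specificity):

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Raw classification accuracy, precision, recall, G-mean, and F1 for a
    two-class problem; zero denominators fall back to 0.0."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    rca = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    gmean = np.sqrt(recall * specificity)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"RCA": rca, "precision": precision, "recall": recall,
            "G-mean": gmean, "F1": f1}
```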

  26. Figures of Merit
    • Area Under the ROC Curve (AUC)
    o ROC curves depict the tradeoff between false positives and true positives
    o AUC is equivalent to the probability that a classifier will rank a randomly chosen
    positive instance higher than a randomly chosen negative instance [16]
    • AUC = 0.5 → the classifier is assigning labels at random
    Fig.: A naïve Bayes classifier with a Gaussian kernel was
    generated on 10,000 random instances drawn from a
    standardized Gaussian distribution. The class labels are produced
    by computing the sign(N(0, 1)). The AUC for w1 (left) is 0.50185
    and w2 (right) is 0.50295.
    Fig.: A naïve Bayes classifier with a Gaussian kernel was
    generated on 10,000 randomly selected instances and tested on
    6,000 randomly selected instances. The ROC curve was generated
    using 200 thresholds. The AUC for w1 (right) is 0.7905 and w3
    (left) is 0.9229.

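AUC computed directly from the rank interpretation cited above [16]: the fraction of positive/negative pairs in which the positive instance receives the higher score. A sketch assuming NumPy arrays of scores and 0/1 labels, with ties counted as one half:

```python
import numpy as np

def auc_from_scores(scores, labels):
    """Probability that a randomly chosen positive outranks a randomly
    chosen negative (pairwise count; fine for modest sample sizes)."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```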

  27. Figures of Merit
    • Overall Performance Measure (OPM)
    o OPM is a convex combination of RCA, F_1-measure, AUC and recall
    o For the purpose of this study, θ_1 = θ_2 = θ_3 = θ_4 = 1/4
      OPM = θ_1 × RCA + θ_2 × F_1 + θ_3 × AUC + θ_4 × recall
    • Ranking Algorithms
    o The average of RCA, F-measure, AUC and recall is computed over the entire experiment
    o Classifiers are ranked from (1) to (k), where k is the number of classifiers used in the
    comparison
    • Fractional ranks are applied in the scenario of a tie
    • (1) → best performing
    • (k) → worst performing
    Measure 1 Measure 2 …
    Algorithm 1 90±1.2 (1) 85±1.2 (1.5) …
    Algorithm 2 85±1.0 (k) 85±1.0 (1.5) …
    ⋮ ⋮ ⋮ …
    Algorithm k 89±1.5 (2) 60±1.5 (k) …

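A sketch of the OPM combination and the fractional (tie-averaged) ranking described above; the function names, and the assumption that larger measure values are better, are mine:

```python
import numpy as np

def opm(rca, f1, auc, recall, thetas=(0.25, 0.25, 0.25, 0.25)):
    """Convex combination of the four measures (equal weights, as on the slide)."""
    return float(np.dot(thetas, [rca, f1, auc, recall]))

def fractional_ranks(scores):
    """Rank algorithms from (1) = best to (k) = worst, averaging ranks on ties.
    `scores` holds each algorithm's average of a measure (larger is better)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)                # best score first
    ranks = np.empty_like(scores)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):                # fractional ranks for ties
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    return ranks
```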

  28. Datasets Used in Experiments
    Synthetic Datasets
    • Rotating Spiral
    • Rotating Checkerboard
    • Drifting Gaussian Data
    • Shifting Hyperplane
    Real-World Datasets
    • Australia Electricity Pricing
    • NOAA Weather Data
    [DA] http://hottavainen.deviantart.com/art/Rainy-day-gif-animation-182893258
    [TMI] http://en.wikipedia.org/wiki/File:Three_Mile_Island_(color)-2.jpg

  29. Synthetic Datasets
    Rotating Spiral Dataset
    • Generated with four spirals belonging to one of two classes
    o Data are generated for 300 time stamps with a reoccurring environment beginning at
    t=150
    o Interesting properties: the mean of the data is not changing; reoccurring environments
    • Data are generated such that ≈5% class imbalance is present

  30. Synthetic Datasets
    Rotating Checkerboard Dataset
    • Two class problem with a reoccurring environment and a constant
    drift rate
    o Experiment is carried out over 200 time stamps with the reoccurring environment
    beginning at t=100
    • Data are generated such that ≈5% class imbalance is present

  31. Synthetic Datasets
    Drifting Gaussian Dataset
    • Linear combination of four Gaussian components
    o 3 majority + 1 minority
    o Drift is found in the mean and covariance throughout the duration of the experiment
    • Data are generated such that ≈3% class imbalance is present
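The slide gives only an outline of the generator, so the following toy sketch shows one way to produce such a batch (three majority Gaussian components, one minority component, means drifting with the time stamp, ≈3% imbalance); every mean, variance, and the drift schedule below are assumptions rather than the thesis parameters:

```python
import numpy as np

def drifting_gaussian_batch(t, T=100, n=1000, minority_frac=0.03, rng=None):
    """Illustrative two-dimensional batch for time stamp t in [0, T]."""
    rng = np.random.default_rng() if rng is None else rng
    shift = 2.0 * t / T                                 # assumed linear drift
    maj_means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]) + shift
    min_mean = np.array([5.0, 5.0]) - shift
    n_min = int(round(minority_frac * n))
    n_maj = n - n_min
    comp = rng.integers(0, 3, size=n_maj)               # majority component picks
    X_maj = rng.normal(maj_means[comp], 1.0)
    X_min = rng.normal(min_mean, 1.0, size=(n_min, 2))
    X = np.vstack([X_maj, X_min])
    y = np.concatenate([np.zeros(n_maj, dtype=int), np.ones(n_min, dtype=int)])
    return X, y
```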

  32. Synthetic Datasets
    Shifting Hyperplane
    • Hyperplane changes location at three points in time
    o Three features, only two of which are relevant
    o Class imbalance changes as the plane shifts. Thus, both P(ω|x) and P(ω) change.
    • Dual change
    • Data are generated such that ≈7-25% class imbalance is present

  33. Real-World Datasets
    • Nebraska Weather Dataset
    o Predict whether it rained on any given day
    o ≈50 years of daily recordings
    o Features: minimum/average/maximum temperature, average/maximum wind speed,
    visibility, sea level pressure, and dew point
    o Imbalance: ≈ 30% with a minimum of ≈ 10%
    • Australia Electricity Pricing Dataset
    o Predict whether the price in electricity went up or down
    o Features: day, period, NSW demand, VIC demand and the scheduled transfer between
    the two states
    o Imbalance: ≈ 5% (achieved through undersampling)

  34. Algorithm Comparisons
    • Proposed Approaches
    o Learn++.NIE(fm), Learn++.NIE(gm), Learn++.NIE(wavg), and Learn++.CDS
    • Streaming Ensemble Algorithm (SEA)[17]
    • Learn++.NSE[3]
    • Selectively Recursive Approach[6]
    • Uncorrelated Bagging[5]
    • Making the comparisons
    o Base classifier is a CART decision tree algorithm for all algorithms
    o All algorithm parameters are selected in the same manner and remain constant, unless
    a parameter must be adjusted for each dataset, e.g., the SMOTE percentage depends on the level of
    imbalance
    o Specific algorithm parameters have been selected based on the conclusions reached in the
    original authors’ comments

  35. Key Observations
    1. Learn++.NIE (fm) and Learn++.CDS consistently provide ranks near
    the top three for OPM on nearly all datasets tested.
    a) Results are significant compared to Learn++.NSE, SERA, and SEA
    2. Learn++.NIE (fm) and Learn++.CDS typically provide a significant
    increase in recall, AUC, FM, and OPM compared to their
    predecessors.
    3. UCB’s increase in recall comes at the cost of overall accuracy and F-measure.
    4. Learn++.CDS improves the OPM rank over Learn++.NSE on every
    dataset tested
    5. Learn++.NIE (fm) typically provides better results than the (gm) or
    (wavg).

  36. Rotating Spiral Dataset
    RCA F-measure AUC Recall OPM Mean Rank
    Learn++.NSE 97.76±0.11 (1) 86.13±0.76 (1) 91.33±0.49 (6) 76.96±1.17 (6) 88.05±0.63 (6) 4.0
    SEA 96.65±0.12 (4) 78.97±0.84 (7) 88.91±0.50 (7) 69.49±1.15 (7) 83.51±0.65 (7) 6.4
    Learn++.NIE(fm) 97.30±0.13 (2) 85.87±0.65 (2) 97.34±0.26 (2) 89.87±0.73 (3) 92.60±0.44 (1) 2.0
    Learn++.NIE(gm) 96.11±0.16 (6) 80.57±0.70 (5) 93.11±0.38 (4) 87.21±0.80 (4) 89.25±0.51 (4) 4.6
    Learn++.NIE(wavg) 96.08±0.16 (7) 80.46±0.70 (6) 93.09±0.39 (5) 87.20±0.80 (5) 89.21±0.51 (5) 5.6
    Learn++.CDS 96.81±0.15 (3) 84.15±0.65 (3) 96.15±0.31 (3) 91.77±0.71 (2) 92.22±0.46 (3) 2.8
    SERA 92.73±0.32 (8) 62.67±1.66 (8) 80.96±1.10 (8) 66.57±2.17 (8) 75.73±1.45 (8) 8.0
    UCB 96.42±0.16 (5) 82.57±0.69 (4) 98.18±0.19 (1) 92.74±0.65 (1) 92.48±0.42 (2) 2.6
    (Plots: RCA, F-measure, AUC, and recall vs. time step on the rotating spiral data;
    panels (a)-(d) compare Learn++.NIE (FM), (GM), and (WAVG), and a second panel set
    compares UCB, SERA, Learn++.CDS, and Learn++.NIE (FM).)

  37. Rotating Checkerboard Dataset
    RCA F-measure AUC Recall OPM Mean Rank
    Learn++.NSE 97.45±0.17 (1) 68.25±2.14 (2) 83.76±1.17 (4) 56.55±2.48 (7) 76.50±1.49 (3) 3.4
    SEA 87.41±0.63 (7) 21.93±1.63 (8) 65.75±1.29 (8) 31.87±2.18 (8) 51.74±1.43 (8) 7.8
    Learn++.NIE(fm) 95.06±0.47 (3) 61.45±2.51 (3) 92.62±0.85 (1) 74.32±2.20 (3) 80.86±1.51 (2) 2.4
    Learn++.NIE(gm) 90.02±0.51 (5) 42.11±1.94 (5) 83.37±1.13 (5) 66.76±2.20 (5) 70.57±1.45 (6) 5.2
    Learn++.NIE(wavg) 89.89±0.51 (6) 41.15±1.86 (6) 82.75±1.12 (6) 65.91±2.16 (6) 69.93±1.41 (7) 6.2
    Learn++.CDS 97.18±0.21 (2) 72.93±1.82 (1) 90.89±0.96 (3) 74.50±2.19 (2) 83.88±1.30 (1) 1.8
    SERA 92.89±0.43 (4) 52.57±2.29 (4) 80.80±1.29 (7) 67.39±2.55 (4) 73.41±1.64 (5) 4.8
    UCB 85.78±0.51 (8) 38.26±1.44 (7) 91.89±0.70 (2) 82.33±1.75 (1) 74.57±1.10 (4) 4.4
    (Plots: RCA, F-measure, AUC, and recall vs. time step on the rotating checkerboard
    data; panels (a)-(d) compare Learn++.NIE (FM), (GM), and (WAVG), and a second panel
    set compares UCB, SERA, Learn++.CDS, and Learn++.NIE (FM).)

  38. Drifting Gaussian Dataset
    RCA F-measure AUC Recall OPM Mean Rank
    Learn++.NSE 97.63±0.18 (1) 66.30±2.62 (4) 83.65±1.43 (7) 58.33±3.15 (7) 76.48±1.85 (7) 5.2
    SEA 97.46±0.18 (3) 64.39±2.44 (5) 82.97±1.31 (8) 56.40±2.84 (8) 75.31±1.69 (8) 6.4
    Learn++.NIE(fm) 96.11±0.27 (5) 67.30±1.95 (3) 95.80±0.67 (2) 86.74±2.01 (2) 86.45±0.99 (2) 2.8
    Learn++.NIE(gm) 95.24±0.27 (6) 63.37±1.86 (7) 92.12±0.89 (4) 86.51±1.90 (3) 84.31±1.23 (4) 4.8
    Learn++.NIE(wavg) 95.20±0.28 (8) 62.93±1.91 (8) 91.60±0.94 (5) 85.42±1.97 (4) 83.79±1.28 (5) 6.0
    Learn++.CDS 97.50±0.20 (2) 74.21±1.90 (1) 92.19±1.07 (3) 80.85±2.45 (5) 86.19±1.41 (3) 2.8
    SERA 97.37±0.22 (4) 70.76±2.28 (2) 85.99±1.46 (6) 73.52±2.96 (6) 81.91±1.73 (6) 4.8
    UCB 95.22±0.30 (7) 63.74±1.94 (6) 96.84±0.54 (1) 92.02±1.56 (1) 86.96±1.09 (1) 3.2
    (Plots: RCA, F-measure, AUC, and recall vs. time step on the drifting Gaussian data;
    panels (a)-(d) compare Learn++.NIE (FM), (GM), and (WAVG), and a second panel set
    compares UCB, SERA, Learn++.CDS, and Learn++.NIE (FM).)

  39. Shifting Hyperplane Dataset
    RCA F-measure AUC Recall OPM Mean Rank
    Learn++.NSE 94.98±0.26 (1) 71.98±1.57 (2) 83.30±0.90 (6) 62.87±1.96 (7) 78.28±1.17 (5) 4.2
    SEA 94.00±0.26 (3) 68.13±1.48 (3) 82.00±0.85 (7) 60.28±1.77 (8) 76.10±1.09 (7) 5.6
    Learn++.NIE(fm) 92.38±0.46 (7) 67.27±1.62 (6) 85.93±0.90 (1) 74.83±1.60 (1) 80.10±1.15 (2) 3.4
    Learn++.NIE(gm) 93.03±0.31 (5) 67.90±1.36 (5) 84.51±0.81 (4) 72.17±1.61 (3) 79.40±1.02 (3) 4.0
    Learn++.NIE(wavg) 93.25±0.30 (4) 67.94±1.39 (4) 84.08±0.83 (5) 70.65±1.65 (4) 78.98±1.07 (4) 4.2
    Learn++.CDS 94.75±0.28 (2) 72.24±1.46 (1) 85.16±0.84 (3) 68.80±1.79 (5) 80.24±1.09 (1) 2.4
    SERA 92.47±0.44 (6) 63.01±1.84 (7) 80.11±1.08 (8) 64.68±2.17 (6) 75.07±1.38 (8) 7.0
    UCB 90.77±0.45 (8) 62.05±1.44 (8) 85.84±0.95 (2) 73.34±1.66 (2) 78.00±1.13 (6) 5.2
    (Plots: RCA, F-measure, AUC, and recall vs. time step on the shifting hyperplane
    data; panels (a)-(d) compare Learn++.NIE (FM), (GM), and (WAVG), and a second panel
    set compares UCB, SERA, Learn++.CDS, and Learn++.NIE (FM).)

  40. Electricity Pricing Dataset
    RCA F-measure AUC Recall OPM Mean Rank
    Learn++.NSE 90.75±0.86 (2) 15.40±3.05 (7) 59.66±2.04 (7) 16.87±3.31 (7) 45.67±2.32 (7) 6.0
    SEA 92.15±0.60 (1) 9.37±2.15 (8) 58.48±1.55 (8) 10.53±2.19 (8) 42.63±1.62 (8) 6.6
    Learn++.NIE(fm) 82.60±1.80 (6) 20.79±2.55 (3) 72.45±2.15 (1) 38.72±4.93 (3) 53.64±2.86 (3) 3.2
    Learn++.NIE(gm) 83.60±1.30 (5) 22.29±2.64 (1) 70.70±2.34 (2) 38.37±4.68 (4) 53.74±2.74 (2) 2.8
    Learn++.NIE(wavg) 84.70±1.15 (4) 21.88±2.61 (2) 69.54±2.23 (4) 35.61±4.28 (5) 52.93±2.57 (4) 3.8
    Learn++.CDS 88.48±1.12 (3) 18.09±3.05 (6) 60.58±2.27 (6) 22.91±4.07 (6) 47.52±2.63 (6) 5.4
    SERA 76.42±1.70 (7) 19.91±2.06 (4) 62.42±2.22 (2) 46.46±4.70 (2) 51.30±2.67 (5) 4.6
    UCB 68.23±1.72 (8) 18.68±1.75 (5) 69.74±2.34 (3) 58.87±4.47 (1) 53.88±2.57 (1) 3.6
    (Plots: RCA, F-measure, AUC, and recall vs. time step on the electricity pricing
    data; panels (a)-(d) compare Learn++.NIE (FM), (GM), and (WAVG), and a second panel
    set compares UCB, SERA, Learn++.CDS, and Learn++.NIE (FM).)

  41. Weather Dataset
    RCA F-measure AUC Recall OPM Mean Rank
    Learn++.NSE 73.35±0.00 (4) 51.27±0.00 (5) 72.08±0.00 (6) 49.38±0.00 (6) 61.52±0.00 (5) 5.2
    SEA 75.81±0.00 (1) 50.43±0.00 (6) 73.37±0.00 (4) 42.86±0.00 (8) 60.62±0.00 (6) 5.0
    Learn++.NIE(fm) 70.54±1.08 (7) 59.19±1.31 (3) 77.84±0.79 (1) 72.48±2.19 (1) 70.01±1.34 (2) 2.8
    Learn++.NIE(gm) 73.53±0.80 (3) 60.78±1.12 (2) 76.83±0.69 (2) 69.27±1.84 (2) 70.10±1.11 (1) 2.0
    Learn++.NIE(wavg) 74.07±0.74 (2) 60.94±1.04 (1) 76.42±0.66 (3) 68.04±1.71 (3) 69.87±1.04 (3) 2.4
    Learn++.CDS 73.05±0.93 (5) 52.89±1.74 (4) 72.91±1.03 (5) 53.75±2.69 (4) 63.15±1.60 (4) 4.6
    SERA 65.17±1.83 (8) 48.38±2.30 (7) 63.54±1.48 (8) 58.49±4.16 (7) 58.90±2.44 (7) 6.8
    UCB 70.82±1.43 (6) 46.40±3.18 (8) 71.07±1.57 (7) 45.54±4.77 (8) 58.46±2.74 (8) 7.2
    (Plots: RCA, F-measure, AUC, and recall vs. time step on the weather data;
    panels (a)-(d) compare Learn++.NIE (FM), (GM), and (WAVG), and a second panel set
    compares UCB, SERA, Learn++.CDS, and Learn++.NIE (FM).)

  42. Overall Results
    Table: OPM ranks over all datasets
    gauss checker spiral hyper elec noaa mean
    Learn++.NSE 7 3 5 6 7 5 5.50
    SEA 8 8 7 7 8 6 7.33
    Learn++.NIE (fm) 2 2 2 1 3 2 2.00
    Learn++.NIE (gm) 4 6 3 4 2 1 3.33
    Learn++.NIE (wavg) 5 7 4 5 4 3 4.67
    Learn++.CDS 3 1 1 3 6 4 3.00
    SERA 6 5 8 8 5 7 6.50
    UCB 1 4 6 2 1 8 3.67
    Table: AUC ranks over all datasets
    gauss checker spiral hyper elec noaa mean
    Learn++.NSE 7 4 6 6 7 6 6.00
    SEA 8 8 7 7 8 4 7.00
    Learn++.NIE (fm) 2 1 2 1 1 1 1.33
    Learn++.NIE (gm) 4 5 4 4 2 2 3.50
    Learn++.NIE (wavg) 5 6 5 5 4 3 4.67
    Learn++.CDS 3 3 3 3 6 5 3.83
    SERA 6 7 8 8 5 8 7.00
    UCB 1 2 1 2 3 7 2.67
    Table: FM ranks over all datasets
    gauss checker spiral hyper elec noaa mean
    Learn++.NSE 4 2 1 2 7 5 3.50
    SEA 5 8 7 3 8 6 6.17
    Learn++.NIE (fm) 3 3 2 6 3 3 3.33
    Learn++.NIE (gm) 7 5 5 5 1 2 4.17
    Learn++.NIE (wavg) 8 6 6 4 2 1 4.50
    Learn++.CDS 1 1 3 1 6 4 2.67
    SERA 2 4 8 7 4 7 5.33
    UCB 6 7 4 8 5 8 6.33


  43. Comparing Multiple Classifiers
    • Comparing multiple classifiers on multiple datasets is not a trivial
    problem
    o Confidence intervals will only allow for the comparison of multiple classifiers on a single
    dataset
    • The rank-based Friedman test can determine whether classifiers are
    performing equally across multiple datasets [18]
    o Apply ranks to the average of each measure on a dataset
    o The standard deviation of the measure is not used in the Friedman test

      χ²_F = ( 12N / (k(k + 1)) ) [ Σ_{j=1..k} R_j² − k(k + 1)² / 4 ]
      F_F = ( (N − 1) χ²_F ) / ( N(k − 1) − χ²_F )
      (R_j is the average rank of algorithm j over the N datasets; k is the number of algorithms)
    • z-scores can be computed from the ranks in the Friedman test
    o The α-level (critical value) must be adjusted for the multiple comparisons being made
    o The Bonferroni-Dunn procedure adjusts α to α/(k − 1) [18]
      z_{i,j} = ( R_i − R_j ) / sqrt( k(k + 1) / (6N) )

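The Friedman statistics and the Bonferroni-Dunn z-score above as small functions; a sketch assuming an N × k matrix of ranks (datasets by algorithms), following [18]:

```python
import numpy as np

def friedman_test(ranks):
    """Return the Friedman chi-square and its F statistic from a rank matrix."""
    ranks = np.asarray(ranks, dtype=float)
    N, k = ranks.shape
    R = ranks.mean(axis=0)                              # average rank per algorithm
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    F = (N - 1) * chi2 / (N * (k - 1) - chi2)
    return chi2, F

def bonferroni_dunn_z(R_i, R_j, k, N):
    """z-score for two average ranks; compare at the adjusted level alpha/(k - 1)."""
    return (R_i - R_j) / np.sqrt(k * (k + 1) / (6.0 * N))
```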

  44. Friedman Test Results
    Hypothesis test comparing NIE(fm) [◊] and CDS [●] to other algorithms (only significant improvement is marked)
    • Friedman test rejects the null hypothesis on all figures of merit
    o Good! But which algorithm(s) are performing better/worse than others?
    • Learn++.CDS and Learn++.NIE(fm) provide a significant improvement
    over SERA and UCB
    o The null hypothesis is not rejected against UCB on several measures; however, UCB does not offer significant
    improvement over Learn++.CDS or Learn++.NIE
    • Learn++.CDS and Learn++.NIE(fm) offer improvement on several
    measures compared to concept drift algorithms
    L++.NSE SEA SERA UCB
    RCA ● ●
    FM ◊● ◊●
    AUC ◊● ◊● ◊●
    Recall ◊● ◊● ◊
    OPM ◊● ◊● ◊●


  45. Contents
    • Introduction
    • Approach
    • Experiments
    • Conclusions

  46. Conclusions
    • Learn++.NIE(fm) and Learn++.CDS provide significant improvement
    in several figures of merit over concept drift algorithms
    o Boost in recall, AUC and OPM
    o No surprise that Learn++.NSE and SEA have a strong raw classification accuracy
    • Learn++.NIE framework improves a few figures of merit compared
    to SERA and UCB
    o Learn++.NIE improves the F-measure and Learn++.CDS improves F-measure and RCA
    over UCB
    • Existing literature requires access to old data in order to learn
    concept drift from imbalanced data
    o Using old data for training can be detrimental to certain performance metrics
    • UCB: train on all accumulated minority class data
    • SERA: train on a selected subset of accumulated minority class data, which
    is the most similar to the newest minority class distribution
    • Proposed approaches consistently perform well as demonstrated on
    a variety of problems

  47. Future Work
    • Data Stream Mining
    o Learning massive data streams with imbalanced classes
    • The theory of learning in harsh environments
    o Fewer heuristics, more statistics**
    • Semi-supervised learning in nonstationary environments
    o How can we best utilize unlabeled data to learn from an unknown source?
    o What SSL theory can be applied to help us learn in nonstationary environments?
    ** Inspired by a recent plenary lecture by Dr. Gavin Brown


  48. Publications
    Publications in Submission
    1. G. Ditzler and R. Polikar. “Incremental Learning of Concept Drift from Streaming Imbalanced Data." IEEE
    Transactions on Knowledge and Data Engineering
    Publications in Press
    1. G. Ditzler and R. Polikar. “Semi-Supervised Learning in Nonstationary Environments." IEEE/INNS
    International Joint Conference on Neural Networks. to appear. 2011.
    2. G. Ditzler and R. Polikar. "Hellinger Distance Based Drift Detection Algorithm." in Proceedings of IEEE
    Symposium on Computational Intelligence in Dynamic and Uncertain Environments. pp. 41-48. 2011.
    3. G. Ditzler, J. Ethridge, R. Polikar, and R. Ramachandran. "Fusion Methods for Boosting Performance of Speaker
    Identification Systems." in Proceedings of the Asia Pacific Conference on Circuits and Systems. pp. 116-119. 2010.
    4. G. Ditzler, R. Polikar, and N. V. Chawla. "An Incremental Learning Algorithm for Non-stationary Environments
    and Imbalanced Data." in Proceedings of the International Conference on Pattern Recognition. pp. 2997-3000. 2010.
    5. J. Ethridge, G. Ditzler, and R. Polikar. "Optimal ν-SVM Parameter Estimation using Multi-Objective
    Evolutionary Algorithms." in Proceedings of the IEEE Congress on Evolutionary Computing. pp. 3570-3577. 2010.
    6. G. Ditzler, and R. Polikar. "An Incremental Learning Framework for Concept Drift and Class Imbalance."
    in Proceedings of the IEEE/INNS International Joint Conference on Neural Networks. pp. 736-743. 2010.
    7. G. Ditzler, M. Muhlbaier and R. Polikar. "Incremental Learning of New Classes in Unbalanced Data:
    Learn++.UDNC." in International Workshop on Multiple Classifier Systems. Lecture Notes in Computer Science. vol
    5997. pp. 33-42. 2010.

  49. Acknowledgements
    Special thanks go out to Robi Polikar, Shreekanth Mandayam,
    Nancy Tinkham, Loretta Brewer, Ravi Ramachandran, Ryan Elwell,
    James Ethridge, Mike Russell, George Lecakes, Karl Dyer, Metin
    Ahiskali, Richard Calvert, my family, Rowan’s ECE faculty, the
    NSF, the anonymous reviewers of my conference publications, and
    all the other people I forgot to mention

  50. References
    1. R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. John Wiley & Sons, Inc., 2001
    2. S. Grossberg, “Nonlinear neural networks: Principles, mechanisms and architectures,” Neural Networks, vol. 1, no. 1, pp.
    17-61, 1988.
    3. M. Muhlbaier and R. Polikar, “Multiple classifiers based incremental learning algorithm for learning nonstationary
    environments,” in IEEE International Conference on Machine Learning and Cybernetics, 2007, pp. 3618–3623.
    4. R. Elwell, “An ensemble-based computational approach for incremental learning in non-stationary environments related
    to schema and scaffolding-based human learning,” Master’s thesis, Rowan University, 2010.
    5. J. Gao, W. Fan, J. Han, and P. S. Yu, “A general framework for mining concept-drifting data streams with skewed
    distributions,” in SIAM International Conference on Data Mining, 2007, pp. 203–208.
    6. S. Chen and H. He, “SERA: Selectively recursive approach towards nonstationary imbalanced stream data mining,” in
    International Joint Conference on Neural Networks, 2009, pp. 522–529.
    7. S. Chen, H. He, K. Li, and S. Desai, “MuSERA: Multiple selectively recursive approach towards imbalanced stream data
    mining,” in International Joint Conference on Neural Networks, 2010, pp. 2857–2864.
    8. S. Chen and H. He, “Towards incremental learning of nonstationary imbalanced data streams: a multiple selectively
    recursive approach,” Evolving Systems, in press, 2011.
    9. R. Polikar, L. Udpa, S.S. Udpa and V. Honavar, “Learn++: an incremental learning algorithm for supervised neural
    networks,” IEEE Transactions on Systems, Man and Cybernetics, vol. 31, no. 4, pp. 497–508, 2001.
    10. G. Ditzler, N. V. Chawla, and R. Polikar, “An incremental learning algorithm for nonstationary environments and class
    imbalance,” in International Conference on Pattern Recognition, 2010, pp. 2997–3000.
    11. G. Ditzler and R. Polikar, “An incremental learning framework for concept drift and class imbalance,” in International Joint
    Conference on Neural Networks, 2010, pp. 736–743.
    12. C. Li, “Classifying imbalanced data using a bagging ensemble variation (BEV),” in ACMSE, 2007, pp. 203–208.
    13. N. V. Chawla, A. Lazarevic, L. O. Hall and K. W. Bowyer, “SMOTEBoost: Improving prediction of the minority class in
    boosting,” in 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, 2003, pp. 1–10.
    14. H. Guo and H. L. Viktor, “Learning from imbalanced data sets with boosting and data generation: The Databoost-IM
    approach,” Sigkdd Explorations, vol. 6, no. 1, pp. 30–39, 2004.
    15. S. Chen and H. He, “RAMOBoost: Ranked Minority Oversampling in Boosting,” IEEE Transactions on Neural Networks,
    vol. 21, no. 10, pp. 1624-1642, 2010.
    16. T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, pp. 861–874, 2006.
    17. W. N. Street and Y. Kim, “A streaming ensemble algorithm (SEA) for large scale classification,” in Proceedings to the 7th
    ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2001, pp. 377–382.
    18. J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–
    30, 2006.
    19. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,”
    Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

  51. Questions
