Introduction to Data Stream Mining

Albert Bifet
August 25, 2012

Transcript

  1. Introduction to Data Stream Mining
    Albert Bifet
    March 2012

  2. Motivation
    Source: IDC’s Digital Universe Study (EMC), June 2011
    Data is growing

  3. Motivation
    Memory unit        Size     Binary size
    kilobyte (kB/KB)   10^3     2^10
    megabyte (MB)      10^6     2^20
    gigabyte (GB)      10^9     2^30
    terabyte (TB)      10^12    2^40
    petabyte (PB)      10^15    2^50
    exabyte (EB)       10^18    2^60
    zettabyte (ZB)     10^21    2^70
    yottabyte (YB)     10^24    2^80
    Data is growing

  4. Motivation
    Source: IDC’s Digital Universe Study (EMC), June 2011
    Data is growing

  5. Motivation
    Source: IDC’s Digital Universe Study (EMC), June 2011
    Data is growing

  6. Motivation
    Source: IDC’s Digital Universe Study (EMC), June 2011
    Data is growing

  7. Streaming Data
    Big Data & Real Time

  8. Big Data
    McKinsey Global Institute (MGI) Report on Big Data, 2011.
    Big data refers to datasets whose size is beyond
    the ability of typical database software tools to
    capture, store, manage, and analyze.

  9. Big Data
    McKinsey Global Institute (MGI) Report on Big Data, 2011.
    Big data refers to datasets whose size is beyond
    the ability of typical database software tools to
    capture, store, manage, and analyze.

  10. Methodology
    Sampling and distributed systems

  11. Methodology
    Paolo Boldi
    Big Data does not need big machines,
    it needs big intelligence

  12. Real time analytics
    We want to analyze what is happening now.

  13. Real time analytics
    We want to analyze what is happening now.

  14. Time and Memory
    Number 8 Wire Mentality
    Time and memory are the resource dimensions of
    the process.

  15. Time and Memory
    Time and memory are the resource dimensions of
    the process.

  16. Algorithms
    Classification, Regression, Clustering, Frequent
    Pattern Mining.

  17. Applications
    sensor data: industry, cities
    telecom data
    social networks: Twitter, Facebook, Yahoo
    marketing: sales, business
    Data may come from: humans, sensors, or
    machines.

  18. Data Streams
    Big Data & Real Time
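
The unit table on slide 3 can be checked with a few lines of Python. This is only an illustrative sketch, not part of the original deck: the names `UNITS` and `unit_sizes` are made up here, and the script simply recomputes the decimal (power of 10) and binary (power of 2) size of each unit and shows how far apart the two drift.

```python
# Unit names as listed in the slide's table, smallest to largest.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def unit_sizes():
    """Return (name, decimal_bytes, binary_bytes) for every unit in the table."""
    rows = []
    for i, name in enumerate(UNITS, start=1):
        decimal = 10 ** (3 * i)   # 10^3, 10^6, ..., 10^24
        binary = 2 ** (10 * i)    # 2^10, 2^20, ..., 2^80
        rows.append((name, decimal, binary))
    return rows


if __name__ == "__main__":
    for name, decimal, binary in unit_sizes():
        # Relative gap between the binary and the decimal reading of the unit.
        gap_percent = (binary / decimal - 1) * 100
        print(f"{name:<10} decimal={decimal:.0e}  binary is {gap_percent:.1f}% larger")
```

The gap between the two readings is about 2.4% for a kilobyte and grows to roughly 20.9% for a yottabyte, so at the scales the deck discusses it matters which definition is meant.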
