
Intro StreamProcessing.be Meetup #1

* Introduction to the first StreamProcessing.be meetup
* Why Stream Processing?

Peter Vandenabeele

May 27, 2015

Transcript

  1. Agenda

    • 15’ Intro (Peter)
    • 35’ Azure Stream Analytics and ML (Jan)
    • 5’ short break
    • 35’ Google Cloud Dataflow (Alex)
    • 35’ Amazon AWS ML (Nils)
  2. Many thanks to

    • Microsoft Belux
    • Jan, Alex, Nils
    • @maasg, @svendfx
    • BigData.be, DataScience.be, AWS Belgium
    • you!
  3. Next StreamProcessing.be Meetup: Thu, June 25, 2015, near Mechelen station

    (looking for a location for about 50 people)
    • Introduction to Apache Kafka (Svend)
    • Akka Streams and Kinesis (Peter)
    • Understanding Spark Streaming (Gerard)
  4. whoami: Peter Vandenabeele (@peter_v), All Things Data (my consultancy)

    Current clients:
    • Real Impact Analytics: telecom analytics (emerging markets)
    • a “green” start-up (stealth mode)
    • an IoT project (see next meetup)
  5. Example: collaborative research (2013), with each consumer loading the monthly UniProt update (180 GB)

    update cost ≅ freq (1/month) * size (180 GB) * # consumers (5)
    → each consumer must fetch + load + index the FULL data set
  6. Solution: a stream of updates (CDC, change data capture): users table → continuous updates → consumer

    update cost ≅ rate of change (10%/month) * size * # consumers
    → fetch + load ONLY the updates stream: 3M entries, 300k updates/month (independent of consumer update frequency)
  7. Why Stream Processing?

    Real-time * Big Data * Distributed processing (“many collaborators”)
  8. Stream becomes the “master data”

    • see the stream as the master data (not the DB)
    • allows real-time, distributed processing
    • allows unification between:
      ◦ operational teams
      ◦ analytics teams
      ◦ security, ...
    • e.g. Kafka at LinkedIn (Kappa architecture)
  9. Kafka (LinkedIn): Jay Kreps

    “I ♥ Log”: Real-time Data and Apache Kafka (source: Jay Kreps on SlideShare)
  10. Why Stream Processing?

    • Peter: real-time * (big data * distributed processing)
    • Nathan Marz: recovery from human error + ...
    • Jay Kreps: organizational scalability + ...
    • Martin Kleppmann: data agility + …
    • YOU: ??? Let’s discuss over a beer ...
  11. Speakers for today

    • Jan Tielens (Microsoft) @jantielens
    • Alex Van Boxel (Vente-Exclusive.com) @alexvb
    • Nils De Moor (Woorank) @ndemoor
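The cost argument of slides 5 and 6 can be made concrete with a minimal Python sketch. It assumes a hypothetical CDC event format (`op`, `key`, `value`), since the slides do not specify one, and, like the slide's formula, that an update is roughly as large as the entry it replaces. With the slide's numbers, a full monthly reload costs 1 × 180 GB × 5 = 900 GB per month across all consumers, while the updates stream costs about 0.10 × 180 GB × 5 = 90 GB per month.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One change-data-capture (CDC) event; this format is hypothetical."""
    op: str                       # "upsert" or "delete"
    key: str                      # primary key of the changed entry
    value: Optional[dict] = None  # new row contents (upserts only)


def apply_changes(replica: Dict[str, dict], events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Apply CDC events to a consumer's local replica, in stream order."""
    for event in events:
        if event.op == "upsert":
            replica[event.key] = event.value
        elif event.op == "delete":
            replica.pop(event.key, None)  # deleting an absent key is a no-op
        else:
            raise ValueError(f"unknown op: {event.op!r}")
    return replica


def full_reload_cost_gb(freq_per_month: float, size_gb: float, consumers: int) -> float:
    """Slide 5: every consumer fetches, loads and indexes the FULL data set."""
    return freq_per_month * size_gb * consumers


def cdc_cost_gb(change_rate_per_month: float, size_gb: float, consumers: int) -> float:
    """Slide 6: every consumer fetches and loads ONLY the updates stream."""
    return change_rate_per_month * size_gb * consumers


# Usage: a tiny replica and three change events (illustrative data)
replica = {"P1": {"name": "old"}, "P2": {"name": "stale"}}
events = [
    ChangeEvent("upsert", "P1", {"name": "new"}),
    ChangeEvent("delete", "P2"),
    ChangeEvent("upsert", "P3", {"name": "added"}),
]
apply_changes(replica, events)

# 300k updates on 3M entries per month is a 10% monthly rate of change
rate = 300_000 / 3_000_000
print(full_reload_cost_gb(1, 180, 5), "GB/month vs", cdc_cost_gb(rate, 180, 5), "GB/month")
```

Note that `cdc_cost_gb` has no frequency parameter: how often a consumer applies the stream does not change the total volume, which matches the slide's remark about independence from consumer update frequency.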
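Slide 8's idea of the stream as the master data can be illustrated with a toy, in-memory Python sketch. This is not Kafka's API; the `Log` and `Consumer` classes and the event shape are hypothetical. Every team reads the same append-only log at its own offset and maintains its own derived view, and a view added later is built simply by replaying the log from offset 0, which is the essence of the Kappa architecture.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, str]  # (user_id, action); hypothetical event shape


class Log:
    """Append-only, ordered sequence of events: the master data."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        """Append an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset: int) -> List[Event]:
        """Return every event from `offset` onwards; events are never modified."""
        return self._events[offset:]


class Consumer:
    """Maintains one derived view and tracks how far it has read the log."""

    def __init__(self, log: Log, update: Callable[[dict, Event], None]) -> None:
        self.log = log
        self.update = update
        self.offset = 0  # next offset to read
        self.view: dict = {}

    def poll(self) -> None:
        """Apply all events appended since the last poll."""
        for event in self.log.read(self.offset):
            self.update(self.view, event)
            self.offset += 1


def last_action(view: dict, event: Event) -> None:
    """Operational view: latest action per user."""
    user, action = event
    view[user] = action


def action_counts(view: dict, event: Event) -> None:
    """Analytics view: number of occurrences of each action."""
    _, action = event
    view[action] = view.get(action, 0) + 1


log = Log()
ops = Consumer(log, last_action)
for e in [("alice", "login"), ("bob", "login"), ("alice", "purchase")]:
    log.append(e)
ops.poll()

# A team that joins later builds its view by replaying from offset 0
analytics = Consumer(log, action_counts)
analytics.poll()
print(ops.view)
print(analytics.view)
```

In a real deployment the log would be a durable, partitioned system such as Kafka, and each consumer would persist its offset; the sketch keeps both in memory for brevity.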
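Nathan Marz's argument on slide 10 (recovery from human error) follows from keeping the raw events immutable: a bug in the code that derives a view corrupts only the view, and the fix is to correct the code and recompute the view from the log. A minimal sketch with hypothetical purchase events (amounts in cents) and hypothetical function names:

```python
from typing import Callable, Dict, Tuple

Purchase = Tuple[str, int]  # (customer, amount in cents); hypothetical shape

# Immutable master data: a tuple cannot be modified in place
LOG: Tuple[Purchase, ...] = (("alice", 1200), ("bob", 500), ("alice", 300))


def build_view(log: Tuple[Purchase, ...], combine: Callable[[int, int], int]) -> Dict[str, int]:
    """Recompute a per-customer view from scratch by replaying every event."""
    view: Dict[str, int] = {}
    for customer, amount in log:
        view[customer] = combine(view.get(customer, 0), amount)
    return view


def buggy_total(current: int, amount: int) -> int:
    return amount  # human error: overwrites the running total instead of adding


def fixed_total(current: int, amount: int) -> int:
    return current + amount


wrong = build_view(LOG, buggy_total)  # {'alice': 300, 'bob': 500}
# After fixing the code, replay the untouched log to rebuild a correct view
right = build_view(LOG, fixed_total)  # {'alice': 1500, 'bob': 500}
```

Had the buggy code updated a mutable database in place, the original amounts would be lost; with the log as master data, recovery is just a recomputation.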