Volker Janz
February 05, 2019

Squirrels and Elephants - The InnoGames Big Data and Streaming Infrastructure

Life doesn't happen in batches. You have to process data on time as it happens to make use of the time-value of information.

Apart from a general introduction to stream processing and Apache Flink, this presentation shows how we as a successful data-driven gaming company apply this concept with our data architecture and how we use Apache Flink to run several streaming applications.

Transcript

  1. SIMILARITIES
    THE FIRST IMPRESSION COUNTS: The moment the customer enters the shop or the player plays his first session is crucial.
    HALO EFFECT: When one trait of a person or thing is used to make an overall judgment of that person or thing.
  2. TIME IN STREAMING
    ORDERED BY EVENT TIME (the year column is the PROCESSING TIME)
    EPISODE I     1999  The Phantom Menace
    EPISODE II    2002  Attack of the Clones
    EPISODE III   2005  Revenge of the Sith
    EPISODE IV    1977  A New Hope
    EPISODE V     1980  The Empire Strikes Back
    EPISODE VI    1983  Return of the Jedi
    EPISODE VII   2015  The Force Awakens
    EPISODE VIII  2017  The Last Jedi
    EPISODE IX    2019  ?
  3. TIME IN STREAMING
    ORDERED BY PROCESSING TIME (the episode column is the EVENT TIME)
    EPISODE IV    1977  A New Hope
    EPISODE V     1980  The Empire Strikes Back
    EPISODE VI    1983  Return of the Jedi
    EPISODE I     1999  The Phantom Menace
    EPISODE II    2002  Attack of the Clones
    EPISODE III   2005  Revenge of the Sith
    EPISODE VII   2015  The Force Awakens
    EPISODE VIII  2017  The Last Jedi
    EPISODE IX    2019  ?
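The two orderings on these slides can be reproduced with a small plain-Java sketch. It is only an illustration and does not use Flink: the class `TimeOrderingSketch` and the `Film` type are hypothetical names, the episode number stands in for event time, and the release year stands in for processing time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TimeOrderingSketch {

    /** Hypothetical event type: episode = event time, releaseYear = processing time. */
    static final class Film {
        final int episode;
        final int releaseYear;

        Film(int episode, int releaseYear) {
            this.episode = episode;
            this.releaseYear = releaseYear;
        }
    }

    /** Episodes I to IX with the release years listed on the slide. */
    static List<Film> films() {
        int[] years = {1999, 2002, 2005, 1977, 1980, 1983, 2015, 2017, 2019};
        List<Film> films = new ArrayList<>();
        for (int i = 0; i < years.length; i++) {
            films.add(new Film(i + 1, years[i]));
        }
        return films;
    }

    /** Returns the episode numbers after sorting the films with the given comparator. */
    static List<Integer> episodesOrderedBy(Comparator<Film> order) {
        List<Film> sorted = new ArrayList<>(films());
        sorted.sort(order);
        List<Integer> episodes = new ArrayList<>();
        for (Film film : sorted) {
            episodes.add(film.episode);
        }
        return episodes;
    }

    static List<Integer> byEventTime() {
        return episodesOrderedBy(Comparator.comparingInt((Film f) -> f.episode));
    }

    static List<Integer> byProcessingTime() {
        return episodesOrderedBy(Comparator.comparingInt((Film f) -> f.releaseYear));
    }

    public static void main(String[] args) {
        System.out.println("event time:      " + byEventTime());
        System.out.println("processing time: " + byProcessingTime());
    }
}
```

Ordered by processing time, episodes IV to VI come first, which is the order in which an audience (or a stream processor) actually received them.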
  4. TUMBLING WINDOWS
    SENSOR: 9 1 3 2 6 8 1 3 9 8 4 5
    SUM: 15 18 26
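The sums on this slide match non-overlapping windows of four readings each. A minimal plain-Java sketch, assuming count-based windows (the slide does not say whether the windows are defined by count or by time) and using the hypothetical names `TumblingWindowSketch` and `tumblingSums`:

```java
import java.util.ArrayList;
import java.util.List;

final class TumblingWindowSketch {

    /**
     * Sums consecutive, non-overlapping windows of {@code size} values.
     * A trailing partial window is dropped.
     */
    static List<Integer> tumblingSums(int[] values, int size) {
        List<Integer> sums = new ArrayList<>();
        for (int start = 0; start + size <= values.length; start += size) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += values[i];
            }
            sums.add(sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] sensor = {9, 1, 3, 2, 6, 8, 1, 3, 9, 8, 4, 5};
        // Window size 4 is an assumption inferred from the sums on the slide.
        System.out.println(tumblingSums(sensor, 4)); // prints [15, 18, 26]
    }
}
```

Each reading belongs to exactly one window, so the three sums together cover the twelve readings once.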
  5. SLIDING WINDOWS
    SENSOR: 9 1 3 2 6 8 1 3 9 8 4 5
    SUM: 15 18 26 19 21
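The five sums on this slide are consistent with windows of four readings that advance by two readings at a time. The sketch below is plain Java rather than Flink; the window size, the slide step, and the names `SlidingWindowSketch` and `slidingSums` are assumptions inferred from the numbers, not taken from the talk.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSketch {

    /**
     * Sums windows of {@code size} values, starting a new window every
     * {@code slide} values. Windows overlap whenever slide < size.
     */
    static List<Integer> slidingSums(int[] values, int size, int slide) {
        List<Integer> sums = new ArrayList<>();
        for (int start = 0; start + size <= values.length; start += slide) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += values[i];
            }
            sums.add(sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] sensor = {9, 1, 3, 2, 6, 8, 1, 3, 9, 8, 4, 5};
        System.out.println(slidingSums(sensor, 4, 2)); // prints [15, 19, 18, 21, 26]
    }
}
```

The result contains the same five values as the slide (15, 18, 26, 19, 21), listed here in stream order. Because the windows overlap, most readings contribute to two sums.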
  6. BUILDING BLOCKS
    APIs, from CONCISENESS to EXPRESSIVENESS:
    SQL / TABLE API (dynamic tables): HIGH LEVEL ANALYTICS API
    DataStream API (streams, windows): STREAM AND BATCH DATA PROCESSING
    ProcessFunction (events, state, time): STATEFUL EVENT-DRIVEN APPLICATIONS
  7. LET'S HAVE A CLOSER LOOK
    final StreamExecutionEnvironment env = getExecutionEnvironment();

    // DATA SOURCE
    final DataStreamSource<Integer> stream = env.fromElements(1, 2, 3, 4);

    stream
        // TRANSFORMATION
        .map((MapFunction<Integer, Integer>) i -> i + 2)
        .filter((FilterFunction<Integer>) i -> i % 2 == 0)
        // DATA SINK
        .print();

    env.execute();
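To see what this pipeline produces without running Flink, the same source, transformation, and sink steps can be mirrored with `java.util.stream`. This is only an analogy under stated assumptions: the class `PipelineSketch` is hypothetical, collecting into a list stands in for the `print()` sink, and a real Flink job may prefix and interleave its printed output depending on parallelism.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PipelineSketch {

    /** Mirrors the slide's source -> map -> filter -> sink steps on a local stream. */
    static List<Integer> run() {
        return Stream.of(1, 2, 3, 4)            // data source
                .map(i -> i + 2)                // transformation: 3, 4, 5, 6
                .filter(i -> i % 2 == 0)        // transformation: keep even values
                .collect(Collectors.toList());  // stands in for the print() sink
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [4, 6]
    }
}
```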
  8. DATA ARCHITECTURE
    EVENT CLIENT (x3), EVENT GATEWAY, EVENT BUS, STREAM PROCESSING, DISTRIBUTED DATA STORE, DISTRIBUTED BATCH PROCESSING, BUSINESS INTELLIGENCE
  9. DATA ARCHITECTURE
    EVENT CLIENT (x3), EVENT GATEWAY, EVENT BUS, STREAM PROCESSING, DISTRIBUTED DATA STORE, DISTRIBUTED BATCH PROCESSING, BUSINESS INTELLIGENCE
  10. DATA ARCHITECTURE
    EVENT CLIENT (x3), EVENT GATEWAY, EVENT BUS, STREAM PROCESSING, DISTRIBUTED DATA STORE, DISTRIBUTED BATCH PROCESSING, BUSINESS INTELLIGENCE
    Callout: STREAM PROCESSING
  11. Log00.java
    Pattern<StreamEvent, StreamEvent> pattern = Pattern.<StreamEvent>begin("reg")
        .where(new SimpleCondition<StreamEvent>() {
            @Override
            public boolean filter(StreamEvent event) {
                return event.getEventName().equals("reg");
            }
        })
        .followedBy("login")
        .where(new SimpleCondition<StreamEvent>() {
            @Override
            public boolean filter(StreamEvent event) {
                return event.getEventName().equals("login");
            }
        })
        .within(Time.seconds(60));
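The pattern on this slide reads as "a reg event followed by a login event within 60 seconds". The sketch below illustrates that idea in plain Java and is not how Flink CEP is implemented. Everything in it is an assumption made for illustration: the `Event` type, the `match` method, events arriving sorted by timestamp for a single player, and treating the 60-second bound as exclusive.

```java
import java.util.ArrayList;
import java.util.List;

final class RegThenLoginSketch {

    /** Hypothetical event type with a name and a timestamp in milliseconds. */
    static final class Event {
        final String name;
        final long timestampMillis;

        Event(String name, long timestampMillis) {
            this.name = name;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Returns {regTimestamp, loginTimestamp} pairs where a "login" follows a
     * "reg" in less than {@code withinMillis}. Other events in between are
     * ignored. Assumes the events are sorted by timestamp.
     */
    static List<long[]> match(List<Event> events, long withinMillis) {
        List<long[]> matches = new ArrayList<>();
        List<Long> openRegs = new ArrayList<>();
        for (Event event : events) {
            if (event.name.equals("reg")) {
                openRegs.add(event.timestampMillis);
            } else if (event.name.equals("login")) {
                for (long reg : openRegs) {
                    if (event.timestampMillis - reg < withinMillis) {
                        matches.add(new long[] {reg, event.timestampMillis});
                    }
                }
                // Each registration is paired with the first login after it only.
                openRegs.clear();
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event("reg", 0L));
        events.add(new Event("tutorial", 10_000L));
        events.add(new Event("login", 30_000L));
        System.out.println(match(events, 60_000L).size()); // prints 1
    }
}
```

A login that arrives 90 seconds after the registration would produce no match, which is the case the `within` clause on the slide filters out.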
  12. USE CASE NTCRM
    EVENT BUS, EVENT CLIENT, EVENT GATEWAY, PLAYER DATA, NTCRM
    React to events with interstitials in less than 10 seconds
  13. USE CASE NTCRM
    Elvenar has a trading feature that sometimes causes confusion. With NTCRM we can react to this and show more details within interstitials exactly when the player needs them.
  14. JUST DO IT
    DEMO TIME
    Check it out on GitHub: https://github.com/prenomenon/codetalks-flinkdemo
  15. GET IN TOUCH
    InnoGames GmbH, Friesenstrasse 13, 20097 Hamburg
    https://www.innogames.com
    Volker Janz, Senior Software Developer, Corporate Systems - Analytics
  16. NEXT UP
    EVENT CLIENT (x3), EVENT GATEWAY, EVENT BUS, STREAM PROCESSING, DISTRIBUTED DATA STORE, DISTRIBUTED BATCH PROCESSING, BUSINESS INTELLIGENCE (BI)
  17. BACKUP / DETAILS
    The following slides are not part of my talk but might give the reader more insights later.
  18. COMPANY SNAPSHOT
    More than 400 employees
    Founded 2007 in Germany
    Headquarters in Hamburg
    +160m EUR revenue made in 2017
    7 live games
    >30 language versions
  19. RUNTIME
    STREAMING DATAFLOW (CONDENSED VIEW): SOURCE, MAP, FILTER, PRINT; labels: OPERATOR CHAIN, OPERATOR, OPERATOR, TASK, TASK, TASK
    STREAMING DATAFLOW (PARALLELIZED VIEW): SOURCE, MAP, FILTER, PRINT; labels: OPERATOR CHAIN, OPERATOR, SUBTASK, TASK, STREAM PARTITIONS
  20. RUNTIME
    A Flink cluster has a JOB MANAGER and multiple TASK MANAGERS. Each of those is a JVM.
    STREAMING DATAFLOW (PARALLELIZED VIEW): SOURCE, MAP, FILTER, PRINT; labels: OPERATOR CHAIN, OPERATOR, SUBTASK, TASK, STREAM PARTITIONS
  21. RUNTIME
    Each Task Manager can manage MULTIPLE THREADS executing TASKS / SUBTASKS.
    STREAMING DATAFLOW (PARALLELIZED VIEW): SOURCE, MAP, FILTER, PRINT; labels: OPERATOR CHAIN, OPERATOR, THREAD, SUBTASK, TASK, STREAM PARTITIONS
  22. CHECKPOINTING
    Diagram labels: checkpoint barrier n, checkpoint barrier n-1, checkpoint n+1, checkpoint n, checkpoint n-1
    Consistent, incremental snapshots of distributed data stream and operator state
    Based on a paper from 1985, inspired by the Chandy-Lamport algorithm
  23. STATE
    OPERATOR STATE: Bound only to an operator
    KEYED STATE: Bound to an operator and key
    PLUGGABLE BACKEND
    MULTIPLE PRIMITIVES SUPPORTED
    GUARANTEED CONSISTENCY IN CASE OF A FAILURE
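The difference in scope between the two kinds of state can be pictured with a plain-Java counter. This is a conceptual sketch only: it does not use Flink's state API, it is not fault tolerant, and the class `StateScopeSketch` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class StateScopeSketch {

    // Scoped like operator state: one value for the whole operator instance.
    private long operatorCount;

    // Scoped like keyed state: one value per key seen by this operator.
    private final Map<String, Long> keyedCount = new HashMap<>();

    /** Processes one event that belongs to the given key. */
    void process(String key) {
        operatorCount++;
        keyedCount.merge(key, 1L, Long::sum);
    }

    long operatorCount() {
        return operatorCount;
    }

    long keyedCount(String key) {
        return keyedCount.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        StateScopeSketch operator = new StateScopeSketch();
        operator.process("player-a");
        operator.process("player-b");
        operator.process("player-a");
        System.out.println(operator.operatorCount());        // prints 3
        System.out.println(operator.keyedCount("player-a")); // prints 2
    }
}
```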