Slide 1

Slide 1 text

SQUIRRELS AND ELEPHANTS Big Data and Streaming at InnoGames

Slide 2

Slide 2 text

LET’S GO ON A ROAD TRIP

Slide 3

Slide 3 text

IMAGINE You are driving with your family in the back seat

Slide 4

Slide 4 text

IMAGINE There is a lot of traffic, you have to concentrate

Slide 5

Slide 5 text

IMAGINE And now, while driving, you close your eyes

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

HOW DO YOU FEEL?

Slide 8

Slide 8 text

WOULD YOU EVER DO THAT? NOPE

Slide 9

Slide 9 text

METAPHOR The car is your company, team or project The passengers are your colleagues

Slide 10

Slide 10 text

WOULD YOU EVER DO THAT? NOPE IN A METAPHORICAL SENSE

Slide 11

Slide 11 text

YOU HAVE TO PROCESS DATA ON TIME AS IT HAPPENS

Slide 12

Slide 12 text

BATCH PROCESSING… …might cause accidents Because…

Slide 13

Slide 13 text

LIFE DOESN’T HAPPEN IN BATCHES https://mapr.com/ebooks/streaming-architecture/chapter-01-why-event-streaming.html © Ellen Friedman, Ted Dunning

Slide 14

Slide 14 text

ICE CREAM AND GAMING?

Slide 15

Slide 15 text

SIMILARITIES MAKE US HAPPY

Slide 16

Slide 16 text

SIMILARITIES WHICH TASTES BETTER?

Slide 17

Slide 17 text

SIMILARITIES WHICH TASTES BETTER?

Slide 18

Slide 18 text

SIMILARITIES
THE FIRST IMPRESSION COUNTS
The moment the customer enters the shop, or the player starts their first session, is crucial.
HALO EFFECT
When one trait of a person or thing is used to make an overall judgment of that person or thing.

Slide 19

Slide 19 text

IN ORDER TO MAKE A POSITIVE IMPACT A RESPONSE NEEDS TO HAPPEN QUICKLY

Slide 20

Slide 20 text

TIME-VALUE OF INFORMATION

Slide 21

Slide 21 text

REAL-TIME USER REPORTS TRAFFIC GAS PRICE SPEED TRAPS

Slide 22

Slide 22 text

RESPOND TO LIFE AS IT HAPPENS

Slide 23

Slide 23 text

STREAM PROCESSING STREAMS OF DATA GPS DATA WEB INTERACTION SENSOR DATA

Slide 24

Slide 24 text

STREAM PROCESSING PROCESSING DATA IN MOTION

Slide 25

Slide 25 text

STREAM PROCESSING YOUR CODE SOURCE SINK OPERATOR

Slide 26

Slide 26 text

STREAM PROCESSING LAKE

Slide 27

Slide 27 text

STREAM PROCESSING LAKE

Slide 28

Slide 28 text

APACHE FLINK

Slide 29

Slide 29 text

APACHE FLINK Framework and distributed processing engine for stateful computations over unbounded and bounded data streams

Slide 30

Slide 30 text

STREAMS TIME WINDOWS

Slide 31

Slide 31 text

EVERYTHING IS A STREAM UNBOUNDED STREAMS BOUNDED STREAMS

Slide 32

Slide 32 text

EVERYTHING IS A STREAM UNBOUNDED STREAMS BOUNDED STREAMS AKA BATCH PROCESSING
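The difference between bounded and unbounded streams can be sketched in plain Java, without Flink. This is only an illustration: the class and method names are made up for this example, and `limit(n)` is there solely so the demo terminates.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BoundedVsUnbounded {
    // Bounded stream: the input has a known end, so we can read everything
    // and then produce one final result (classic batch processing).
    static int sumBounded(List<Integer> events) {
        return events.stream().mapToInt(Integer::intValue).sum();
    }

    // Unbounded stream: the input never ends, so results must be produced
    // incrementally. Here we emit the running sum after every event.
    static List<Integer> runningSums(Stream<Integer> events, int n) {
        int[] sum = {0};
        return events.limit(n)               // cut-off only for the demo
                .map(e -> sum[0] += e)       // running sum so far
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumBounded(List.of(1, 2, 3, 4)));
        Stream<Integer> endless = Stream.iterate(1, i -> i + 1);
        System.out.println(runningSums(endless, 4));
    }
}
```

The bounded case waits for the end of the data; the unbounded case has no end to wait for, which is why windows are needed to get finite results out of it.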

Slide 33

Slide 33 text

TIME IN STREAMING
ORDERED BY EVENT TIME
EVENT TIME | PROCESSING TIME | TITLE
EPISODE I | 1999 | The Phantom Menace
EPISODE II | 2002 | Attack of the Clones
EPISODE III | 2005 | Revenge of the Sith
EPISODE IV | 1977 | A New Hope
EPISODE V | 1980 | The Empire Strikes Back
EPISODE VI | 1983 | Return of the Jedi
EPISODE VII | 2015 | The Force Awakens
EPISODE VIII | 2017 | The Last Jedi
EPISODE IX | 2019 | ?

Slide 34

Slide 34 text

TIME IN STREAMING
ORDERED BY PROCESSING TIME
EVENT TIME | PROCESSING TIME | TITLE
EPISODE IV | 1977 | A New Hope
EPISODE V | 1980 | The Empire Strikes Back
EPISODE VI | 1983 | Return of the Jedi
EPISODE I | 1999 | The Phantom Menace
EPISODE II | 2002 | Attack of the Clones
EPISODE III | 2005 | Revenge of the Sith
EPISODE VII | 2015 | The Force Awakens
EPISODE VIII | 2017 | The Last Jedi
EPISODE IX | 2019 | ?
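The two orderings on the Star Wars slides can be reproduced with a small plain-Java sketch. The episode number stands in for event time (when something happened in the story) and the release year for processing time (when the audience got to see it). The `Episode` class and the comparator names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class EpisodeOrdering {
    static final class Episode {
        final int number;       // position in the story: the "event time"
        final int releaseYear;  // when it reached the audience: the "processing time"

        Episode(int number, int releaseYear) {
            this.number = number;
            this.releaseYear = releaseYear;
        }
    }

    static final Comparator<Episode> BY_EVENT_TIME = Comparator.comparingInt(e -> e.number);
    static final Comparator<Episode> BY_PROCESSING_TIME = Comparator.comparingInt(e -> e.releaseYear);

    // Episode numbers and release years as listed on the slide.
    static List<Episode> saga() {
        return List.of(
                new Episode(1, 1999), new Episode(2, 2002), new Episode(3, 2005),
                new Episode(4, 1977), new Episode(5, 1980), new Episode(6, 1983),
                new Episode(7, 2015), new Episode(8, 2017), new Episode(9, 2019));
    }

    // Returns the episode numbers in the order given by the comparator.
    static List<Integer> order(List<Episode> episodes, Comparator<Episode> by) {
        List<Episode> sorted = new ArrayList<>(episodes);
        sorted.sort(by);
        List<Integer> numbers = new ArrayList<>();
        for (Episode e : sorted) {
            numbers.add(e.number);
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(order(saga(), BY_EVENT_TIME));      // [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(order(saga(), BY_PROCESSING_TIME)); // [4, 5, 6, 1, 2, 3, 7, 8, 9]
    }
}
```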

Slide 35

Slide 35 text

TUMBLING WINDOWS
SENSOR: 9 1 3 2 | 6 8 1 3 | 9 8 4 5
SUM: 15 | 18 | 26
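A tumbling window splits the stream into fixed-size, non-overlapping chunks. The sketch below is plain Java and uses count-based windows of four readings, which is an assumption that happens to reproduce the sums on the slide; Flink's own windows are usually time-based.

```java
import java.util.ArrayList;
import java.util.List;

final class TumblingWindowSum {
    // Sums each consecutive, non-overlapping window of `size` values.
    // A trailing partial window is dropped in this sketch.
    static List<Integer> sums(List<Integer> values, int size) {
        List<Integer> result = new ArrayList<>();
        for (int start = 0; start + size <= values.size(); start += size) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += values.get(i);
            }
            result.add(sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> sensor = List.of(9, 1, 3, 2, 6, 8, 1, 3, 9, 8, 4, 5);
        System.out.println(sums(sensor, 4)); // [15, 18, 26]
    }
}
```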

Slide 36

Slide 36 text

SLIDING WINDOWS
SENSOR: 9 1 3 2 6 8 1 3 9 8 4 5
SUM (windows of 4 readings, sliding by 2): 15 19 18 21 26
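A sliding window has a size and a slide, so consecutive windows overlap whenever the slide is smaller than the size. The plain-Java sketch below assumes count-based windows of four readings that advance by two; that choice is an inference, made because it reproduces the five sums shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSum {
    // Sums every window of `size` values, moving the window start by `slide` each step.
    // With slide == size this degenerates into a tumbling window.
    static List<Integer> sums(List<Integer> values, int size, int slide) {
        List<Integer> result = new ArrayList<>();
        for (int start = 0; start + size <= values.size(); start += slide) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += values.get(i);
            }
            result.add(sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> sensor = List.of(9, 1, 3, 2, 6, 8, 1, 3, 9, 8, 4, 5);
        System.out.println(sums(sensor, 4, 2)); // [15, 19, 18, 21, 26]
    }
}
```

Each reading (apart from those at the edges) contributes to two windows here, which is the price of getting a fresher result every two readings instead of every four.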

Slide 37

Slide 37 text

BUILDING BLOCKS DATA SOURCE TRANSFORMATION DATA SINK

Slide 38

Slide 38 text

API BUILDING BLOCKS DATA SOURCE TRANSFORMATION DATA SINK

Slide 39

Slide 39 text

BUILDING BLOCKS
APIs, from CONCISENESS (top) to EXPRESSIVENESS (bottom):
SQL / TABLE API (dynamic tables): HIGH LEVEL ANALYTICS API
DataStream API (streams, windows): STREAM AND BATCH DATA PROCESSING
ProcessFunction (events, state, time): STATEFUL EVENT-DRIVEN APPLICATIONS

Slide 40

Slide 40 text

LET’S HAVE A CLOSER LOOK

Slide 41

Slide 41 text

LET’S HAVE A CLOSER LOOK
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// DATA SOURCE
final DataStreamSource<Integer> stream = env.fromElements(1, 2, 3, 4);
stream
    // TRANSFORMATION
    .map((MapFunction<Integer, Integer>) i -> i + 2)
    .filter((FilterFunction<Integer>) i -> i % 2 == 0)
    // DATA SINK
    .print();
env.execute();

Slide 42

Slide 42 text

RUNTIME YOUR FLINK APP FLINK RUNTIME DEPLOY

Slide 43

Slide 43 text

RUNTIME

Slide 44

Slide 44 text

BIG DATA AND STREAMING AT INNOGAMES

Slide 45

Slide 45 text

TEAM

Slide 46

Slide 46 text

TEAM BUSINESS INTELLIGENCE DATA ENGINEERING DATA SCIENCE OPERATIONS

Slide 47

Slide 47 text

EVENT TRACKING quest build fight invite

Slide 48

Slide 48 text

EVENT TRACKING 1,500,000,000 EVENTS PER DAY

Slide 49

Slide 49 text

DATA ARCHITECTURE DATA PIPELINE DATA PLATFORM milliseconds, seconds, minutes hours, days, years

Slide 50

Slide 50 text

DATA ARCHITECTURE SQUIRREL ELEPHANT

Slide 51

Slide 51 text

DATA ARCHITECTURE EVENT CLIENT EVENT CLIENT EVENT CLIENT EVENT GATEWAY EVENT BUS STREAM PROCESSING DISTRIBUTED DATA STORE DISTRIBUTED BATCH PROCESSING BUSINESS INTELLIGENCE

Slide 52

Slide 52 text

DATA ARCHITECTURE EVENT CLIENT EVENT CLIENT EVENT CLIENT EVENT GATEWAY EVENT BUS STREAM PROCESSING DISTRIBUTED DATA STORE DISTRIBUTED BATCH PROCESSING BUSINESS INTELLIGENCE

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

DATA ARCHITECTURE EVENT CLIENT EVENT CLIENT EVENT CLIENT EVENT GATEWAY EVENT BUS STREAM PROCESSING DISTRIBUTED DATA STORE DISTRIBUTED BATCH PROCESSING BUSINESS INTELLIGENCE STREAM PROCESSING

Slide 55

Slide 55 text

USE CASE EVENT METRICS

Slide 56

Slide 56 text

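Metrics.java
stream
    .map(streamEvent -> new Tuple2<>(streamEvent.getEventName(), 1))
    // Lambdas lose their generic types to erasure, so declare the tuple type explicitly
    .returns(Types.TUPLE(Types.STRING, Types.INT))
    .keyBy(0)                      // key by event name
    .timeWindow(Time.minutes(1))   // one-minute tumbling window
    .sum(1)                        // add up the counts per event name
    .addSink(graphiteSink).setParallelism(1).name("event_counts");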
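What the Metrics.java pipeline computes can be sketched without Flink: every event is assigned to the one-minute window its timestamp falls into, and events are counted per name inside each window. The class name, the array-based input, and the use of millisecond timestamps are assumptions for this illustration, not part of the original job.

```java
import java.util.Map;
import java.util.TreeMap;

final class EventCountsPerMinute {
    static final long WINDOW_MILLIS = 60_000L;

    // Returns windowStartMillis -> (eventName -> count).
    static Map<Long, Map<String, Integer>> count(String[] names, long[] timestampsMillis) {
        Map<Long, Map<String, Integer>> windows = new TreeMap<>();
        for (int i = 0; i < names.length; i++) {
            // Align the timestamp to the start of its one-minute tumbling window.
            long windowStart = timestampsMillis[i] - (timestampsMillis[i] % WINDOW_MILLIS);
            windows.computeIfAbsent(windowStart, w -> new TreeMap<>())
                    .merge(names[i], 1, Integer::sum);
        }
        return windows;
    }

    public static void main(String[] args) {
        String[] names = {"quest", "build", "quest", "quest"};
        long[] times = {1_000L, 2_000L, 59_000L, 61_000L};
        System.out.println(count(names, times));
    }
}
```

Unlike this batch-style loop, the streaming job emits each window's counts as soon as the minute is over, which is what makes the Graphite metrics near real time.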

Slide 57

Slide 57 text

USE CASE EVENT METRICS

Slide 58

Slide 58 text

USE CASE LOG00 MONITOR

Slide 59

Slide 59 text

Log00.java
KeyedStream stream = events
    .filter(event -> Arrays.asList("reg", "login").contains(event.getEventName()))
    .keyBy((KeySelector) StreamEvent::getPlayerId);

Slide 60

Slide 60 text

Log00.java
Pattern<StreamEvent, ?> pattern = Pattern.<StreamEvent>begin("reg")
    .where(new SimpleCondition<StreamEvent>() {
        @Override
        public boolean filter(StreamEvent event) {
            return event.getEventName().equals("reg");
        }
    })
    .followedBy("login")
    .where(new SimpleCondition<StreamEvent>() {
        @Override
        public boolean filter(StreamEvent event) {
            return event.getEventName().equals("login");
        }
    })
    .within(Time.seconds(60));

Slide 61

Slide 61 text

Log00.java
PatternStream patternStream = CEP.pattern(stream, pattern);
DataStream> patternResultStream = patternStream.select(
    (p, ts) -> sendTimeoutToGraphite(p, ts),  // no login within the time limit
    p -> sendSuccessToGraphite(p)             // registration followed by login
);
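The logic behind the pattern (a "reg" event followed by a "login" event of the same player within 60 seconds, otherwise a timeout) can be sketched in plain Java. This is not how Flink CEP is implemented; the `Event` class, the `evaluate` method, and the explicit `now` parameter are assumptions for this illustration, and events are assumed to arrive sorted by timestamp.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RegLoginMonitor {
    static final long WINDOW_MILLIS = 60_000L;

    static final class Event {
        final String playerId;
        final String name;
        final long timestamp;

        Event(String playerId, String name, long timestamp) {
            this.playerId = playerId;
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    // Returns playerId -> "success" or "timeout" for all registrations that are
    // either matched by a login in time or already expired at time `now`.
    static Map<String, String> evaluate(List<Event> events, long now) {
        Map<String, Long> pendingRegistrations = new HashMap<>();
        Map<String, String> outcome = new TreeMap<>();
        for (Event e : events) {
            if (e.name.equals("reg")) {
                pendingRegistrations.put(e.playerId, e.timestamp);
            } else if (e.name.equals("login")) {
                Long regTime = pendingRegistrations.get(e.playerId);
                if (regTime != null && e.timestamp - regTime < WINDOW_MILLIS) {
                    outcome.put(e.playerId, "success");
                    pendingRegistrations.remove(e.playerId);
                }
            }
        }
        // Registrations still pending after the window has passed are timeouts.
        for (Map.Entry<String, Long> pending : pendingRegistrations.entrySet()) {
            if (now - pending.getValue() >= WINDOW_MILLIS) {
                outcome.put(pending.getKey(), "timeout");
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("a", "reg", 0L),
                new Event("b", "reg", 1_000L),
                new Event("a", "login", 30_000L),   // 30 s after registration
                new Event("b", "login", 70_000L));  // 69 s after registration: too late
        System.out.println(evaluate(events, 120_000L));
    }
}
```

Keeping the pending registrations per player is exactly the kind of keyed state that the CEP library manages on the keyed stream.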

Slide 62

Slide 62 text

USE CASE LOG00 MONITOR

Slide 63

Slide 63 text

USE CASE NEAR TIME CRM (NTCRM)

Slide 64

Slide 64 text

USE CASE NTCRM EVENT BUS EVENT CLIENT EVENT GATEWAY PLAYER DATA NTCRM React to events with interstitials in less than 10 seconds

Slide 65

Slide 65 text

USE CASE NTCRM Elvenar has a trading feature that sometimes causes confusion. With NTCRM we can react to this and show more details within interstitials exactly when the player needs them.

Slide 66

Slide 66 text

JUST DO IT DEMO TIME Check it out on Github: https://github.com/prenomenon/codetalks-flinkdemo

Slide 67

Slide 67 text

GET IN TOUCH
Volker Janz, Senior Software Developer, Corporate Systems - Analytics
InnoGames GmbH, Friesenstrasse 13, 20097 Hamburg
https://www.innogames.com

Slide 68

Slide 68 text

GET IN TOUCH @prenomenon feedback appreciated

Slide 69

Slide 69 text

LIFE DOESN’T HAPPEN IN BATCHES

Slide 70

Slide 70 text

EAT ICE CREAM AND STREAM ON Great Flink training: http://training.data-artisans.com

Slide 71

Slide 71 text

NEXT UP EVENT CLIENT EVENT CLIENT EVENT CLIENT EVENT GATEWAY EVENT BUS STREAM PROCESSING DISTRIBUTED DATA STORE DISTRIBUTED BATCH PROCESSING BI BUSINESS INTELLIGENCE

Slide 72

Slide 72 text

THAT’S IT FOR NOW…

Slide 73

Slide 73 text

BACKUP / DETAILS The following slides are not part of my talk but might give the reader more insights later

Slide 74

Slide 74 text

COMPANY SNAPSHOT
More than 400 employees
Founded in 2007 in Germany
Headquarters in Hamburg
More than 160m EUR revenue in 2017
7 live games
More than 30 language versions

Slide 75

Slide 75 text

I AM LEGEND OUR PORTFOLIO Simulation Strategy RPG Browser Multi-device Mobile

Slide 76

Slide 76 text

SQUIRREL TESTS
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-test-utils_2.11</artifactId>
    <version>1.6.1</version>
</dependency>

Slide 77

Slide 77 text

WINDOWING KEYED NON-KEYED TASK 1 TASK N SOURCE TASK 1 SOURCE KEY 1 KEY N ALL DATA

Slide 78

Slide 78 text

STATE SOURCE MAP DATA SINK SOURCE MAP SUM(C,D) OFFSET OFFSET SUM(A,B) AB CD

Slide 79

Slide 79 text

RABBIT HOLE

Slide 80

Slide 80 text

RUNTIME SOURCE MAP PRINT FILTER STREAMING DATAFLOW (CONDENSED VIEW) OPERATOR CHAIN OPERATOR OPERATOR TASK TASK TASK SOURCE MAP PRINT FILTER OPERATOR CHAIN OPERATOR OPERATOR SUBTASK SUBTASK TASK SOURCE MAP FILTER OPERATOR CHAIN OPERATOR SUBTASK SUBTASK STREAM PARTITIONS STREAMING DATAFLOW (PARALLELIZED VIEW)

Slide 81

Slide 81 text

RUNTIME SOURCE MAP PRINT FILTER OPERATOR CHAIN OPERATOR OPERATOR SUBTASK SUBTASK TASK SOURCE MAP FILTER OPERATOR CHAIN OPERATOR SUBTASK SUBTASK STREAM PARTITIONS STREAMING DATAFLOW (PARALLELIZED VIEW) A Flink cluster has a JOB MANAGER and multiple TASK MANAGERS. Each of those is a JVM.

Slide 82

Slide 82 text

RUNTIME Each Task Manager can manage MULTIPLE THREADS executing TASKS / SUBTASKS. SOURCE MAP PRINT FILTER OPERATOR CHAIN OPERATOR OPERATOR THREAD THREAD THREAD SUBTASK SUBTASK TASK SOURCE MAP FILTER OPERATOR CHAIN OPERATOR THREAD THREAD SUBTASK SUBTASK STREAM PARTITIONS STREAMING DATAFLOW (PARALLELIZED VIEW)

Slide 83

Slide 83 text

CHECKPOINTING checkpoint barrier n checkpoint barrier n-1 checkpoint n+1 checkpoint n checkpoint n-1 Consistent, incremental snapshots of the distributed data stream and operator state. Based on a paper from 1985 and inspired by the Chandy-Lamport algorithm.

Slide 84

Slide 84 text

STATE
OPERATOR STATE: Bound only to an operator
KEYED STATE: Bound to an operator and key
PLUGGABLE BACKEND
MULTIPLE PRIMITIVES SUPPORTED
GUARANTEED CONSISTENCY IN CASE OF A FAILURE
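The difference between the two kinds of state can be pictured with a toy operator in plain Java. This is only a mental model with made-up names: real Flink state lives in a pluggable backend and is covered by checkpoints, neither of which is modeled here.

```java
import java.util.HashMap;
import java.util.Map;

final class StateSketch {
    // Operator state: one value for the whole operator instance, whatever the key.
    private long operatorCount = 0;

    // Keyed state: one value per key seen by this operator instance.
    private final Map<String, Long> keyedCount = new HashMap<>();

    void process(String key) {
        operatorCount++;
        keyedCount.merge(key, 1L, Long::sum);
    }

    long operatorCount() {
        return operatorCount;
    }

    long keyedCount(String key) {
        return keyedCount.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        StateSketch operator = new StateSketch();
        operator.process("quest");
        operator.process("build");
        operator.process("quest");
        System.out.println(operator.operatorCount());     // 3
        System.out.println(operator.keyedCount("quest")); // 2
    }
}
```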