Let's play Flink - Fun with streaming applications at InnoGames

Presented at: code.talks 2018

Chocolate, ice cream and games are perhaps three of the most popular, universally understood words, and they can bring joy to anyone between 5 and 60 years of age!

At InnoGames we not only have all three of those things, we have also built a powerful data infrastructure, because it is expensive to run your business blind. Being able to evaluate key performance indicators quickly, make good decisions, and deliver personalized and relevant content to each and every gamer is essential to success, and it is how a customer becomes a fan.

With a revenue of 130 million Euros in 2016, InnoGames is one of the world's leading developers and providers of online games. InnoGames has more than 200 million registered players and has scored major successes with games such as Tribal Wars, Forge of Empires and Elvenar.

Our data infrastructure mainly consists of a data pipeline that covers the streaming part and a data platform that performs batch processing. The latter is based on the Hadoop ecosystem and uses technologies such as Hive, Spark, Hue, R and more to give our data scientists a high degree of flexibility. The data pipeline went through several evolutions, starting with Kestrel and custom streaming applications. Later on we switched the base technologies to Apache Kafka and Apache Storm. Last year we rebuilt our streaming infrastructure on Apache Flink, an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.

Volker Janz

October 19, 2018

Transcript

  1. SIMILARITIES. THE FIRST IMPRESSION COUNTS: The moment the customer enters the shop or the player plays his first session is crucial. HALO EFFECT: When one trait of a person or thing is used to make an overall judgment of that person or thing.
  2. STREAM PROCESSING LAKE. Real-Time Processing Explained: A Survey of Storm, Samza, Spark & Flink. Wolfram Wingerath, Cinema 6, 15:00.
  3. TIME IN STREAMING. Episode I: The Phantom Menace (1999); Episode II: Attack of the Clones (2002); Episode III: Revenge of the Sith (2005); Episode IV: A New Hope (1977); Episode V: The Empire Strikes Back (1980); Episode VI: Return of the Jedi (1983); Episode VII: The Force Awakens (2015); Episode VIII: The Last Jedi (2017); Episode IX: ? (2019). Labels: ORDERED BY EVENT TIME, PROCESSING TIME.
  4. TIME IN STREAMING. Episode I: The Phantom Menace (1999); Episode II: Attack of the Clones (2002); Episode III: Revenge of the Sith (2005); Episode IV: A New Hope (1977); Episode V: The Empire Strikes Back (1980); Episode VI: Return of the Jedi (1983); Episode VII: The Force Awakens (2015); Episode VIII: The Last Jedi (2017); Episode IX: ? (2019). Labels: EVENT TIME, ORDERED BY PROCESSING TIME.
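The two orderings on slides 3 and 4 can be reproduced with a small sketch. It assumes that the episode number stands in for event time (when something happened in the story) and the release year for processing time (when it reached the audience); the class and method names are hypothetical and not taken from the talk.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class StarWarsTime {
    /** One film: the episode number models event time, the release year models processing time. */
    static final class Film {
        final int episode;
        final String title;
        final int releaseYear;

        Film(int episode, String title, int releaseYear) {
            this.episode = episode;
            this.title = title;
            this.releaseYear = releaseYear;
        }
    }

    // Data copied from the slide; the title of Episode IX is "?" there.
    static final List<Film> FILMS = List.of(
        new Film(1, "The Phantom Menace", 1999),
        new Film(2, "Attack of the Clones", 2002),
        new Film(3, "Revenge of the Sith", 2005),
        new Film(4, "A New Hope", 1977),
        new Film(5, "The Empire Strikes Back", 1980),
        new Film(6, "Return of the Jedi", 1983),
        new Film(7, "The Force Awakens", 2015),
        new Film(8, "The Last Jedi", 2017),
        new Film(9, "?", 2019));

    /** Returns the episode numbers sorted by the given ordering. */
    static List<Integer> episodesOrderedBy(Comparator<Film> order) {
        return FILMS.stream()
            .sorted(order)
            .map(f -> f.episode)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints "event time:      [1, 2, 3, 4, 5, 6, 7, 8, 9]"
        System.out.println("event time:      "
            + episodesOrderedBy(Comparator.comparingInt(f -> f.episode)));
        // prints "processing time: [4, 5, 6, 1, 2, 3, 7, 8, 9]"
        System.out.println("processing time: "
            + episodesOrderedBy(Comparator.comparingInt(f -> f.releaseYear)));
    }
}
```

The same events produce a different sequence depending on which clock is used, which is why a streaming job has to decide which notion of time its windows follow.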
  5. TUMBLING WINDOWS. SENSOR: 9 1 3 2 | 6 8 1 3 | 9 8 4 5. SUM: 15, 18, 26.
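The sums on slide 5 can be recomputed with a minimal sketch. It assumes count-based windows of four readings, which is inferred from the numbers on the slide (the talk itself may have used time-based windows); `TumblingWindowSum` is a hypothetical name and the code does not use Flink.

```java
import java.util.ArrayList;
import java.util.List;

final class TumblingWindowSum {
    /** Sums consecutive, non-overlapping windows of `size` readings; a trailing partial window is dropped. */
    static List<Integer> sums(int[] readings, int size) {
        List<Integer> out = new ArrayList<>();
        for (int start = 0; start + size <= readings.length; start += size) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += readings[i];
            }
            out.add(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sensor = {9, 1, 3, 2, 6, 8, 1, 3, 9, 8, 4, 5};
        System.out.println(sums(sensor, 4)); // prints [15, 18, 26]
    }
}
```

Every reading belongs to exactly one window, so the three sums add up to the total of all twelve readings.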
  6. SLIDING WINDOWS. SENSOR: 9 1 3 2 6 8 1 3 9 8 4 5. SUM: 15, 18, 26, 19, 21.
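The five sums on slide 6 match windows of four readings that advance by two readings at a time. Both values are inferred from the numbers on the slide rather than stated in the talk, so the following is a sketch under that assumption; `SlidingWindowSum` is a hypothetical name and the code does not use Flink.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSum {
    /** Sums windows of `size` readings whose start advances by `slide` readings each step. */
    static List<Integer> sums(int[] readings, int size, int slide) {
        List<Integer> out = new ArrayList<>();
        for (int start = 0; start + size <= readings.length; start += slide) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += readings[i];
            }
            out.add(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sensor = {9, 1, 3, 2, 6, 8, 1, 3, 9, 8, 4, 5};
        // Windows start at positions 0, 2, 4, 6 and 8.
        System.out.println(sums(sensor, 4, 2)); // prints [15, 19, 18, 21, 26]
    }
}
```

Because the slide is smaller than the window size, neighbouring windows overlap and most readings contribute to two sums. The slide lists the same five values in layout order (15, 18, 26, 19, 21).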
  7. EXACTLY-ONCE. EXACTLY-ONCE IN FLINK: Each incoming event affects the final result exactly once. It does not necessarily mean that each event gets processed only once. Achieved with distributed snapshot/state checkpointing.
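The distinction on slide 7 between "affects the result exactly once" and "processed only once" can be illustrated with a toy, single-threaded model. This is not Flink's implementation: it only assumes a replayable source, a running sum as operator state, and a checkpoint that stores the state together with the source offset. All names are hypothetical.

```java
import java.util.List;

final class ExactlyOnceSketch {
    /**
     * Sums the events, taking a checkpoint after every `checkpointEvery` events and
     * simulating one crash just before the event at `failAtOffset` is read.
     * Returns {final sum, number of times an event was processed}.
     */
    static long[] run(List<Integer> events, int checkpointEvery, int failAtOffset) {
        long sum = 0;          // operator state
        int offset = 0;        // position in the replayable source
        long checkpointSum = 0;
        int checkpointOffset = 0;
        long processed = 0;
        boolean crashed = false;

        while (offset < events.size()) {
            if (!crashed && offset == failAtOffset) {
                // Crash: state and source position roll back to the last checkpoint together.
                sum = checkpointSum;
                offset = checkpointOffset;
                crashed = true;
                continue;
            }
            sum += events.get(offset);
            offset++;
            processed++;
            if (offset % checkpointEvery == 0) {
                checkpointSum = sum;
                checkpointOffset = offset;
            }
        }
        return new long[] {sum, processed};
    }

    public static void main(String[] args) {
        long[] result = run(List.of(1, 2, 3, 4, 5, 6), 2, 3);
        // prints "sum=21 processed=7": six events, one of them processed twice
        System.out.println("sum=" + result[0] + " processed=" + result[1]);
    }
}
```

The event at offset 2 is processed twice, but its first effect is discarded when the state is rolled back, so the final sum is the same as in a run without failure.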
  8. BUILDING BLOCKS. APIs: SQL / TABLE API (dynamic tables): HIGH LEVEL ANALYTICS API; DataStream API (streams, windows): STREAM AND BATCH DATA PROCESSING; ProcessFunction (events, state, time): STATEFUL EVENT-DRIVEN APPLICATIONS. Axis labels: CONCISENESS, EXPRESSIVENESS.
  9. LET'S HAVE A CLOSER LOOK
    // DATA SOURCE
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    final DataStreamSource<Integer> stream = env.fromElements(1, 2, 3, 4);
    // TRANSFORMATION
    stream
        .map((MapFunction<Integer, Integer>) i -> i + 2)
        .filter((FilterFunction<Integer>) i -> i % 2 == 0)
        // DATA SINK
        .print();
    env.execute();
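To check what the job on slide 9 emits without a Flink dependency, the same map-then-filter logic can be sketched with `java.util.stream`. This is only an analogy: a Flink job works on unbounded streams and may run in parallel, so the order of printed elements can differ there. `PipelineSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PipelineSketch {
    /** Applies the slide's transformations: add 2, then keep even values. */
    static List<Integer> run(List<Integer> source) {
        return source.stream()
            .map(i -> i + 2)
            .filter(i -> i % 2 == 0)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4))); // prints [4, 6]
    }
}
```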
  10. RUNTIME. [Diagram: STREAMING DATAFLOW (CONDENSED VIEW) with SOURCE, MAP, FILTER and PRINT grouped into an OPERATOR CHAIN and two OPERATORS, each running as a TASK; STREAMING DATAFLOW (PARALLELIZED VIEW) with the same operators split into SUBTASKS and a TASK, connected by STREAM PARTITIONS.]
  11. RUNTIME. [Diagram: STREAMING DATAFLOW (PARALLELIZED VIEW) with SOURCE, MAP, FILTER and PRINT as OPERATOR CHAIN and OPERATORS, split into SUBTASKS and a TASK, connected by STREAM PARTITIONS.] A Flink cluster has a JOB MANAGER and multiple TASK MANAGERS. Each of those is a JVM.
  12. RUNTIME. Each Task Manager can manage MULTIPLE THREADS executing TASKS / SUBTASKS. [Diagram: STREAMING DATAFLOW (PARALLELIZED VIEW) with each TASK or SUBTASK running in its own THREAD, connected by STREAM PARTITIONS.]
  13. CHECKPOINTING. [Diagram labels: checkpoint barrier n, checkpoint barrier n-1; checkpoint n+1, checkpoint n, checkpoint n-1.] Consistent, incremental snapshots of distributed data stream and operator state. Based on a paper from 1985, inspired by the Chandy-Lamport algorithm.
  14. STATE. OPERATOR STATE: Bound only to an operator. KEYED STATE: Bound to an operator and key. PLUGGABLE BACKEND. MULTIPLE PRIMITIVES SUPPORTED. GUARANTEED CONSISTENCY IN CASE OF A FAILURE.
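The difference between the two kinds of state on slide 14 can be sketched in plain Java. The sketch assumes a single operator instance: one counter that is shared by all events stands in for operator state, and a map from key to running sum stands in for keyed state. In Flink the key scoping is handled by the framework; the map here only models it, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class StateSketch {
    // Analogue of operator state: one value per operator instance, regardless of key.
    private long eventsSeen = 0;

    // Analogue of keyed state: one value per key handled by this operator instance.
    private final Map<String, Long> sumByKey = new HashMap<>();

    /** Processes one event and returns the updated running sum for its key. */
    long process(String key, long value) {
        eventsSeen++;
        return sumByKey.merge(key, value, Long::sum);
    }

    long eventsSeen() {
        return eventsSeen;
    }

    public static void main(String[] args) {
        StateSketch operator = new StateSketch();
        operator.process("player-1", 1);
        operator.process("player-2", 5);
        long playerOneSum = operator.process("player-1", 2);
        // prints "player-1 sum=3, events seen=3"
        System.out.println("player-1 sum=" + playerOneSum + ", events seen=" + operator.eventsSeen());
    }
}
```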
  15. COMPANY SNAPSHOT. More than 400 employees. Founded 2007 in Germany. Headquarters in Hamburg. +160m EUR revenue made in 2017. 7 live games. >30 language versions.
  16. DATA ARCHITECTURE. EVENT CLIENT, EVENT CLIENT, EVENT CLIENT, EVENT GATEWAY, EVENT BUS, STREAM PROCESSING, DISTRIBUTED DATA STORE, DISTRIBUTED BATCH PROCESSING, BI.
  17. DATA ARCHITECTURE. EVENT CLIENT, EVENT CLIENT, EVENT CLIENT, EVENT GATEWAY, EVENT BUS, STREAM PROCESSING, DISTRIBUTED DATA STORE, DISTRIBUTED BATCH PROCESSING, BI.
  18. DATA ARCHITECTURE. EVENT CLIENT, EVENT CLIENT, EVENT CLIENT, EVENT GATEWAY, EVENT BUS, DISTRIBUTED DATA STORE, DISTRIBUTED BATCH PROCESSING, BI, STREAM PROCESSING.
  19. Log00.java
    // Match a "reg" event that is followed by a "login" event within 60 seconds.
    Pattern<StreamEvent, StreamEvent> pattern = Pattern.<StreamEvent>begin("reg")
        .where(new SimpleCondition<StreamEvent>() {
            @Override
            public boolean filter(StreamEvent event) {
                return event.getEventName().equals("reg");
            }
        })
        // followedBy allows other events between "reg" and "login".
        .followedBy("login")
        .where(new SimpleCondition<StreamEvent>() {
            @Override
            public boolean filter(StreamEvent event) {
                return event.getEventName().equals("login");
            }
        })
        .within(Time.seconds(60));
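What the pattern on slide 19 looks for can be sketched without the CEP library. The sketch assumes events arrive sorted by timestamp, pairs each "reg" with the first later "login", and treats the 60-second bound as exclusive; the boundary handling and all names are assumptions made for illustration, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class RegThenLogin {
    /** Minimal event: a name and a timestamp in milliseconds. */
    static final class Event {
        final String name;
        final long timestamp;

        Event(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    /**
     * Returns {regTimestamp, loginTimestamp} for every "reg" that is followed by a
     * "login" less than `withinMillis` later. Unrelated events in between are skipped.
     */
    static List<long[]> matches(List<Event> events, long withinMillis) {
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            Event start = events.get(i);
            if (!start.name.equals("reg")) {
                continue;
            }
            for (int j = i + 1; j < events.size(); j++) {
                Event next = events.get(j);
                if (next.timestamp - start.timestamp >= withinMillis) {
                    break; // time window elapsed without a login
                }
                if (next.name.equals("login")) {
                    out.add(new long[] {start.timestamp, next.timestamp});
                    break; // only the first following login is paired with this reg
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("reg", 0L),
            new Event("click", 10_000L),
            new Event("login", 30_000L),
            new Event("reg", 100_000L),
            new Event("login", 200_000L));
        for (long[] match : matches(events, 60_000L)) {
            // prints "reg@0 -> login@30000"; the second reg has no login within 60 s
            System.out.println("reg@" + match[0] + " -> login@" + match[1]);
        }
    }
}
```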
  20. USE CASE NTCRM. [Diagram labels: EVENT BUS, EVENT CLIENT, EVENT GATEWAY, PLAYER DATA, NTCRM.] React to events with interstitials in less than 10 seconds.
  21. USE CASE NTCRM. Elvenar has a trading feature that sometimes causes confusion. With NTCRM we can react to this and show more details within interstitials exactly when the player needs them.
  22. JUST DO IT. DEMO TIME. Check it out on GitHub: https://github.com/prenomenon/codetalks-flinkdemo
  23. GET IN TOUCH. InnoGames GmbH, Friesenstrasse 13, 20097 Hamburg, http://www.innogames.com. Volker Janz, Senior Software Developer, Corporate Systems - Analytics.
  24. BACKUP / DETAILS. The following slides are not part of my talk but might give the reader more insights later.