Slide 1

Slide 1 text

[email protected] www.rittmanmead.com @rittmanmead !1 Visualizing Streams

Slide 2

Slide 2 text

Francesco Tisiot
BI Tech Lead at Rittman Mead
Verona, Italy
Rittman Mead Blog
10 Years Experience in BI/Analytics
@FTisiot
Oracle ACE

Slide 3

Slide 3 text

About Rittman Mead
Rittman Mead is a data and analytics company that specialises in data visualisation, predictive analytics, enterprise reporting and data engineering. We use our skill, experience and know-how to work with organisations across the world to interpret their data. We enable the business, the consumers, the data providers and IT to work towards a common goal, delivering innovative and cost-effective solutions based on our core values of thought leadership, hard work and honesty. We work across multiple verticals on projects that range from mature, large-scale implementations to proofs of concept, and can provide skills in development, architecture, delivery, training and support.

Slide 4

Slide 4 text

Visualizing Streams

Slide 5

Slide 5 text

Let Me Know My Audience
Who Likes + ? +

Slide 6

Slide 6 text

Let Me Know My Audience
Who Likes + ? +

Slide 7

Slide 7 text

The Good Old Days…
Photo by Bruno Martins on Unsplash

Slide 8

Slide 8 text

I Need This “Precise” KPI!
Ok! Let’s Create the Model and ETL the Data!
Photo by Cristina Gottardi on Unsplash

Slide 9

Slide 9 text

Predefined KPIs • Structured Reporting Database Batch Processing • Overnight Load

Slide 10

Slide 10 text

Self Service Analytics
Photo by Dominik Scythe on Unsplash

Slide 11

Slide 11 text

Data Scientist: Python, R
Photo by Lucas Vasques on Unsplash

Slide 12

Slide 12 text

Business Analyst: Extract, Calculate, Visualize
Photo by Craig Garner on Unsplash

Slide 13

Slide 13 text

OBIEE: IT Driven, Organised, Pre-Defined
Photo by Tiago Muraro on Unsplash

Slide 14

Slide 14 text

Business Intelligence Tools
Oracle Analytics

Slide 15

Slide 15 text

Data Visualization: Business Driven, Data Discovery, No Prebuilt Model, Access To Raw Data
Photo by Samuel Zeller on Unsplash

Slide 16

Slide 16 text

Data Visualization
• Information Exploration and Discovery
  - Single Panel Analytics
  - Data Mashup
  - Integrated with OBIEE
  - DataFlow Component

Slide 17

Slide 17 text

DataFlow Component
• Transform/Enrich Data
  - Filter
  - Aggregate
  - Join
  - Store Locally or Push Back
  - V4 Release
    • Machine Learning
    • Essbase Cube

Slide 18

Slide 18 text

Predefined KPIs • Structured Reporting Database Batch Processing • Overnight Load

Slide 19

Slide 19 text

BIG DATA: Volume, Velocity, Variety, $$$

Slide 20

Slide 20 text


Slide 21

Slide 21 text

Predefined KPIs • Structured Reporting Database Batch Processing • Overnight Load

Slide 22

Slide 22 text

Real Time Analytics
Batch vs Stream
Photo by Genessa Panainte on Unsplash

Slide 23

Slide 23 text

Any Source, Any Target, Open Formats, Scalable

Slide 24

Slide 24 text

Data Hub
https://www.confluent.io/product/confluent-platform/

Slide 25

Slide 25 text

https://www.confluent.io/product/confluent-platform/

Slide 26

Slide 26 text

Hub!

Slide 27

Slide 27 text

Client Library
https://www.confluent.io/product/confluent-platform/

Slide 28

Slide 28 text

SQL!
https://www.confluent.io/product/confluent-platform/

Slide 29

Slide 29 text


Slide 30

Slide 30 text


Slide 31

Slide 31 text

KSQL
• Currently Limited SQL functions
• Can be Extended with UDF!
• GA Since March 2018!
• Enhancements expected

Slide 32

Slide 32 text

Sources Targets Transformations Kafka ?

Slide 33

Slide 33 text

Stream

Slide 34

Slide 34 text

Time Series Visualization

Slide 35

Slide 35 text

https://www.rittmanmead.com/blog/2017/11/taking-ksql-for-a-spin-using-real-time-device-data/

Slide 36

Slide 36 text

Sources Targets Transformations Kafka

Slide 37

Slide 37 text

How do I Visualise the Data in Kafka?
Photo by Alexandre Debiève on Unsplash

Slide 38

Slide 38 text

• KSQL
• Kafka Consumer

Slide 39

Slide 39 text

KSQL
• Limited set of functions
• Can’t be called from “outside” Kafka (Need special adapter)

Slide 40

Slide 40 text


Slide 41

Slide 41 text

SQL-on-Hadoop
Big Data, Traditional BI Tools, ODBC, JDBC
https://www.rittmanmead.com/blog/2017/04/sql-on-hadoop-impala-vs-drill/

Slide 42

Slide 42 text


Slide 43

Slide 43 text

SQL-on-(almost)Everything

Slide 44

Slide 44 text

https://www.rittmanmead.com/blog/2017/07/analyzing-wimbledon-twitter-feeds-in-real-time-with-kafka-presto-and-oracle-dvd-v3/

Slide 45

Slide 45 text

Limitations
• Static List of Streams
• No AVRO Support

Slide 46

Slide 46 text

• Dynamic List of Streams
• No AVRO Support

Slide 47

Slide 47 text

Define a KSQL Stream in JSON Format
CREATE STREAM STREAM_NAME WITH (VALUE_FORMAT='JSON') AS SELECT … FROM …;
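A minimal sketch of how the statement on this slide could be assembled when several streams need a JSON copy. Only the `CREATE STREAM … WITH (VALUE_FORMAT='JSON') AS SELECT … FROM …` shape comes from the slide; the helper name `json_stream_ddl` and the stream and column names in the usage note are hypothetical.

```python
from typing import Sequence


def json_stream_ddl(target: str, source: str, columns: Sequence[str]) -> str:
    """Render a KSQL CREATE STREAM ... AS SELECT statement that writes JSON.

    `target`, `source` and `columns` are illustrative inputs; nothing is
    sent to a KSQL server, the function only builds the statement text.
    """
    if not columns:
        raise ValueError("at least one column is required")
    select_list = ", ".join(columns)
    return (
        f"CREATE STREAM {target} "
        "WITH (VALUE_FORMAT='JSON') "
        f"AS SELECT {select_list} FROM {source};"
    )
```

For example, `json_stream_ddl("TWEETS_JSON", "TWEETS_AVRO", ["ID", "TEXT"])` would produce a statement that re-publishes the hypothetical `TWEETS_AVRO` stream as JSON.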

Slide 48

Slide 48 text

Use Kafka, Show Tables

Slide 49

Slide 49 text

Query Data

Slide 50

Slide 50 text

KSQL vs SQL-on-Hadoop
KSQL
• Continuous Query
• Data Resides in Kafka
• Limited SQL
• No JDBC/ODBC access officially Supported
• Examining Data Streams and Stream Processing
SQL-on-Hadoop
• Static Query
• External (Static) Tables can be created
• Rich set of SQL Functions (Views can be created)
• Can be Accessed via ODBC
• Ad-hoc Random Access
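The continuous versus static distinction on this slide can be illustrated with a toy model. This is only a sketch in plain Python, not KSQL or Drill code: the event shape and both function names are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, int]


def static_count(snapshot: List[Event], min_value: int) -> int:
    """Static query: runs once over the data available now, then returns."""
    return sum(1 for event in snapshot if event["value"] >= min_value)


def continuous_count(stream: Iterable[Event], min_value: int) -> Iterator[int]:
    """Continuous query: emits an updated result for every matching event."""
    count = 0
    for event in stream:
        if event["value"] >= min_value:
            count += 1
            yield count


events = [{"value": 5}, {"value": 1}, {"value": 9}]
print(static_count(events, 3))                  # one answer: 2
print(list(continuous_count(iter(events), 3)))  # a running answer: [1, 2]
```

The static function has to be re-run to see new data, while the generator keeps producing results for as long as events arrive.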

Slide 51

Slide 51 text

Latest Apache Drill Release - Kafka Enhancements
Filter Pushdown
• PartitionId
• MsgOffset
• MsgTimestamp
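A hedged sketch of a Drill query that filters on the three fields listed on this slide. It assumes the Kafka storage plugin is registered under the name `kafka` and exposes the metadata columns `kafkaPartitionId`, `kafkaMsgOffset` and `kafkaMsgTimestamp`; check both against your Drill version. The helper name `drill_kafka_query` and the topic name in the usage note are hypothetical.

```python
from typing import List, Optional


def drill_kafka_query(
    topic: str,
    partition: Optional[int] = None,
    min_offset: Optional[int] = None,
    min_timestamp_ms: Optional[int] = None,
) -> str:
    """Build a Drill SELECT over a Kafka topic with optional metadata filters.

    Column and plugin names are assumptions (see the text above); the
    function only builds SQL text and does not connect to Drill.
    """
    if "`" in topic:
        raise ValueError("topic name must not contain a backtick")
    predicates: List[str] = []
    if partition is not None:
        predicates.append(f"kafkaPartitionId = {int(partition)}")
    if min_offset is not None:
        predicates.append(f"kafkaMsgOffset >= {int(min_offset)}")
    if min_timestamp_ms is not None:
        predicates.append(f"kafkaMsgTimestamp >= {int(min_timestamp_ms)}")
    sql = f"SELECT * FROM kafka.`{topic}`"
    if predicates:
        sql += " WHERE " + " AND ".join(predicates)
    return sql
```

Calling `drill_kafka_query("tweets", partition=0, min_offset=100)` yields a query restricted to one partition and a starting offset, which is the kind of predicate pushdown is meant to turn into a narrower read of the topic.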

Slide 52

Slide 52 text

Why Should I use DV to Visualize Streams?
Photo by Alexandre Debiève on Unsplash

Slide 53

Slide 53 text

It’s Data!
• Self Service Analytics
• Real Time Insights
• Mash-up
• Test and Apply ML

Slide 54

Slide 54 text

Does DV Provide a Kafka Connection?
Photo by Alexandre Debiève on Unsplash

Slide 55

Slide 55 text

But We Can Use Drill!
Photo by Gemma Evans on Unsplash

Slide 56

Slide 56 text

#obihackers Kafka Connect DVD Example

Slide 57

Slide 57 text

Use the Tools for What they are Good at!
• Stream Keys and Timestamps
• Stream in JSON Format
• Aggregated Tables
• Complex SQL Functions
• Combining Data
• Ranking/Ordering
• Visualize Data
• Mashup
• Machine Learning/Advanced Analytics

Slide 58

Slide 58 text

DVD and Drill
• Install MapR ODBC Driver
• Create Connection
• Select Storage
• Select Table
• In case of Errors replace “…” with `…`: "dfs.tmp".localkafka becomes `dfs.tmp`.localkafka
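The quote replacement in the last step can be sketched as a small text rewrite. This is an illustration only: the helper name `to_drill_identifiers` is hypothetical, and the regular expression is deliberately naive, so it would also rewrite a double-quoted fragment that appears inside a single-quoted string literal.

```python
import re

# Matches a double-quoted identifier such as "dfs.tmp"
_DOUBLE_QUOTED = re.compile(r'"([^"`]+)"')


def to_drill_identifiers(sql: str) -> str:
    """Rewrite double-quoted identifiers as backtick-quoted ones for Drill."""
    return _DOUBLE_QUOTED.sub(r"`\1`", sql)


print(to_drill_identifiers('SELECT * FROM "dfs.tmp".localkafka'))
# SELECT * FROM `dfs.tmp`.localkafka
```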

Slide 59

Slide 59 text

DVD and Drill
• Remove Caching
• No Data Flows
• Logical SQL transformations

Slide 60

Slide 60 text


Slide 61

Slide 61 text

Limits
• Creation of a Kafka Consumer for each Query
• Reads all Data from Topic (now with pushdown filters)
• JSON format (until AVRO support)
Not Suitable for Massive Dashboard Style Reporting

Slide 62

Slide 62 text

Suggestion
…Use the Tools for What they are Good at!
Kafka Connect
• Data/Insights Discovery
• Self Service Analytics
• Data Mashup
• Small Datasets
• Prototype Building
• Monitoring/Alerting
• Dashboards
• Massive Datasets
• Consolidated KPIs
• Complex Calculations

Slide 63

Slide 63 text

#Cloud
Photo by Jason Blackeye on Unsplash

Slide 64

Slide 64 text


Slide 65

Slide 65 text

Final Suggestions…

Slide 66

Slide 66 text

Modern Analytical Platform
• Kafka
  ‣ Separate Topics for direct Reporting
  ‣ Include Discovery back to Kafka
• Drill
  ‣ Prototype and Data Mashup
• DVD
  ‣ Visualizations and Personal Data Mashup
• Standardised Reporting
  ‣ Kafka Connect Sink and “Old Days” Tools
Photo by Drew Patrick Miller on Unsplash

Slide 67

Slide 67 text

Life Is Too Short to Use the WRONG Technology

Slide 68

Slide 68 text

Visualizing Streams