Slide 1

Slide 1 text

Treading Water In a Stream of Data Jeremy Hinegardner @copiousfreetime [email protected] Monday, March 4, 13

Slide 2

Slide 2 text

Data Junkie

Slide 3

Slide 3 text

Survey

Slide 4

Slide 4 text

Streaming?

Slide 5

Slide 5 text

Wikipedia Says ... "A SEQUENCE OF DATA ELEMENTS MADE AVAILABLE OVER TIME. ... ALLOWS ITEMS TO BE PROCESSED ONE AT A TIME RATHER THAN IN LARGE BATCHES."
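As a rough illustration of the definition above, the following Python sketch contrasts handling a whole batch with handling elements one at a time. The names `event_source`, `process_batch`, and `process_stream` are hypothetical and do not come from the talk.

```python
from typing import Iterable, Iterator, List


def event_source(n: int) -> Iterator[dict]:
    """Hypothetical source: yields events one by one, as they become available."""
    for i in range(n):
        yield {"id": i, "size": i * 10}


def process_batch(events: Iterable[dict]) -> int:
    """Batch style: collect every element first, then process the whole set."""
    collected: List[dict] = list(events)  # the entire data set is held in memory
    return sum(e["size"] for e in collected)


def process_stream(events: Iterable[dict]) -> Iterator[int]:
    """Streaming style: handle each element on arrival, keeping only a running total."""
    total = 0
    for event in events:
        total += event["size"]
        yield total  # a result is available after every element
```

The batch version cannot answer until the source is exhausted, while the streaming version produces an updated result after each element and never holds more than one element at a time.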

Slide 6

Slide 6 text

Big Data == Streaming?

Slide 7

Slide 7 text

Big Data

Slide 8

Slide 8 text

Wikipedia Says ... A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT IT BECOMES DIFFICULT TO PROCESS USING ON-HAND DATABASE MANAGEMENT TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS.

Slide 9

Slide 9 text

A LOT of Data

Slide 10

Slide 10 text

Heading towards you FAST!

Slide 11

Slide 11 text

All of it needs to be processed

Slide 12

Slide 12 text

Keep it around forever

Slide 13

Slide 13 text

Copious’s Definition AN AMOUNT OF DATA AND THE PROCESSING OF IT THAT MAKES YOU FEEL UNCOMFORTABLE.

Slide 14

Slide 14 text

Wikipedia Says ... A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT IT BECOMES DIFFICULT TO PROCESS USING ON-HAND DATABASE MANAGEMENT TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS.

Slide 15

Slide 15 text

This

Slide 16

Slide 16 text

Here This

Slide 17

Slide 17 text

Here This That

Slide 18

Slide 18 text

Here There This That

Slide 19

Slide 19 text

Here There This That Other + ⬇

Slide 20

Slide 20 text

Here There This That Every Where Other + ⬇

Slide 21

Slide 21 text

Here There This That Every Where Other + ⬇ $

Slide 22

Slide 22 text

First things First

Slide 23

Slide 23 text

Get This Data

Slide 24

Slide 24 text

Or ... Getting the “Sequence of Data Elements”

Slide 25

Slide 25 text

Polling

Slide 26

Slide 26 text

Notification / Web Hook

Slide 27

Slide 27 text

Payload

Slide 28

Slide 28 text

Push

Slide 29

Slide 29 text

Poll vs. Notify vs. Payload vs. Push
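The slides name these delivery styles without defining them, so the sketch below shows only one possible reading, using an in-memory stand-in for a real service. It assumes that polling means the consumer asks on its own schedule, that a notification says only that something changed, and that a payload hook receives the data itself. Push over a long-lived connection is not modeled. `Producer`, `fetch_since`, and the hook lists are hypothetical names, not part of any real API.

```python
from typing import Callable, Dict, List, Tuple


class Producer:
    """Hypothetical in-memory data source offering three delivery styles."""

    def __init__(self) -> None:
        self.items: List[str] = []
        self.notify_hooks: List[Callable[[int], None]] = []
        self.payload_hooks: List[Callable[[str], None]] = []

    def publish(self, item: str) -> None:
        self.items.append(item)
        index = len(self.items) - 1
        for hook in self.notify_hooks:   # notification: "item N exists", no data
            hook(index)
        for hook in self.payload_hooks:  # payload: the data rides along with the call
            hook(item)

    def fetch_since(self, cursor: int) -> Tuple[List[str], int]:
        """Polling endpoint: return the items after `cursor` and the new cursor."""
        return self.items[cursor:], len(self.items)


def demo() -> Dict[str, List[str]]:
    producer = Producer()
    seen: Dict[str, List[str]] = {"poll": [], "notify": [], "payload": []}

    # Notify: the hook learns only an index, then fetches the item itself.
    producer.notify_hooks.append(lambda i: seen["notify"].append(producer.items[i]))
    # Payload: the hook is handed the item directly.
    producer.payload_hooks.append(seen["payload"].append)

    cursor = 0
    producer.publish("a")
    producer.publish("b")
    # Poll: the consumer asks when it chooses and may get several items at once.
    new_items, cursor = producer.fetch_since(cursor)
    seen["poll"].extend(new_items)
    producer.publish("c")
    new_items, cursor = producer.fetch_since(cursor)
    seen["poll"].extend(new_items)
    return seen
```

All three consumers end up with the same items. What differs is who initiates the exchange and how many round trips each item costs.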

Slide 30

Slide 30 text

My Ideal

Slide 31

Slide 31 text

GitHub Events

Slide 32

Slide 32 text

GitHub Archive

Slide 33

Slide 33 text

Store This Data

Slide 34

Slide 34 text

Pre-Storage Processing?

Slide 35

Slide 35 text

Physical Location

Slide 36

Slide 36 text

Hadoop

Slide 37

Slide 37 text

Avro

Slide 38

Slide 38 text

Why all this trouble?

Slide 39

Slide 39 text

Fundamental Truth

Slide 40

Slide 40 text

Future Discovery

Slide 41

Slide 41 text

Paranoia

Slide 42

Slide 42 text

https://github.com/copiousfreetime/ghent

Slide 43

Slide 43 text

Thanks! Jeremy Hinegardner @copiousfreetime [email protected]

Slide 44

Slide 44 text

What is Old is New Again Bonus Track!!

Slide 45

Slide 45 text


Slide 46

Slide 46 text

'NEARLY EVERY LARGE DATASET HAS UNANTICIPATED VALUE WITHIN IT.'
'ULTIMATELY YOU CAN'T DISCOVER INTERESTING THINGS WITH YOUR DATA UNLESS YOU CAN ASK ARBITRARY QUESTIONS OF IT'
- BIG DATA, NATHAN MARZ (2013)

'THE GRANULAR DATA FOUND IN THE DATA WAREHOUSE IS THE KEY TO REUSABILITY, BECAUSE IT CAN BE USED BY MANY PEOPLE IN DIFFERENT WAYS'
'BUT PERHAPS THE LARGEST BENEFIT OF A DATA WAREHOUSE FOUNDATION IS THAT FUTURE UNKNOWN REQUIREMENTS CAN BE ACCOMMODATED'
- BUILDING THE DATA WAREHOUSE, 3RD EDITION, W. H. INMON (2002)

Slide 47

Slide 47 text

Real Time ETL “... REFERS TO SOFTWARE THAT MOVES DATA ASYNCHRONOUSLY INTO A DATA WAREHOUSE WITH SOME URGENCY -- WITHIN MINUTES OF THE EXECUTION OF THE BUSINESS TRANSACTION” - THE DATA WAREHOUSE ETL TOOLKIT, RALPH KIMBALL (2004)
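One way to picture the "asynchronously ... with some urgency" part of this definition is a queue sitting between the transaction and the warehouse. The sketch below is a toy under stated assumptions: the warehouse is a plain Python list, the transform step is invented for illustration, and `run_loader` and `demo` are hypothetical names. It is not taken from the talk or from the book quoted above.

```python
import queue
import threading
from typing import List


def run_loader(source: "queue.Queue", warehouse: List[dict]) -> None:
    """Background worker: transform each record and load it as soon as it arrives."""
    while True:
        record = source.get()
        try:
            if record is None:  # sentinel value: no more records will come
                return
            warehouse.append({
                "user": record["user"].lower(),                 # illustrative transform
                "amount_cents": round(record["amount"] * 100),  # illustrative transform
            })
        finally:
            source.task_done()


def demo() -> List[dict]:
    source: "queue.Queue" = queue.Queue()
    warehouse: List[dict] = []  # stand-in for a real data warehouse
    worker = threading.Thread(target=run_loader, args=(source, warehouse), daemon=True)
    worker.start()

    # The "business transaction" side only enqueues and moves on.
    source.put({"user": "Alice", "amount": 12.5})
    source.put({"user": "BOB", "amount": 3.0})
    source.put(None)

    source.join()  # wait until every queued record has been handled
    worker.join()
    return warehouse
```

The producer never waits on the load itself, and each record reaches the warehouse shortly after it is enqueued instead of waiting for a nightly batch.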

Slide 48

Slide 48 text

Big Data ALL NEW ETL and Data Warehousing