Treading Water in a Stream of Data

A talk given at Big Ruby 2013 on some fundamental concepts of data acquisition.

Jeremy Hinegardner

March 01, 2013

Transcript

  1. Treading Water In a Stream of Data. Jeremy Hinegardner, @copiousfreetime, jeremy@copiousfreetime.org. Monday, March 4, 13

  2. Data Junkie

  3. Survey

  4. Streaming?

  5. Wikipedia Says ... "A SEQUENCE OF DATA ELEMENTS MADE AVAILABLE OVER TIME. ... ALLOWS ITEMS TO BE PROCESSED ONE AT A TIME RATHER THAN IN LARGE BATCHES."

  6. Big Data == Streaming?

  7. Big Data

  8. Wikipedia Says ... "A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT IT BECOMES DIFFICULT TO PROCESS USING ON-HAND DATABASE MANAGEMENT TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS."

  9. A LOT of Data

  10. Heading towards you FAST!

  11. All of it needs to be processed

  12. Keep it around forever

  13. Copious’s Definition: AN AMOUNT OF DATA AND THE PROCESSING OF IT THAT MAKES YOU FEEL UNCOMFORTABLE.

  14. Wikipedia Says ... "A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT IT BECOMES DIFFICULT TO PROCESS USING ON-HAND DATABASE MANAGEMENT TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS."

  15. This

  16. Here This

  17. Here This That

  18. Here There This That

  19. Here There This That Other + ⬇

  20. Here There This That Every Where Other + ⬇

  21. Here There This That Every Where Other + ⬇ $

  22. First things First

  23. Get This Data

  24. Or ... Getting the “Sequence of Data Elements”

  25. Polling

  26. Notification / Web Hook

  27. Payload

  28. Push

  29. Poll vs. Notify vs. Payload vs. Push

  30. My Ideal

  31. GitHub Events

  32. GitHub Archive

  33. Store This Data

  34. Pre-Storage Processing?

  35. Physical Location

  36. Hadoop

  37. Avro

  38. Why all this trouble?

  39. Fundamental Truth

  40. Future Discovery

  41. Paranoia

  42. https://github.com/copiousfreetime/ghent

  43. Thanks! Jeremy Hinegardner, @copiousfreetime, jeremy@copiousfreetime.org

  44. What is Old is New Again. Bonus Track!!

  46. 'NEARLY EVERY LARGE DATASET HAS UNANTICIPATED VALUE WITHIN IT.' 'ULTIMATELY YOU CAN'T DISCOVER INTERESTING THINGS WITH YOUR DATA UNLESS YOU CAN ASK ARBITRARY QUESTIONS OF IT' - BIG DATA, NATHAN MARZ (2013)

    'THE GRANULAR DATA FOUND IN THE DATA WAREHOUSE IS THE KEY TO REUSABILITY, BECAUSE IT CAN BE USED BY MANY PEOPLE IN DIFFERENT WAYS' 'BUT PERHAPS THE LARGEST BENEFIT OF A DATA WAREHOUSE FOUNDATION IS THAT FUTURE UNKNOWN REQUIREMENTS CAN BE ACCOMMODATED' - BUILDING THE DATA WAREHOUSE, 3RD EDITION, W. H. INMON (2002)

  47. Real Time ETL “... REFERS TO SOFTWARE THAT MOVES DATA ASYNCHRONOUSLY INTO A DATA WAREHOUSE WITH SOME URGENCY -- WITHIN MINUTES OF THE EXECUTION OF THE BUSINESS TRANSACTION” - THE DATA WAREHOUSE ETL TOOLKIT, RALPH KIMBALL (2004)

  48. Big Data: ALL NEW ETL and Data Warehousing