Treading Water in a Stream of Data

A talk given at Big Ruby 2013 on some fundamental concepts of data acquisition.

Jeremy Hinegardner

March 01, 2013

Transcript

  1. Wikipedia Says ...

    "A SEQUENCE OF DATA ELEMENTS MADE AVAILABLE OVER TIME. ... ALLOWS ITEMS TO BE PROCESSED ONE AT A TIME RATHER THAN IN LARGE BATCHES." Monday, March 4, 13
  2. Wikipedia Says ...

    A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT IT BECOMES DIFFICULT TO PROCESS USING ON-HAND DATABASE MANAGEMENT TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS.
  3. Copious’s Definition

    AN AMOUNT OF DATA AND THE PROCESSING OF IT THAT MAKES YOU FEEL UNCOMFORTABLE.
  4. Wikipedia Says ...

    A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT IT BECOMES DIFFICULT TO PROCESS USING ON-HAND DATABASE MANAGEMENT TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS.
  5. 'NEARLY EVERY LARGE DATASET HAS UNANTICIPATED VALUE WITHIN IT.'

    'ULTIMATELY YOU CAN'T DISCOVER INTERESTING THINGS WITH YOUR DATA UNLESS YOU CAN ASK ARBITRARY QUESTIONS OF IT'
    - BIG DATA, NATHAN MARZ (2013)

    'THE GRANULAR DATA FOUND IN THE DATA WAREHOUSE IS THE KEY TO REUSABILITY, BECAUSE IT CAN BE USED BY MANY PEOPLE IN DIFFERENT WAYS'

    'BUT PERHAPS THE LARGEST BENEFIT OF A DATA WAREHOUSE FOUNDATION IS THAT FUTURE UNKNOWN REQUIREMENTS CAN BE ACCOMMODATED'
    - BUILDING THE DATA WAREHOUSE, 3RD EDITION, W. H. INMON (2002)
  6. Real Time ETL

    “... REFERS TO SOFTWARE THAT MOVES DATA ASYNCHRONOUSLY INTO A DATA WAREHOUSE WITH SOME URGENCY -- WITHIN MINUTES OF THE EXECUTION OF THE BUSINESS TRANSACTION”
    - THE DATA WAREHOUSE ETL TOOLKIT, RALPH KIMBALL (2004)
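
The stream definition on slide 1 contrasts processing items one at a time with processing them in large batches. The sketch below is a minimal illustration of that contrast and is not taken from the talk: the names `running_mean` and `batch_mean` and the sample `readings` are hypothetical, chosen only for this example.

```python
from typing import Iterable, Iterator, Sequence


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Stream style: update the answer as each item arrives.

    Only a count and a running total are kept, so memory use does not
    grow with the number of items seen.
    """
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


def batch_mean(batch: Sequence[float]) -> float:
    """Batch style: the whole data set must be available before computing."""
    return sum(batch) / len(batch)


# Hypothetical sample data standing in for elements that arrive over time.
readings = [2.0, 4.0, 6.0, 8.0]

# The streaming version emits an intermediate result after every element.
streamed = list(running_mean(iter(readings)))

# The batch version produces a single result once all elements are present.
final = batch_mean(readings)
```

Once every element has been seen, the last streamed value and the batch result agree. The difference is that the streaming version has an answer available at every point along the way.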