Eduardo Arino de la Rubia - Big Data is not Hadoop - LA DW/BI/Analytics Meetup - Febr 2015

Data Science LA
February 10, 2015


  1. NOW WHAT?!! A STORY ABOUT NOT HADOOP

    YEAH, I’M PRETTY SURE THIS IS BIG DATA @earino #dsla
  2. We have lots of data (of course we do)

    Our needs are complex (of course they are)
    We will need an ecosystem of solutions (of course we will)
  3. SOME HARD QUESTIONS…

    • What is the size of our “data lake”?
    • What is the makeup of our team?
    • What kinds of bets can we comfortably make?
    • What is ready for us, today?
  4. WE HAVE BIG DATA

    • Oh, some of it is unstructured
    • Oh, some of it is very slow
    • Oh, some of it we would never ask questions about…
    • We have PETABYTES of data
  5. WE HAVE BIG DATA

    • Oh, [some → most] of it is unstructured
    • Oh, [some → most] of it is very slow
    • Oh, [some → most] of it we would never ask questions about…
    • We have [PETA → TERA]BYTES of data
  6. OUR TEAM

    • We are not a technology company (though we’d like to say that we are.)
    • We don’t have the capacity to take on an entire ecosystem
    • The end users are BI/DW
    • They are EXPERTS in their field. Their field is BI/DW.
  7. – MICHAEL STONEBRAKER

    “It is indeed ironic that Hadoop is picking up support in the general community about five years after Google moved on to better things.”
  8. – ADAM DRAKE

    “This pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation.”
  9. – UNIDENTIFIED HADOOP SALES REP

    “The future of Hadoop is Spark. We’ve moved our best and brightest engineers over to it.”
  10. THIS IS NOT TO SAY HADOOP IS BAD

    • Hadoop is amazing for large scale ETL
    • Hadoop supports a wide variety of tools for analysis of unstructured data
    • Hadoop supports some amazing frameworks (HBase, Hive, Pig, Mahout, etc…)
  11. BUT… IT COMES WITH SOME COST
  12. – BI MANAGER

    “When evaluating our business needs, we didn’t need an ecosystem, we needed an MPP.”