
Eduardo Arino de la Rubia - Big Data is not Hadoop - LA DW/BI/Analytics Meetup - Febr 2015

Data Science LA
February 10, 2015

Transcript

  1. NOW WHAT?!! A STORY ABOUT NOT HADOOP

    YEAH, I’M PRETTY SURE THIS IS BIG DATA @earino #dsla
  2. None
  3. None
  4. We have lots of data (of course we do)

    Our needs are complex (of course they are)
    We will need an ecosystem of solutions (of course we will)
  5. None
  6. None
  7. None
  8. BUT IN REALITY…
  9. None
  10. None
  11. None
  12. SOME HARD QUESTIONS…

    • What is the size of our “data lake”?
    • What is the makeup of our team?
    • What kinds of bets can we comfortably make?
    • What is ready for us, today?
  13. WE HAVE BIG DATA

    • Oh, some of it is unstructured
    • Oh, some of it is very slow
    • Oh, some of it we would never ask questions about…
    • We have PETABYTES of data
  14. WE HAVE BIG DATA

    • Oh, most of it is unstructured
    • Oh, most of it is very slow
    • Oh, most of it we would never ask questions about…
    • We have TERABYTES of data
  15. OUR TEAM

    • We are not a technology company (though we’d like to say that we are).
    • We don’t have the capacity to take on an entire ecosystem
    • The end users are BI/DW
    • They are EXPERTS in their field. Their field is BI/DW.
  16. – MICHAEL STONEBRAKER

    “It is indeed ironic that Hadoop is picking up support in the general community about five years after Google moved on to better things.”
  17. – ADAM DRAKE

    “This pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation.”
  18. – UNIDENTIFIED HADOOP SALES REP

    “The future of Hadoop is Spark. We’ve moved our best and brightest engineers over to it.”
  19. THIS IS NOT TO SAY HADOOP IS BAD

    • Hadoop is amazing for large scale ETL
    • Hadoop supports a wide variety of tools for analysis of unstructured data
    • Hadoop supports some amazing frameworks (HBase, Hive, Pig, Mahout, etc…)
  20. BUT… IT COMES WITH SOME COST
  21. – BI MANAGER

    “When evaluating our business needs, we didn’t need an ecosystem, we needed an MPP.”
  22. CASE STUDY

  23. CONNECT WITH US ON DATASCIENCE.LA