Eduardo Arino de la Rubia - Big Data is not Hadoop - LA DW/BI/Analytics Meetup - Febr 2015

Data Science LA
February 10, 2015

Transcript

  1. NOW WHAT?!
    A STORY ABOUT NOT HADOOP
    YEAH, I’M PRETTY SURE THIS IS BIG DATA
    @earino
    #dsla

  2. We have lots of data (of course we do)
    Our needs are complex (of course they are)
    We will need an ecosystem of solutions (of course we will)

  3. BUT IN REALITY…

  4. SOME HARD QUESTIONS…
    • What is the size of our “data lake”?
    • What is the makeup of our team?
    • What kinds of bets can we comfortably make?
    • What is ready for us, today?

  5. WE HAVE BIG DATA
    • Oh, some of it is unstructured
    • Oh, some of it is very slow
    • Oh, some of it we would never ask questions about…
    • We have PETABYTES of data

  6. WE HAVE BIG DATA
    • Oh, most of it is unstructured
    • Oh, most of it is very slow
    • Oh, most of it we would never ask questions about…
    • We have TERABYTES of data

  7. OUR TEAM
    • We are not a technology company (though we’d like to say that we are).
    • We don’t have the capacity to take on an entire ecosystem
    • The end users are BI/DW
    • They are EXPERTS in their field. Their field is BI/DW.

  8. – MICHAEL STONEBRAKER
    “It is indeed ironic that Hadoop is picking up
    support in the general community about five years
    after Google moved on to better things.”

  9. – ADAM DRAKE
    “This pipeline gets us down to a runtime of about
    12 seconds, or about 270MB/sec, which is around
    235 times faster than the Hadoop
    implementation.”

  10. – UNIDENTIFIED HADOOP SALES REP
    “The future of Hadoop is Spark. We’ve moved our
    best and brightest engineers over to it.”

  11. THIS IS NOT TO SAY HADOOP IS BAD
    • Hadoop is amazing for large scale ETL
    • Hadoop supports a wide variety of tools for analysis of unstructured data
    • Hadoop supports some amazing frameworks (HBase, Hive, Pig, Mahout, etc.)

  12. BUT… IT COMES WITH SOME COST

  13. – BI MANAGER
    “When evaluating our business needs, we didn’t
    need an ecosystem, we needed an MPP.”

  14. CASE STUDY

  15. CONNECT WITH US ON DATASCIENCE.LA
