Slide 1

Slide 1 text

NOW WHAT?!!
A STORY ABOUT NOT HADOOP
YEAH, I'M PRETTY SURE THIS IS BIG DATA
@earino #dsla

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

We have lots of data (of course we do).
Our needs are complex (of course they are).
We will need an ecosystem of solutions (of course we will).

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

BUT IN REALITY…

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

SOME HARD QUESTIONS…
• What is the size of our "data lake"?
• What is the makeup of our team?
• What kinds of bets can we comfortably make?
• What is ready for us today?

Slide 13

Slide 13 text

WE HAVE BIG DATA
• Oh, some of it is unstructured
• Oh, some of it is very slow
• Oh, some of it we would never ask questions about…
• We have PETABYTES of data

Slide 14

Slide 14 text

WE HAVE BIG DATA
• Oh, most (not some) of it is unstructured
• Oh, most (not some) of it is very slow
• Oh, most (not some) of it we would never ask questions about…
• We have TERABYTES (not PETABYTES) of data

Slide 15

Slide 15 text

OUR TEAM
• We are not a technology company (though we'd like to say that we are).
• We don't have the capacity to take on an entire ecosystem.
• The end users are BI/DW professionals.
• They are EXPERTS in their field. Their field is BI/DW.

Slide 16

Slide 16 text

"It is indeed ironic that Hadoop is picking up support in the general community about five years after Google moved on to better things."
– MICHAEL STONEBRAKER

Slide 17

Slide 17 text

"This pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation."
– ADAM DRAKE
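The three figures in the quote can be cross-checked with a little arithmetic. The sketch below, in Python, derives the volume of data processed and the implied Hadoop throughput and runtime. It assumes "235 times faster" compares throughput on the same input; none of the derived values appear on the slide itself.

```python
# Figures quoted on the slide. "MB" is taken at face value; whether MB or
# MiB was meant is an assumption that does not change the ratios below.
RUNTIME_S = 12            # "about 12 seconds"
PIPELINE_MB_PER_S = 270   # "about 270MB/sec"
SPEEDUP = 235             # "around 235 times faster"

# Volume of data the pipeline read: runtime x throughput.
data_mb = RUNTIME_S * PIPELINE_MB_PER_S

# Assumption: the 235x figure compares throughput on the same input,
# so the Hadoop job ran at roughly 1/235 of the pipeline's rate.
hadoop_mb_per_s = PIPELINE_MB_PER_S / SPEEDUP

# Time the Hadoop job would need for the same volume at that rate.
hadoop_runtime_min = data_mb / hadoop_mb_per_s / 60

print(f"data processed: ~{data_mb / 1000:.1f} GB")
print(f"implied Hadoop throughput: ~{hadoop_mb_per_s:.2f} MB/sec")
print(f"implied Hadoop runtime: ~{hadoop_runtime_min:.0f} minutes")
```

Under that assumption the quote implies roughly 3.2 GB of input and a Hadoop run on the order of 47 minutes. These are back-of-envelope figures derived only from the quoted numbers, not measurements.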

Slide 18

Slide 18 text

"The future of Hadoop is Spark. We've moved our best and brightest engineers over to it."
– UNIDENTIFIED HADOOP SALES REP

Slide 19

Slide 19 text

THIS IS NOT TO SAY HADOOP IS BAD
• Hadoop is amazing for large-scale ETL
• Hadoop supports a wide variety of tools for analysis of unstructured data
• Hadoop supports some amazing frameworks (HBase, Hive, Pig, Mahout, etc.)

Slide 20

Slide 20 text

BUT… IT COMES WITH SOME COST

Slide 21

Slide 21 text

"When evaluating our business needs, we didn't need an ecosystem, we needed an MPP."
– BI MANAGER

Slide 22

Slide 22 text

CASE STUDY

Slide 23

Slide 23 text

CONNECT WITH US ON DATASCIENCE.LA