Eduardo Arino de la Rubia - Big Data is not Hadoop - LA DW/BI/Analytics Meetup - February 2015

Slide 1

Slide 1 text

NOW WHAT?!! A STORY ABOUT NOT HADOOP
YEAH, I’M PRETTY SURE THIS IS BIG DATA
@earino #dsla

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

We have lots of data (of course we do).
Our needs are complex (of course they are).
We will need an ecosystem of solutions (of course we will).

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

BUT IN REALITY…

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

SOME HARD QUESTIONS…
• What is the size of our “data lake”?
• What is the makeup of our team?
• What kinds of bets can we comfortably make?
• What is ready for us, today?

Slide 13

Slide 13 text

WE HAVE BIG DATA
• Oh, some of it is unstructured
• Oh, some of it is very slow
• Oh, some of it we would never ask questions about…
• We have PETABYTES of data

Slide 14

Slide 14 text

WE HAVE BIG DATA
• Oh, most of it is unstructured
• Oh, most of it is very slow
• Oh, most of it we would never ask questions about…
• We have TERABYTES of data

Slide 15

Slide 15 text

OUR TEAM
• We are not a technology company (though we’d like to say that we are).
• We don’t have the capacity to take on an entire ecosystem
• The end users are BI/DW
• They are EXPERTS in their field. Their field is BI/DW.

Slide 16

Slide 16 text

“It is indeed ironic that Hadoop is picking up support in the general community about five years after Google moved on to better things.”
– MICHAEL STONEBRAKER

Slide 17

Slide 17 text

“This pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation.”
– ADAM DRAKE
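To put the numbers in the quote side by side, here is a minimal arithmetic sketch. It uses only the three rounded figures from the quote; the function name and the reading of "235 times faster" as a throughput ratio are assumptions for illustration, not something the slide states.

```python
# Rounded figures taken directly from the quote.
RUNTIME_S = 12          # "about 12 seconds"
THROUGHPUT_MB_S = 270   # "about 270MB/sec"
SPEEDUP = 235           # "around 235 times faster"


def implied_figures(runtime_s, throughput_mb_s, speedup):
    """Derive the data volume and the comparison throughput implied by the quote.

    Assumption: the speedup is a ratio of throughputs (MB/sec), so the
    slower implementation runs at throughput_mb_s / speedup.
    """
    data_mb = runtime_s * throughput_mb_s          # volume processed by the pipeline
    slower_throughput_mb_s = throughput_mb_s / speedup
    return data_mb, slower_throughput_mb_s


data_mb, hadoop_mb_s = implied_figures(RUNTIME_S, THROUGHPUT_MB_S, SPEEDUP)
print(f"Data processed: about {data_mb / 1000:.1f} GB")
print(f"Implied Hadoop throughput: about {hadoop_mb_s:.2f} MB/sec")
```

Under these assumptions the pipeline handles roughly 3 GB of data, a volume that fits comfortably on a single machine.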

Slide 18

Slide 18 text

“The future of Hadoop is Spark. We’ve moved our best and brightest engineers over to it.”
– UNIDENTIFIED HADOOP SALES REP

Slide 19

Slide 19 text

THIS IS NOT TO SAY HADOOP IS BAD
• Hadoop is amazing for large-scale ETL
• Hadoop supports a wide variety of tools for analysis of unstructured data
• Hadoop supports some amazing frameworks (HBase, Hive, Pig, Mahout, etc…)

Slide 20

Slide 20 text

BUT… IT COMES WITH SOME COST

Slide 21

Slide 21 text

“When evaluating our business needs, we didn’t need an ecosystem, we needed an MPP.”
– BI MANAGER
