Slide 1

Slide 1 text

Hadoop & Machine Learning
@hadooping

Slide 2

Slide 2 text

Rules of the road....

Slide 3

Slide 3 text

Whatever happens...

Slide 4

Slide 4 text

tweet

Slide 5

Slide 5 text

tweet (or whatever floats yer boat)

Slide 6

Slide 6 text

What is Hadoop?

Slide 7

Slide 7 text

What is Hadoop?
A real quick overview (AP)

Slide 8

Slide 8 text

What is Hadoop?
A real quick overview (AP)
Word count == Hello World

Slide 9

Slide 9 text

What is Hadoop?
A real quick overview (AP)
Word count == Hello World
A more useful example

Slide 10

Slide 10 text

What is Hadoop?
A real quick overview (AP)
Word count == Hello World
A more useful example
Machine Learning

Slide 11

Slide 11 text

What is Hadoop?
A real quick overview (AP)
Word count == Hello World
A more useful example
Machine Learning
Q & A

Slide 12

Slide 12 text

>> What is Hadoop?

Slide 13

Slide 13 text

An open source framework for the storage and large-scale processing of data sets on clusters of commodity hardware.

Slide 14

Slide 14 text

>>Is it shiny and hot?

Slide 15

Slide 15 text

Depends who you ask....

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Clusters == lots of processing...

Slide 18

Slide 18 text

It goes to 11.

Slide 19

Slide 19 text

Uses MapReduce wonderfully.

Slide 20

Slide 20 text

>>A real quick overview (AP)

Slide 21

Slide 21 text

AP = Audience Participation

Slide 22

Slide 22 text

AP = Audience Participation
(and I know you’ll all hate me for it but I really don’t care that much, we’ll still be friends...)

Slide 23

Slide 23 text

AP = Audience Participation
A quick sort

Slide 24

Slide 24 text

AP = Audience Participation
A quick sort with MapReduce
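The slides do not record how the audience exercise was run, so the following is only a rough sketch of one common way to sort with MapReduce: the map step assigns each value to a range partition, the shuffle groups values by partition, and the reduce step sorts each partition and concatenates them in order. The function names, the sample values, and the partition boundaries are all assumptions made for illustration, and everything runs in memory rather than on a cluster.

```python
from collections import defaultdict


def map_phase(values, boundaries):
    """Emit (partition, value) pairs using range partitioning.

    A value lands in partition i, where i is the number of boundaries
    it exceeds. `boundaries` must be sorted ascending.
    """
    for value in values:
        partition = sum(value > b for b in boundaries)
        yield partition, value


def shuffle(pairs):
    """Group values by partition key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sort each partition locally, then concatenate partitions in key order."""
    result = []
    for partition in sorted(groups):
        result.extend(sorted(groups[partition]))
    return result


def mapreduce_sort(values, boundaries):
    return reduce_phase(shuffle(map_phase(values, boundaries)))


# Hypothetical input: nine shuffled cards split across three "reducers".
cards = [7, 2, 9, 4, 1, 8, 3, 6, 5]
print(mapreduce_sort(cards, boundaries=[3, 6]))
```

Because every value in partition 0 is no larger than every value in partition 1, and so on, each partition can be sorted independently and the results simply joined end to end.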

Slide 25

Slide 25 text

>>WordCount == Hello World

Slide 26

Slide 26 text

Basic MapReduce in Action
WordCount

Slide 27

Slide 27 text

Basic MapReduce in Action
WordCount
It’s boring but explains it well.

Slide 28

Slide 28 text

Basic MapReduce in Action
WordCount

Slide 29

Slide 29 text

Map in parallel
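The code shown on the WordCount slides did not survive in this transcript, so here is a minimal in-memory sketch of the idea, not the Hadoop Java API. The mapper emits a (word, 1) pair for each word, the shuffle groups pairs by word, and the reducer sums each group. The function names and the sample lines are assumptions for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted counts by word, as happens between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reducer(word, counts):
    """Reduce step: total the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())


# Hypothetical input lines.
lines = ["hello hadoop", "hello world"]
print(word_count(lines))
```

Each call to `mapper` depends only on its own line, which is what allows the map step to run in parallel across the machines in a cluster.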

Slide 30

Slide 30 text

>>A more useful example

Slide 31

Slide 31 text

>>A more useful example
The coffee shop....

Slide 32

Slide 32 text

20,000 customers with 13 months of sales data
1,3,11,6,10,7,10,12,9,7,6,10,14,5

Slide 33

Slide 33 text

20,000 customers
1. The average 12-month sales.
2. The variance.
3. The month 13 sales drop against the average.
4. The number of months that are 40% below the average.
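The four figures listed above can be sketched for a single customer record as follows. This assumes the sample row is a customer ID followed by 13 monthly sales values (the row has 14 fields, so that reading is a guess), uses the population variance, and reads "40% below avg" as a month at or under 60% of the 12-month average. The function name and dictionary keys are illustrative, and this is plain Python rather than a Hadoop job.

```python
from statistics import mean, pvariance


def customer_stats(record):
    """Compute the four per-customer figures from one comma-separated row.

    Assumed layout: customer_id, month_1, ..., month_13.
    """
    fields = [int(x) for x in record.split(",")]
    customer_id, sales = fields[0], fields[1:]
    first_12, month_13 = sales[:12], sales[12]

    average = mean(first_12)
    return {
        "customer_id": customer_id,
        "average": average,
        # Population variance of the first 12 months (an assumption).
        "variance": pvariance(first_12),
        # Positive when month 13 falls short of the average.
        "month_13_drop": average - month_13,
        # Months at or below 60% of the average, i.e. 40% below it.
        "months_40pct_below": sum(1 for s in first_12 if s <= 0.6 * average),
    }


stats = customer_stats("1,3,11,6,10,7,10,12,9,7,6,10,14,5")
for name, value in stats.items():
    print(name, value)
```

In a MapReduce setting each row is independent, so this calculation would naturally sit in the map step with the customer ID as the output key.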

Slide 34

Slide 34 text

20,000 customers
So far we’ve only output to one file. Now I want to segment my customers into groups I can market to.
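The slides do not say how the segments were defined, so the sketch below invents three hypothetical groups ("growing", "steady", "lapsing") based on how month 13 compares with the 12-month average, purely to show the shape of the idea. The dictionary keyed by segment name stands in for writing one output file per group; in a Hadoop job the `MultipleOutputs` class is one way to do that, though the deck does not confirm which mechanism was used. The record layout (customer ID plus 13 months) and every record except the first are assumptions.

```python
from collections import defaultdict
from statistics import mean


def segment_for(record, lapse_threshold=0.4):
    """Return (customer_id, segment) for one comma-separated row.

    Segment names and the 40% threshold are hypothetical.
    """
    fields = [int(x) for x in record.split(",")]
    customer_id, sales = fields[0], fields[1:]
    average, month_13 = mean(sales[:12]), sales[12]

    if month_13 > average:
        return customer_id, "growing"
    # Guard against customers with no sales in the first 12 months.
    drop_ratio = (average - month_13) / average if average else 0.0
    if drop_ratio >= lapse_threshold:
        return customer_id, "lapsing"
    return customer_id, "steady"


def segment_customers(records):
    """Group customer IDs by segment; each key stands in for an output file."""
    segments = defaultdict(list)
    for record in records:
        customer_id, segment = segment_for(record)
        segments[segment].append(customer_id)
    return dict(segments)


records = [
    "1,3,11,6,10,7,10,12,9,7,6,10,14,5",  # row from the slide
    "2,8,8,8,8,8,8,8,8,8,8,8,8,8",        # hypothetical steady customer
    "3,4,4,4,4,4,4,4,4,4,4,4,4,6",        # hypothetical growing customer
]
print(segment_customers(records))
```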

Slide 35

Slide 35 text

>>Recommending Stuff
1000 Users, 5000 Items
and
100,000 Recommendations

Slide 36

Slide 36 text

>>Recommending Stuff
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
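The slide does not label the four columns. The sketch below assumes they are user ID, item ID, rating, and a Unix timestamp, which is the layout of the MovieLens 100K ratings file that these rows resemble. It only parses the rows and groups ratings by user, which is typically the first step before a recommender can compare users or items; the function names are illustrative.

```python
from collections import defaultdict

# The four rows shown on the slide.
SAMPLE = """196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923"""


def parse_rating(line):
    """Split one whitespace-separated row into typed fields.

    Assumed column order: user ID, item ID, rating, timestamp.
    """
    user, item, rating, timestamp = line.split()
    return int(user), int(item), int(rating), int(timestamp)


def ratings_by_user(lines):
    """Build {user_id: {item_id: rating}} from raw rows."""
    table = defaultdict(dict)
    for line in lines:
        user, item, rating, _timestamp = parse_rating(line)
        table[user][item] = rating
    return dict(table)


for user, items in ratings_by_user(SAMPLE.splitlines()).items():
    print(user, items)
```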

Slide 37

Slide 37 text

>>Q&A?
>>Thank you
>>@hadooping
>>http://github.com/jasebell