Slide 1

Slide 1 text

Beyond the Tweeting Toaster: (I)IoT Streaming Analytics with Apache Storm, Kafka and Arduino • P. Taylor Goetz, Hortonworks • @ptgoetz

Slide 2

Slide 2 text

Credit where credit is due…

Slide 3

Slide 3 text

@mytoaster • Created by Hans Scharler in 2008 • http://nothans.tumblr.com

Slide 4

Slide 4 text

26 billion IoT devices by 2020 -Gartner http://www.gartner.com/newsroom/id/2636073

Slide 5

Slide 5 text

IPv4 Address Space: about 4.3 billion (2^32)
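
The gap between the two numbers can be worked out directly. The following is a minimal sketch, assuming only that IPv4 addresses are 32 bits wide and taking the 26 billion figure from the Gartner slide; the constant names are illustrative.

```cpp
#include <cstdint>

// Number of distinct 32-bit IPv4 addresses: 2^32.
constexpr std::uint64_t kIpv4Addresses = std::uint64_t{1} << 32;

// Gartner's projection for 2020, as quoted on the earlier slide.
constexpr std::uint64_t kProjectedIotDevices = 26000000000ULL;

// Devices that could not be given a unique IPv4 address.
constexpr std::uint64_t kShortfall = kProjectedIotDevices - kIpv4Addresses;

static_assert(kIpv4Addresses == 4294967296ULL, "2^32 addresses");
```

Under these assumptions the shortfall is roughly 21.7 billion devices, which is the point of putting the two slides side by side.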

Slide 6

Slide 6 text

Sensors are everywhere!

Slide 7

Slide 7 text

Is that a sensor in your pocket? • GPS • Proximity Sensor • Ambient Light Sensor • 3-Axis Accelerometer • Magnetometer • Gyroscopic Sensor • Wi-Fi • Camera(s) • UI (senses user interaction) • iBeacon

Slide 8

Slide 8 text

In your car? sensormag.com

Slide 9

Slide 9 text

In your car?

Slide 10

Slide 10 text

On your wrist? • Jawbone UP • Fitbit • …

Slide 11

Slide 11 text

2014 South Napa Earthquake August 24, 3:20 AM (no, the earthquake was not caused by people waking up)

Slide 12

Slide 12 text

Sensors

Slide 13

Slide 13 text

“A sensor is a device that detects events or changes in quantities and provides a corresponding output, generally as an electrical or optical signal; for example, a thermocouple converts temperature to an output voltage.” –Wikipedia

Slide 14

Slide 14 text

Sensors

Slide 15

Slide 15 text

http://adafruit.com

Slide 16

Slide 16 text

http://adafruit.com

Slide 17

Slide 17 text

IoT Use Cases

Slide 18

Slide 18 text

Detect : Anticipate : React

Slide 19

Slide 19 text

Detect : Anticipate : React Detect behavior. Anticipate behavior. React to behavior.

Slide 20

Slide 20 text

Hotel Room Monitoring and Automation • Why heat/cool an unoccupied room? • Detect occupancy. • Anticipate occupancy. • React to occupancy. • Added benefits to customer experience. • Analyze customer behavior.

Slide 21

Slide 21 text

Quikie Auto Lube • Manage inventory in response to demand. • Detect inventory. • Anticipate customer demand. • React accordingly.

Slide 22

Slide 22 text

Hospital Infection Control • CDC: Hospital-acquired infections cost $30B per year and lead to 100K patient deaths. • Inadequate hand washing is a big cause.

Slide 23

Slide 23 text

Hospital Infection Control • Ensure proper hygiene of medical staff • Detect staff presence. • Anticipate hand washing. • React to inadequate hygiene.

Slide 24

Slide 24 text

Hospital Infection Control

Slide 25

Slide 25 text

Auto Insurance • Rethinking traditional risk assessment • Detect unsafe driving practices. • Anticipate who is most at risk. • React to risk assessment.

Slide 26

Slide 26 text

Use your imagination.

Slide 27

Slide 27 text

How can you use and combine sensors to better serve your [user’s] needs?

Slide 28

Slide 28 text

What is Arduino?

Slide 29

Slide 29 text

What is Arduino? • Open Source Microcontroller Hardware + Software • Geared toward prototyping • “Physical Computing”: interacting with the environment • “Arduino is an open-source electronics platform based on easy-to-use hardware and software. It's intended for anyone making interactive projects.”

Slide 30

Slide 30 text

http://adafruit.com

Slide 31

Slide 31 text

What is Arduino?

Slide 32

Slide 32 text

Programming Arduino • Official cross-platform IDE written in Java • C/C++ with some sugar • Program referred to as a “Sketch” • Many open source libraries available for various hardware (sensors, etc.)

Slide 33

Slide 33 text

Going Wireless with XBee

Slide 34

Slide 34 text

What is XBee? • Radio modules that support wireless point-to-point communication • Serial communication • Minimal connections required — power, ground, data in, data out (UART) • 2 power options (1 mW/100 mW) • Support for multiple network topologies (Mesh, star, tree, etc.)

Slide 35

Slide 35 text

Architecture/Data Flow • Sensor (XBee/Arduino): Transmit raw sensor data. • Collector (XBee/R-Pi): Receive data, add timestamp, publish. • Kafka: Reliable queue. • Storm: Analytics, persistence, alerting. • Links: 2.4 GHz radio from sensor to collector, TCP/IP from the collector onward

Slide 36

Slide 36 text

Why Apache Storm? • Speed: Process streaming data in real time • Scalability & Fault Tolerance • Flexibility: Single event + Microbatch/Transactional APIs • Choose the latency/throughput balance that is best for your use case. • At-most-once, at-least-once, exactly-once semantics.

Slide 37

Slide 37 text

Why Apache Kafka? • Distributed, Reliable Pub/Sub Event Queue • Allows consumers to rewind to specific points in the queue • Redeploy topologies without data loss • Durability: Provides everything Storm needs for exactly-once and at-least-once guarantees.
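
The "rewind" property can be illustrated without Kafka itself. The following is a minimal in-memory sketch, not Kafka's API: `EventLog` and `Consumer` are hypothetical names, and the log stands in for a single partition. The idea it shows is that reads are addressed by offset and never delete anything, so a consumer that is redeployed can seek back and re-read.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one partition of a durable log.
class EventLog {
public:
    // Appends a record and returns the offset it was stored at.
    std::size_t append(const std::string& record) {
        records_.push_back(record);
        return records_.size() - 1;
    }

    // Returns every record from `offset` to the end of the log.
    // Reading never removes records, so the same range can be read again.
    std::vector<std::string> readFrom(std::size_t offset) const {
        if (offset >= records_.size()) return {};
        return std::vector<std::string>(
            records_.begin() + static_cast<std::ptrdiff_t>(offset),
            records_.end());
    }

private:
    std::vector<std::string> records_;
};

// Hypothetical consumer that tracks its own position in the log.
class Consumer {
public:
    explicit Consumer(const EventLog& log) : log_(log) {}

    // Returns all records after the current position and advances past them.
    std::vector<std::string> poll() {
        std::vector<std::string> batch = log_.readFrom(position_);
        position_ += batch.size();
        return batch;
    }

    // Rewinds (or fast-forwards) to a specific offset.
    void seek(std::size_t offset) { position_ = offset; }

private:
    const EventLog& log_;
    std::size_t position_ = 0;
};
```

A consumer that has polled everything and then calls `seek(1)` gets the records from offset 1 onward on its next `poll()`, which is the behavior a redeployed topology relies on to avoid data loss.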

Slide 38

Slide 38 text

Architecture/Data Flow Storm Zone 2 Zone 1 Kafka

Slide 39

Slide 39 text

Architecture/Data Flow • Collection Layer (Device Network) • Kafka • Analytics Layer (Real-Time/Interactive/Batch): Storm, HDFS/HBase/Hive etc.

Slide 40

Slide 40 text

Architecture/Data Flow • Sensor Output → Kafka → Storm • Persist/ETL: Persist/archive raw/intermediate data for batch/interactive flows and views (e.g. Lambda, etc.) • Analysis, Alerting: Detect : Anticipate : React

Slide 41

Slide 41 text

Architecture/Data Flow • Sensor Output → Kafka → Storm • Persist/ETL: Persist/archive raw/intermediate data for batch/interactive flows (e.g. Lambda, etc.) • Analysis, Alerting: Detect : Anticipate : React • Feed your model

Slide 42

Slide 42 text

Where do I put the smarts? In the IoT Device? Or in the Analytics Layer?

Slide 43

Slide 43 text

Why put smarts near the edge? • Required for User Experience • Device collaboration • Bandwidth Limitations • Storage Limitations

Slide 44

Slide 44 text

Why not put smarts near the edge? • Updating hardware in the field is HARD! • Updating firmware in the field is almost as hard! • You probably got it wrong in the first place. Now what?

Slide 45

Slide 45 text

Why not put smarts near the edge? • Save all the things! • You will get it wrong. • Weave a Safety Net (i.e. CYA) • Use batch processing to correct errors.

Slide 46

Slide 46 text

Save all the data. Storage is cheap.

Slide 47

Slide 47 text

You can't analyze data you don't have.

Slide 48

Slide 48 text

Demo Twitter: @ptgoetz #HadoopSummit

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

“Three eyes are better than two!”

Slide 53

Slide 53 text

Arduino Esplora Several sensors included. (no soldering required)

Slide 54

Slide 54 text

Arduino Sketch (Sensor)

    // Include Esplora convenience lib
    #include <Esplora.h>
    #include <…>

    void setup() {
      // Initialize serial communication
      Serial.begin(9600);
    }

    void loop() {
      // Read sensor values
      int loudness = Esplora.readMicrophone();
      int light = Esplora.readLightSensor();
      int temp = Esplora.readTemperature(DEGREES_F);
      int slider = Esplora.readSlider();
      int joystickButton = Esplora.readJoystickSwitch();
      int xAxis = Esplora.readAccelerometer(X_AXIS);
      // …

      // Dump JSON to serial port (XBee)
      Serial.print("{");
      // Misc. Sensors
      printAttribute("temperature", temp, false);
      printAttribute("loudness", loudness, false);
      // …
      Serial.println("}");

      delay(1000);
    }
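
The slide does not show the `printAttribute` helper. The following is a sketch of what such a helper might do, written in standard C++ that returns strings instead of writing to `Serial`, so it can be checked off-device. `formatAttribute` and `formatReading` are hypothetical names, and the meaning of the third argument (true for the last attribute, which gets no trailing comma) is an assumption, not something the slide states.

```cpp
#include <string>

// Hypothetical helper modeled on the sketch's printAttribute(name, value, flag).
// Assumption: `last` marks the final attribute, which gets no trailing comma.
std::string formatAttribute(const std::string& name, int value, bool last) {
    std::string out = "\"" + name + "\":" + std::to_string(value);
    if (!last) out += ",";
    return out;
}

// Builds one JSON object per reading, mirroring the body of loop().
std::string formatReading(int temperature, int loudness) {
    return "{" + formatAttribute("temperature", temperature, false)
               + formatAttribute("loudness", loudness, true) + "}";
}
```

On the device, each returned fragment would be passed to `Serial.print`, so that every pass through `loop()` emits one JSON object per line for the collector to read.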

Slide 55

Slide 55 text

Serial Monitor (Collector) • Read serial data, parse JSON • Add timestamp • Add device/sensor UID (“Sector 7-G”) • Publish to Kafka
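
The collector steps above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the demo: `enrich`, `collect`, and `Publisher` are hypothetical names, the input is assumed to be one flat JSON object per line, and the fields are spliced in as text rather than through a full JSON parser. Publishing is left as a callback so that no Kafka client API is assumed.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Adds "timestamp" and "device" fields to a flat JSON object such as
// {"temperature":72}. Returns "" if the line does not look like an object.
// Assumption: deviceId contains no characters that need JSON escaping.
std::string enrich(const std::string& json, long long timestampMs,
                   const std::string& deviceId) {
    const std::size_t open = json.find('{');
    const std::size_t close = json.rfind('}');
    if (open == std::string::npos || close == std::string::npos || close < open) {
        return "";
    }
    const std::string body = json.substr(open + 1, close - open - 1);
    std::string out = "{\"timestamp\":" + std::to_string(timestampMs) +
                      ",\"device\":\"" + deviceId + "\"";
    // Keep the original sensor fields, if there are any.
    if (body.find_first_not_of(" \t\r\n") != std::string::npos) {
        out += "," + body;
    }
    return out + "}";
}

// Hypothetical publish hook; in the talk's architecture this would hand the
// message to a Kafka producer.
using Publisher = std::function<void(const std::string&)>;

// Handles one line read from the serial port. Returns true if it was published.
bool collect(const std::string& line, long long timestampMs,
             const std::string& deviceId, const Publisher& publish) {
    const std::string message = enrich(line, timestampMs, deviceId);
    if (message.empty()) return false;  // drop malformed lines
    publish(message);
    return true;
}
```

Stamping the time and device ID at the collector keeps the sensor sketch minimal, which fits the deck's later argument for keeping the smarts away from the edge.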

Slide 56

Slide 56 text

Radiation Leak Topology • Kafka Spout: Raw Sensor Output (JSON) → (Shuffle Grouping) → Parse Bolt: Extract Req’d Fields → (Fields Grouping) → Threshold Bolt: Evaluate Threshold → (Shuffle Grouping) → Alert Bolt: Raise Hell!
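
The per-message logic of the three bolts can be sketched in plain C++. This is a single-process illustration, not Storm code: it does not use Storm's spout/bolt API or model the groupings, and `extractInt`, `exceedsThreshold`, and `process` are hypothetical names. The input is assumed to be a flat JSON object like the one the collector publishes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <string>

// "Parse Bolt" stand-in: extracts an integer field from a flat JSON object.
// Returns false if the field is absent or its value is not numeric.
bool extractInt(const std::string& json, const std::string& field, int& value) {
    const std::string key = "\"" + field + "\":";
    const std::size_t pos = json.find(key);
    if (pos == std::string::npos) return false;
    const char* start = json.c_str() + pos + key.size();
    char* end = nullptr;
    const long parsed = std::strtol(start, &end, 10);
    if (end == start) return false;  // no digits were consumed
    value = static_cast<int>(parsed);
    return true;
}

// "Threshold Bolt" stand-in: true when the reading is above the limit.
bool exceedsThreshold(int reading, int threshold) {
    return reading > threshold;
}

// Runs one message through parse -> threshold -> alert.
// The alert callback plays the role of the "Alert Bolt".
bool process(const std::string& json, const std::string& field, int threshold,
             const std::function<void(const std::string&)>& alert) {
    int reading = 0;
    if (!extractInt(json, field, reading)) return false;
    if (!exceedsThreshold(reading, threshold)) return false;
    alert(field + " reading " + std::to_string(reading) + " exceeds " +
          std::to_string(threshold));
    return true;
}
```

In a real topology each stage would be its own bolt. A fields grouping into the threshold stage would route all tuples with the same key to the same task, which matters once the threshold logic keeps per-device state.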

Slide 57

Slide 57 text

Swag Time Twitter: @ptgoetz #HadoopSummit

Slide 58

Slide 58 text

Swag Time Be the 7th person to retweet the last tweet from the demo.

Slide 59

Slide 59 text

Resources • Apache Storm: http://storm.apache.org • Apache Kafka: http://kafka.apache.org • Arduino: http://arduino.cc • Adafruit: https://www.adafruit.com • SparkFun: https://www.sparkfun.com


Slide 60

Slide 60 text

Thank You! P. Taylor Goetz, Hortonworks @ptgoetz Storm BoF Session today @ 17:30, Hall 400