Beyond the Tweeting Toaster: IoT Analytics with Apache Storm, Kafka and Arduino

My presentation from Hadoop Summit Brussels 2015.


P. Taylor Goetz

April 16, 2015


  1. Beyond the Tweeting Toaster: (I)IoT Streaming Analytics with Apache Storm, Kafka and Arduino. P. Taylor Goetz, Hortonworks, @ptgoetz
  2. Credit where credit is due…

  3. @mytoaster • Created by Hans Scharler in 2008 • http://nothans.tumblr.com

  4. 26 billion IoT devices by 2020 -Gartner http://www.gartner.com/newsroom/id/2636073

  5. IPv4 Address Space: 4.3 billion (2^32 addresses)

  6. Sensors are everywhere!

  7. Is that a sensor in your pocket? • GPS • Proximity Sensor • Ambient Light Sensor • 3-Axis Accelerometer • Magnetometer • Gyroscopic Sensor • Wi-Fi • Camera(s) • UI (senses user interaction) • iBeacon
  8. In your car? sensormag.com

  9. In your car?

  10. On your wrist? Jawbone UP FitBit …

  11. 2014 South Napa Earthquake, August 24, 3:20 AM (no, the earthquake was not caused by people waking up)
  12. Sensors

  13. “A sensor is a device that detects events or changes in quantities and provides a corresponding output, generally as an electrical or optical signal; for example, a thermocouple converts temperature to an output voltage.” (Wikipedia)
  14. Sensors

  15. http://adafruit.com

  16. http://adafruit.com

  17. IoT Use Cases

  18. Detect : Anticipate : React

  19. Detect : Anticipate : React • Detect behavior. • Anticipate behavior. • React to behavior.
  20. Hotel Room Monitoring and Automation • Why heat/cool an unoccupied room? • Detect occupancy. • Anticipate occupancy. • React to occupancy. • Added benefits to customer experience. • Analyze customer behavior.
  21. Quikie Auto Lube • Manage inventory in response to demand. • Detect inventory. • Anticipate customer demand. • React accordingly.
  22. Hospital Infection Control • CDC: Hospital-acquired infections cost $30B per year and lead to 100K patient deaths. • Inadequate hand washing is a big cause.
  23. Hospital Infection Control • Ensure proper hygiene of medical staff. • Detect staff presence. • Anticipate hand washing. • React to inadequate hygiene.
  24. Hospital Infection Control

  25. Auto Insurance • Rethinking traditional risk assessment • Detect unsafe driving practices. • Anticipate who is most at risk. • React to risk assessment.
  26. Use your imagination.

  27. How can you use and combine sensors to better serve your [user’s] needs?
  28. What is Arduino?

  29. What is Arduino? • Open Source Microcontroller Hardware + Software • Geared toward prototyping • “Physical Computing”: interacting with the environment • “Arduino is an open-source electronics platform based on easy-to-use hardware and software. It's intended for anyone making interactive projects.”
  30. http://adafruit.com

  31. What is Arduino?

  32. Programming Arduino • Official cross-platform IDE written in Java • C/C++ with some sugar • A program is referred to as a “Sketch” • Many open source libraries available for various hardware (sensors, etc.)
  33. Going Wireless with XBee

  34. What is XBee? • Radio modules that support wireless point-to-point communication • Serial communication • Minimal connections required: power, ground, data in, data out (UART) • 2 power options (1 mW/100 mW) • Support for multiple network topologies (mesh, star, tree, etc.)
  35. Architecture/Data Flow • Sensor (XBee/Arduino): Transmit raw sensor data. • [2.4 GHz] • Collector (XBee/R-Pi): Receive data, add timestamp, publish. • [TCP/IP] • Kafka: Reliable queue. • Storm: Analytics, persistence, alerting.
  36. Why Apache Storm? • Speed: Process streaming data in real time • Scalability & Fault Tolerance • Flexibility: Single event + Microbatch/Transactional APIs • Choose the latency/throughput balance that is best for your use case. • At-most-once, at-least-once, exactly-once semantics.
  37. Why Apache Kafka? • Distributed, Reliable Pub/Sub Event Queue • Allows consumers to rewind to specific points in the queue • Redeploy topologies without data loss • Durability: Provides everything Storm needs for exactly-once and at-least-once guarantees.
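
The "rewind" property on the Kafka slide can be illustrated with a toy model. The sketch below is not the Kafka client API; `EventLog` and `Consumer` are hypothetical names standing in for a single partition and a single consumer. It only shows the idea that reads never remove events, so a consumer that tracks its own offset can go back and replay them, for example after a topology is redeployed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for one Kafka partition: an append-only list of events.
// Hypothetical class, not part of any Kafka library.
class EventLog {
public:
    // Append an event and return the offset it was stored at.
    std::size_t append(const std::string& event) {
        events_.push_back(event);
        return events_.size() - 1;
    }

    std::size_t size() const { return events_.size(); }

    // Read the event at a given offset; reading never removes it.
    const std::string& at(std::size_t offset) const {
        assert(offset < events_.size());
        return events_[offset];
    }

private:
    std::vector<std::string> events_;
};

// A consumer only holds a position into the log.
class Consumer {
public:
    explicit Consumer(const EventLog& log) : log_(log) {}

    // Return every event from the current position to the end, then advance.
    std::vector<std::string> poll() {
        std::vector<std::string> out;
        while (position_ < log_.size()) {
            out.push_back(log_.at(position_++));
        }
        return out;
    }

    // Rewind (or skip ahead) to a specific offset.
    void seek(std::size_t offset) {
        assert(offset <= log_.size());
        position_ = offset;
    }

private:
    const EventLog& log_;
    std::size_t position_ = 0;
};
```

After appending three events and polling them once, calling `seek(1)` and polling again returns the second and third events a second time, which is the behavior a redeployed consumer relies on.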
  38. Architecture/Data Flow Storm Zone 2 Zone 1 Kafka

  39. Architecture/Data Flow • Analytics Layer (Real-Time/Interactive/Batch): Storm, HDFS/HBase/Hive etc. • Collection Layer (Device Network) • Kafka
  40. Architecture/Data Flow • Sensor Output • Kafka • Storm • Persist/ETL • Analysis • Alerting • Persist/archive raw/intermediate data for batch/interactive flows and views (e.g. Lambda, etc.) • Detect : Anticipate : React
  41. Architecture/Data Flow • Sensor Output • Kafka • Storm • Persist/ETL • Analysis • Alerting • Persist/archive raw/intermediate data for batch/interactive flows (e.g. Lambda, etc.) • Detect : Anticipate : React • Feed your model
  42. Where do I put the smarts? In the IoT Device? Or in the Analytics Layer?
  43. Why put smarts near the edge? • Required for User Experience • Device collaboration • Bandwidth Limitations • Storage Limitations
  44. Why not put smarts near the edge? • Updating hardware in the field is HARD! • Updating firmware in the field is almost as hard! • You probably got it wrong in the first place. Now what?
  45. Why not put smarts near the edge? • Save all the things! • You will get it wrong. • Weave a Safety Net (i.e. CYA) • Use batch processing to correct errors.
  46. Save all the data. Storage is cheap.

  47. You can't analyze data you don't have.

  48. Demo Twitter: @ptgoetz #HadoopSummit

  49. None
  50. None
  51. None
  52. “Three eyes are better than two!”

  53. Arduino Esplora Several sensors included. (no soldering required)

  54. Arduino Sketch (Sensor)

        #include <stdio.h>
        #include <Esplora.h>  // Include Esplora convenience lib

        void setup() {
          // initialize the serial communication:
          Serial.begin(9600);
        }

        void loop() {
          // read sensor variables
          int loudness = Esplora.readMicrophone();
          int light = Esplora.readLightSensor();
          int temp = Esplora.readTemperature(DEGREES_F);
          int slider = Esplora.readSlider();
          int joystickButton = Esplora.readJoystickSwitch();
          int xAxis = Esplora.readAccelerometer(X_AXIS);
          // …

          // Dump JSON to serial port (XBee)
          Serial.print("{");
          // Misc. Sensors
          printAttribute("temperature", temp, false);
          printAttribute("loudness", loudness, false);
          // …
          Serial.println("}");

          delay(1000);
        }
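
The sketch above calls a `printAttribute` helper that the slide does not show. The following is one guess at what such a helper might do, written as standard C++ that returns a string instead of writing to `Serial`, so it can be tried off the board. The names `formatAttribute` and `formatReading` are hypothetical, and reading the third argument as "this is the last attribute, so omit the trailing comma" is an assumption, not something the slide confirms.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: format one JSON attribute as "name":value and
// append a comma unless this is the last attribute in the object.
// The meaning of `last` is an assumption about the slide's third argument.
std::string formatAttribute(const std::string& name, int value, bool last) {
    assert(!name.empty());
    std::string out = "\"" + name + "\":" + std::to_string(value);
    if (!last) {
        out += ",";
    }
    return out;
}

// Build a whole object the way loop() prints it piece by piece:
// opening brace, attributes, closing brace.
std::string formatReading(int temperature, int loudness) {
    return "{" + formatAttribute("temperature", temperature, false)
               + formatAttribute("loudness", loudness, true) + "}";
}
```

On the Arduino itself, the same logic would write each piece with `Serial.print` rather than build a `std::string`, which keeps memory use low on a small microcontroller.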
  55. Serial Monitor (Collector) • Read serial data, parse JSON • Add timestamp • Add device/sensor UID (“Sector 7-G”) • Publish to Kafka
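
The collector steps can be sketched as a single enrichment function. This is a minimal illustration under stated assumptions, not the code used in the demo: `enrichReading` and the field names `timestamp` and `deviceId` are hypothetical, the JSON is handled with plain string operations rather than a real parser, and publishing to Kafka is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical collector step: take one JSON object read from the serial
// port and add a timestamp and device ID before it is published.
// Returns an empty string if the line does not look like a JSON object.
// Note: deviceId is inserted as-is, so it must not contain quotes.
std::string enrichReading(const std::string& line,
                          long long timestampMillis,
                          const std::string& deviceId) {
    assert(!deviceId.empty());

    // Ignore trailing whitespace such as the "\r\n" that println() appends.
    std::size_t end = line.find_last_not_of(" \t\r\n");
    if (end == std::string::npos || line[end] != '}' || line.find('{') != 0) {
        return "";
    }

    // Everything before the final closing brace.
    std::string body = line.substr(0, end);

    // Add a separating comma only if the object already holds an attribute
    // and does not already end in a comma.
    char lastChar = body[body.find_last_not_of(" \t")];
    bool needsComma = lastChar != '{' && lastChar != ',';

    return body + (needsComma ? "," : "")
         + "\"timestamp\":" + std::to_string(timestampMillis)
         + ",\"deviceId\":\"" + deviceId + "\"}";
}
```

Stamping the time at the collector rather than on the sensor fits the data flow in the deck: the sensor only transmits raw values, and the collector is the first place a clock and a network connection are available.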
  56. Radiation Leak Topology • Kafka Spout: Raw Sensor Output (JSON) • [Shuffle Grouping] • Parse Bolt: Extract Req’d Fields • [Fields Grouping] • Threshold Bolt: Evaluate Threshold • [Shuffle Grouping] • Alert Bolt: Raise Hell!
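
The parse, threshold, and alert stages of the topology can be sketched as three plain functions. This is not Storm code (Storm bolts are normally written against its Java API) and it is not the demo's implementation; `extractInt`, `exceedsThreshold`, and `evaluate` are hypothetical names, and the field name and limit are supplied by the caller because the slide does not state them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Parse stage: pull one integer attribute out of a flat JSON object.
// Naive string search for illustration; a real parse step would use a
// JSON library.
bool extractInt(const std::string& json, const std::string& name, int& out) {
    assert(!name.empty());
    const std::string key = "\"" + name + "\":";
    std::size_t pos = json.find(key);
    if (pos == std::string::npos) {
        return false;  // attribute not present
    }
    const char* start = json.c_str() + pos + key.size();
    char* end = nullptr;
    long value = std::strtol(start, &end, 10);
    if (end == start) {
        return false;  // no digits after the key
    }
    out = static_cast<int>(value);
    return true;
}

// Threshold stage: true when the reading is above the configured limit.
bool exceedsThreshold(int value, int limit) {
    return value > limit;
}

// Alert stage: return an alert message, or an empty string when the
// attribute is missing or within the limit.
std::string evaluate(const std::string& json,
                     const std::string& field,
                     int limit) {
    int value = 0;
    if (!extractInt(json, field, value) || !exceedsThreshold(value, limit)) {
        return "";
    }
    return "ALERT: " + field + "=" + std::to_string(value)
         + " exceeds " + std::to_string(limit);
}
```

In a real topology each stage would run as its own bolt. A fields grouping (for example on device ID) routes all readings from one device to the same task, which matters if a stage keeps per-device state.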
  57. Swag Time Twitter: @ptgoetz #HadoopSummit

  58. Swag Time • Be the 7th person to retweet the last tweet from the demo.
  59. Resources • Apache Storm http://storm.apache.org • Apache Kafka http://kafka.apache.org • Arduino http://arduino.cc • Adafruit https://www.adafruit.com • SparkFun

  60. Thank You! P. Taylor Goetz, Hortonworks, @ptgoetz • Storm BoF Session today @ 17:30, Hall 400