Beyond the Tweeting Toaster: IoT Analytics with Apache Storm, Kafka and Arduino

Slides from my presentation at Data Day Seattle 2015.

P. Taylor Goetz

June 27, 2015

Transcript

  1. Beyond the Tweeting Toaster: IoT Streaming Analytics with Apache Storm, Kafka and Arduino
    P. Taylor Goetz, Hortonworks (@ptgoetz)
  2. Is that a sensor in your pocket?
    • GPS
    • Proximity Sensor
    • Ambient Light Sensor
    • 3-Axis Accelerometer
    • Magnetometer
    • Gyroscopic Sensor
    • Wifi
    • Camera(s)
    • UI (senses user interaction)
    • iBeacon
  3. 2014 South Napa Earthquake
    August 24, 3:20 AM (no, the earthquake was not caused by people waking up)
  4. “A sensor is a device that detects events or changes in quantities and provides a corresponding output, generally as an electrical or optical signal; for example, a thermocouple converts temperature to an output voltage.” (Wikipedia)
  5. Hotel Room Monitoring and Automation
    • Why heat/cool an unoccupied room?
    • Detect occupancy.
    • Predict occupancy.
    • React to occupancy.
    • Added benefits to customer experience.
    • Analyze customer behavior.
  6. Quikie Auto Lube
    • Manage inventory in response to demand.
    • Detect inventory.
    • Predict customer demand.
    • React accordingly.
  7. Hospital Infection Control
    • CDC: Hospital-acquired infections cost $30B per year and lead to 100K patient deaths.
    • Inadequate hand washing is a big cause.
  8. Hospital Infection Control
    • Ensure proper hygiene of medical staff.
    • Detect staff presence.
    • Predict hand washing.
    • React to inadequate hygiene.
  9. Auto Insurance
    • Rethinking traditional risk assessment
    • Detect (un)safe driving practices.
    • Predict who is most at risk.
    • React to risk assessment.
  10. What is Arduino?
    • Open Source Microcontroller Hardware + Software
    • Geared toward prototyping
    • “Physical Computing”: interacting with the environment
    “Arduino is an open-source electronics platform based on easy-to-use hardware and software. It's intended for anyone making interactive projects.”
  11. Programming Arduino
    • Official cross-platform IDE written in Java
    • C/C++ with some sugar
    • A program is referred to as a “sketch”
    • Many open source libraries available for various hardware (sensors, etc.)
  12. What is XBee?
    • Radio modules that support wireless point-to-point communication
    • Serial communication
    • Minimal connections required: power, ground, data in, data out (UART)
    • 2 power options (1 mW/100 mW)
    • Support for multiple network topologies (mesh, star, tree, etc.)
  13. Architecture/Data Flow
    • Sensor (XBee/Arduino): transmit raw sensor data.
    • Collector (XBee/R-Pi): receive data, add timestamp, publish.
    • Kafka: reliable queue.
    • Storm: analytics, persistence, alerting.
    • Links: 2.4 GHz radio from sensor to collector, TCP/IP from the collector onward.
  14. Why Apache Storm?
    • Speed: Process streaming data in realtime
    • Scalability & Fault Tolerance
    • Flexibility: Single event + Microbatch/Transactional APIs
    • Choose the latency/throughput balance that is best for your use case.
    • At-most-once, at-least-once, exactly-once semantics.
  15. Why Apache Kafka?
    • Distributed, Reliable Pub/Sub Event Queue
    • Allows consumers to rewind to specific points in the queue
    • Redeploy topologies without data loss
    • Durability: Provides everything Storm needs for exactly-once and at-least-once guarantees.
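
The "rewind" property on this slide can be illustrated with a toy model. The sketch below is not Kafka's API: `EventLog` and `Consumer` are hypothetical names for an in-memory, append-only log and a reader that tracks its own offset. It only shows why keeping events after they are read lets a consumer replay them later, for example after a topology is redeployed.

```python
class EventLog:
    """Toy append-only log; a stand-in for a single partition (assumption)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store the event and return its offset (its position in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset, max_events=10):
        """Return up to max_events events starting at offset; nothing is deleted."""
        return self._events[offset:offset + max_events]


class Consumer:
    """Toy consumer that tracks its own position in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_events=10):
        """Read the next batch and advance the offset past it."""
        batch = self.log.read(self.offset, max_events)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Rewind (or skip ahead) to a specific offset."""
        self.offset = offset


log = EventLog()
for reading in (70, 71, 99):
    log.append({"temperature": reading})

consumer = Consumer(log)
first_pass = consumer.poll()   # reads all three events
consumer.seek(1)               # e.g. after redeploying a topology
replayed = consumer.poll()     # reads the last two events again
```

Because reading never removes events, the replay after `seek(1)` sees the same data as the first pass, which is the property that at-least-once processing builds on.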
  16. Architecture/Data Flow
    • Sensor Output → Kafka → Storm → Persist/ETL, Analysis, Alerting
    • Persist/archive raw/intermediate data for batch/interactive flows and views (e.g. Lambda, etc.)
    • Detect : Predict : React
  17. Architecture/Data Flow
    • Sensor Output → Kafka → Storm → Persist/ETL, Analysis, Alerting
    • Persist/archive raw/intermediate data for batch/interactive flows (e.g. Lambda, etc.)
    • Detect : Predict : React
    • Feed your model
  18. Why put smarts near the edge?
    • Required for User Experience
    • Device collaboration
    • Bandwidth Limitations
    • Storage Limitations
  19. Why not put smarts near the edge?
    • Updating hardware in the field is HARD!
    • Updating firmware in the field is almost as hard!
    • You probably got it wrong in the first place. Now what?
  20. Why not put smarts near the edge?
    • Save all the things!
    • You will get it wrong.
    • Weave a Safety Net (i.e. CYA)
    • Use batch processing to correct errors.
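
One way to read "save all the things" is that an archive of raw readings lets a batch job re-run corrected logic over data the live path already handled. The sketch below assumes a hypothetical temperature rule and made-up readings; `detect`, the thresholds, and the device names are illustrative only.

```python
def detect(readings, threshold):
    """Return the readings that a given threshold would have flagged."""
    return [r for r in readings if r["temperature"] > threshold]


# Raw readings archived as-is, before any edge or streaming logic touched them.
archive = [
    {"deviceId": "room-101", "temperature": 68},
    {"deviceId": "room-102", "temperature": 81},
    {"deviceId": "room-103", "temperature": 77},
]

# Suppose the first streaming rule was too lax. A batch job replays the
# archive with the corrected rule to find what the live path missed.
live_alerts = detect(archive, threshold=80)
corrected_alerts = detect(archive, threshold=75)
missed = [r for r in corrected_alerts if r not in live_alerts]
```

Had the threshold lived only in device firmware with no raw data kept, the missed reading could not be recovered after the fact.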
  21. Arduino Sketch (Sensor)
    Callouts: include Esplora convenience lib; initialize serial communication; read sensor values; dump JSON to serial port (XBee).

        #include <stdio.h>
        #include <Esplora.h>

        void setup() {
          // initialize the serial communication:
          Serial.begin(9600);
        }

        void loop() {
          // read sensor variables
          int loudness = Esplora.readMicrophone();
          int light = Esplora.readLightSensor();
          int temp = Esplora.readTemperature(DEGREES_F);
          int slider = Esplora.readSlider();
          int joystickButton = Esplora.readJoystickSwitch();
          int xAxis = Esplora.readAccelerometer(X_AXIS);
          // …

          // printAttribute() is a helper that is not shown on the slide
          Serial.print("{");
          // Misc. Sensors
          printAttribute("temperature", temp, false);
          printAttribute("loudness", loudness, false);
          // …
          Serial.println("}");

          // wait one second between readings
          delay(1000);
        }
  22. Serial Monitor (Collector)
    • Read serial data, parse JSON
    • Add timestamp
    • Add device/sensor UID (“Sector 7-G”)
    • Publish to Kafka
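
The collector steps above can be sketched as follows. This is a minimal illustration, not the collector from the talk: the field names `timestamp` and `deviceId` are assumptions, a list of strings stands in for the serial port, and `publish` is any callable, where a real deployment would pass in a Kafka producer's send function.

```python
import json
import time


def enrich(line, device_id, now=None):
    """Parse one JSON line from the sensor and add collector-side metadata."""
    reading = json.loads(line)
    if not isinstance(reading, dict):
        raise ValueError("expected a JSON object")
    seconds = now if now is not None else time.time()
    reading["timestamp"] = int(seconds * 1000)  # milliseconds since the epoch
    reading["deviceId"] = device_id
    return reading


def collect(lines, device_id, publish, now=None):
    """Enrich each well-formed line and hand it to publish(); skip the rest."""
    published = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            reading = enrich(line, device_id, now)
        except ValueError:
            # Radio links drop bytes; ignore lines that are not valid JSON objects.
            continue
        publish(json.dumps(reading))
        published += 1
    return published


# Usage: a list stands in for the serial port, and list.append for the producer.
outbox = []
serial_lines = ['{"temperature": 72, "loudness": 3}\n', '{"temperat', "\n"]
count = collect(serial_lines, "Sector 7-G", outbox.append, now=1435400000.0)
```

Stamping the time and device ID at the collector keeps the sensor firmware small, which fits the deck's argument for keeping logic that may change away from the edge.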
  23. Radiation Leak Topology
    • Kafka Spout: raw sensor output (JSON)
    • Parse Bolt: extract req’d fields
    • Threshold Bolt: evaluate threshold
    • Alert Bolt: raise hell!
    • Kafka Spout → (shuffle grouping) → Parse Bolt → (fields grouping) → Threshold Bolt → (shuffle grouping) → Alert Bolt
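
The flow of this topology can be imitated in a few plain functions. The sketch below does not use Storm's API (a real topology would be wired with Storm's spout and bolt classes); the `radiation` and `deviceId` fields, the threshold of 100, and the CRC-based routing are all assumptions made for illustration. `fields_grouping` shows the idea behind a fields grouping: events with the same field value are always routed to the same task.

```python
import json
import zlib

THRESHOLD = 100  # hypothetical alert threshold


def parse(raw):
    """Parse step: keep only the fields the later steps need."""
    event = json.loads(raw)
    return {"deviceId": event["deviceId"], "radiation": event["radiation"]}


def exceeds_threshold(event, threshold=THRESHOLD):
    """Threshold step: True when the reading is above the limit."""
    return event["radiation"] > threshold


def alert(event):
    """Alert step: build the message that would be sent out."""
    return "ALERT {}: radiation={}".format(event["deviceId"], event["radiation"])


def fields_grouping(event, field, num_tasks):
    """Pick a task index from a field value, so equal values share a task."""
    return zlib.crc32(str(event[field]).encode("utf-8")) % num_tasks


def run(raw_events, num_tasks=4):
    """Push every raw event through parse, threshold and alert in order."""
    alerts = []
    routing = {}  # deviceId -> set of task indexes it was routed to
    for raw in raw_events:
        event = parse(raw)
        task = fields_grouping(event, "deviceId", num_tasks)
        routing.setdefault(event["deviceId"], set()).add(task)
        if exceeds_threshold(event):
            alerts.append(alert(event))
    return alerts, routing


raw_events = [
    '{"deviceId": "Sector 7-G", "radiation": 42, "loudness": 3}',
    '{"deviceId": "Sector 7-G", "radiation": 250, "loudness": 9}',
    '{"deviceId": "Sector 7-B", "radiation": 12, "loudness": 1}',
]
alerts, routing = run(raw_events)
```

Routing by device ID matters for the threshold step if it ever keeps per-device state, such as a rolling average; the parse and alert steps are stateless, so any task can take any event.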
  24. Resources
    • Apache Storm: http://storm.apache.org
    • Apache Kafka: http://kafka.apache.org
    • Arduino: http://arduino.cc
    • Adafruit: https://www.adafruit.com
    • SparkFun: https://www.sparkfun.com