My IoT Data Processing Toolbelt

Presentation at Startupbootcamp Internet of Things & Data, Barcelona

Michael Hausenblas

January 22, 2015

Transcript

  1. © 2014 MapR Technologies, confidential ®

    Michael Hausenblas, Chief Data Engineer, MapR Technologies
    Startupbootcamp Internet of Things & Data, Barcelona, 2015-01-22
  2. Orientation

    Diagram labels: New and Existing Devices; IoT Gateways; Network/Wireless Services; Backend Systems
    http://iot.eclipse.org
  3. IoT lends itself to a ‘Big Data’ approach

    “Using scale-out techniques on commodity hardware in a schema-on-read fashion along with community-defined interfaces”
    •  Volume: store all incoming sensor data for historical reference
    •  Variety: dozens of data formats in use in the IoT world, none of them relational
    •  Velocity: many devices generate data at a high rate, usually as data streams
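A minimal sketch of what "schema-on-read" means in practice: raw records are kept exactly as they arrived, and a schema is applied only when the data is read. The record layouts, field names and the `read_with_schema` helper are hypothetical and only illustrate the idea.

```python
import json

# Raw records stored as they arrived; each device uses its own layout
# (hypothetical sample data).
RAW_LINES = [
    '{"device": "t-1", "ts": 1421920800, "temp_c": 21.5}',
    '{"device": "h-7", "ts": 1421920805, "humidity": 0.43, "fw": "1.2"}',
    '{"device": "t-2", "ts": "1421920810", "temp_c": "19.0"}',
]


def read_with_schema(lines, schema):
    """Apply `schema` (field name -> cast function) while reading.

    Records that lack one of the schema's fields are skipped at read time;
    nothing was rejected when the data was written.
    """
    for line in lines:
        record = json.loads(line)
        if not all(field in record for field in schema):
            continue
        yield {field: cast(record[field]) for field, cast in schema.items()}


# One possible view of the raw data: temperature readings only.
TEMPERATURE_SCHEMA = {"device": str, "ts": int, "temp_c": float}
temperatures = list(read_with_schema(RAW_LINES, TEMPERATURE_SCHEMA))
```

A different schema (for example one asking for `humidity`) can be applied to the same raw lines later without rewriting what was stored.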
  4. Apache Kafka

    •  A high-throughput, distributed, persistent publish-subscribe messaging system
    •  Originates from LinkedIn
    •  Typically used as buffer and routing layer in online stream processing
    http://kafka.apache.org/
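To make "persistent publish-subscribe" and "buffer" a bit more concrete, here is a toy, in-memory sketch of an append-only log where every consumer group tracks its own read offset. This is not the Kafka API; `ToyLog`, `publish` and `poll` are hypothetical names, and nothing here is distributed or durable.

```python
from collections import defaultdict


class ToyLog:
    """Single-process stand-in for a persistent publish-subscribe log."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> append-only message list
        self._offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message and return its offset within the topic."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def poll(self, group, topic, max_messages=10):
        """Return the next unread messages for `group` and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


log = ToyLog()
for reading in (21.5, 22.0, 22.4):
    log.publish("sensor.temp", reading)

# Two independent consumer groups each see the full stream, at their own pace.
for_storage = log.poll("storage", "sensor.temp")
for_alerts = log.poll("alerts", "sensor.temp", max_messages=2)
```

Because messages stay in the log after being read, a slow consumer simply falls behind instead of losing data, which is the buffering role the slide refers to.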
  5. Fluentd

    •  Data collector for unified logging layer
    http://www.fluentd.org/
  6. Apache Storm

    •  Distributed, fault-tolerant stream-processing platform
    •  Guaranteed message processing; takes care of replaying messages on failure
    •  Concepts: tuples, streams, spouts, bolts, topologies
    http://storm.apache.org/
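The concepts listed above can be sketched with plain Python generators. This is only an analogy and not the Storm API: the function names are hypothetical, everything runs in one process, and the replay-on-failure guarantee is not modelled.

```python
from collections import Counter


def sensor_spout():
    """Spout: the source of a stream; emits (device, reading) tuples."""
    yield from [("t-1", 21.5), ("t-2", 80.0), ("t-1", 22.0), ("t-2", 81.5)]


def threshold_bolt(stream, limit):
    """Bolt: consumes a stream and emits only tuples whose reading exceeds `limit`."""
    for device, reading in stream:
        if reading > limit:
            yield (device, reading)


def count_bolt(stream):
    """Terminal bolt: counts how many tuples arrived per device."""
    counts = Counter()
    for device, _reading in stream:
        counts[device] += 1
    return counts


def run_topology():
    """Topology: the wiring of one spout into a chain of bolts."""
    return count_bolt(threshold_bolt(sensor_spout(), limit=50.0))


alerts = run_topology()
```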
  7. Apache Spark

    https://spark.apache.org/
    Stack diagram labels:
    •  Spark SQL (SQL/HQL), Spark Streaming (stream processing), MLlib (machine learning), GraphX (graph processing)
    •  Spark (core execution engine)
    •  Mesos, YARN
    •  Distributed File System (local FS, HDFS, S3, …)
  8. Apache HBase

    •  Distributed, column-oriented NoSQL database built on top of HDFS
    •  Based on Google’s BigTable technology
    •  Scales to 1,000s of commodity servers, billions of rows / PB of data
    •  Low-latency get/put operations
    http://hbase.apache.org/
  9. Apache Drill

    http://drill.apache.org/
    •  Interactive analysis at scale with and without schema
    •  Easy to support evolving structures of NoSQL data
    •  Use with and without Hadoop
    https://www.mapr.com/blog/how-use-sql-hadoop-drill-rest-json-nosql-and-hbase-simple-rest-client
  10. Stream data sources

    •  physical sources such as IoT devices
    •  social media streams such as Twitter firehose
  11. Stream data sources

    What about development and testing?
    •  synthetic sources
    •  https://github.com/tdunning/log-synth
    •  https://github.com/mapr-demos/gess
    •  https://github.com/mapr-demos/direhose
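For quick local tests, a synthetic source can be as small as a seeded generator. The sketch below is independent of the tools linked above and does not reproduce their output formats; the device names, field names and value distribution are made up for illustration.

```python
import json
import random


def synthetic_readings(n, seed=42, devices=("t-1", "t-2", "t-3"), start_ts=1421920800):
    """Yield `n` fake temperature readings with strictly increasing timestamps.

    A fixed seed makes the stream reproducible, which helps in tests.
    """
    rng = random.Random(seed)
    ts = start_ts
    for _ in range(n):
        ts += rng.randint(1, 5)  # 1 to 5 seconds between readings
        yield {
            "device": rng.choice(devices),
            "ts": ts,
            "temp_c": round(rng.gauss(21.0, 2.0), 2),
        }


# Serialize as JSON lines, ready to be piped into a queue or a file.
lines = [json.dumps(reading) for reading in synthetic_readings(5)]
```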
  12. OpenTSDB

    OpenTSDB is a distributed Time Series Database on top of HBase, enabling you …
    •  to store & index, as well as
    •  to query & plot
    … metrics at scale.
    http://opentsdb.net/
  13. OpenTSDB: key concepts

    data point: (timestamp, value) + metric + tag: key=value → time series
    Example: (00:38, 56) mysql.com_delete schema=userdb
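The data point above can be written out in the two forms OpenTSDB accepts: the line-based `put` command and the JSON body used by the HTTP `/api/put` endpoint in OpenTSDB 2.x. The helper names are hypothetical, and the epoch timestamp is an arbitrary stand-in for "00:38", since the slide gives no date.

```python
import json


def to_put_line(metric, timestamp, value, tags):
    """Format a data point as a line-protocol command: put <metric> <ts> <value> <tagk=tagv> ..."""
    tag_part = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_part}"


def to_http_body(metric, timestamp, value, tags):
    """Format the same data point as a JSON body for POST /api/put."""
    return json.dumps(
        {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags},
        sort_keys=True,
    )


# Assumed timestamp: an arbitrary epoch second standing in for 00:38.
point = ("mysql.com_delete", 1421887080, 56, {"schema": "userdb"})
put_line = to_put_line(*point)
http_body = to_http_body(*point)
```

All data points that share the same metric and the same set of tags form one time series.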
  14. OpenTSDB: interfacing

    •  HTTP API
    •  CLI (tsd, query, mkmetric, etc.)
    •  Java lib: asynchbase
    •  Dashboards (Grafana, etc.)
  15. InfluxDB, an alternative TSDB for smaller scales

    •  Written in Go, no dependencies
    •  Lots of client libs
    •  Support for cluster op via Raft
    •  Powerful, SQL-like query language:
       select mean(value), percentile(90, value) as percentile_90 from /^stats.*/ group by time(10m) into 10m.:series_name
    http://influxdb.com
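What the query above asks for, a mean and a 90th percentile per 10-minute window, can be sketched in plain Python. This only illustrates the downsampling idea and does not use InfluxDB; the nearest-rank percentile is an assumption and may differ from how InfluxDB computes `percentile`.

```python
import math
from collections import defaultdict

WINDOW_SECONDS = 10 * 60  # corresponds to group by time(10m)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def downsample(points, window=WINDOW_SECONDS):
    """Group (epoch_seconds, value) points into windows and aggregate each window."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window].append(value)  # key: start of the window
    return {
        start: {
            "mean": sum(values) / len(values),
            "percentile_90": percentile_nearest_rank(values, 90),
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical points: three fall into the first window, one into the second.
summary = downsample([(0, 1.0), (60, 2.0), (599, 3.0), (600, 10.0)])
```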
  16. The Internet of Things Architecture: iot-a
  17. Key Requirements for an IoT Data Platform

    •  Deal with raw data natively
    •  Support a range of workloads; streaming as first-class citizen
    •  Ensure business continuity
    •  Provide secure and privacy-aware operation
    https://www.mapr.com/blog/key-requirements-iot-data-platform
  18. The IoT architecture (iot-a)

    http://iot-a.info/
    Diagram labels: MQ/SP; DFS; DB; input; output (as-it-happens); output (interactive); output (batch)
  19. Example iot-a

    Diagram labels: HDFS; HBase; batch jobs; input; output (as-it-happens); output (interactive); output (batch)
  20. To sum up …

    •  Data volume, variety & velocity → not a good fit for RDBMS
    •  Many open source tools available, iterate and scale as you go
    •  If you need help re tooling → I’m around!
    •  And last but not least …
  21. Q & A

    @mhausenblas maprtech [email protected]
    Engage with us! MapR maprtech mapr-technologies