IoT data ingestion pipelines and Clojure transducers

Talk by Yodit Stanton, founder of @OpenSensors.io, at the @ds_ldn meetup

Data Science London

June 04, 2015

Transcript

  1. Open data is data that is made available by organisations, businesses and individuals for anyone to access, use and share. Open data has to have a licence that says it is open data. Without a licence, the data can’t be reused. The licence might also say:

    • that people who use the data must credit whoever is publishing it (this is called attribution)
    • that people who mix the data with other data must also release the results as open data (this is called share-alike)
  2. Context of scale

    - Traditional tech architecture will not scale to these volumes
    - People and devices need to find data easily
    - "20 billion IoT devices to be connected by 2020" (Gartner)
    - Expected number of daily IoT messages: 333x the size of Twitter, 94x the size of WhatsApp
  3. CoAP

    ▪ HTTP-like
    ▪ Runs over UDP
    ▪ Uses GET, POST, PUT, DELETE
    ▪ coap://example.se:5683/~sensors/temp1.xml
  4. MQTT

    ▪ Pub/sub protocol
    ▪ Tiny packets on the wire
    ▪ Suitable for low-bandwidth devices
  5. Topic matching

    ▪ Show me all data for my office: /users/yods/myoffice/*
    ▪ Show me all data from the lights in my office: /users/yods/myoffice/+/lights
  6. Transducers: take one reducing fn and return another.

    Do something with a seq of data in a way that lets you reuse the same functions on different data sources.
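  7. The same pipeline, first with sequence functions, then as a transducer:

    ;; Sequence version: map builds a lazy seq that filter then walks
    (defn xform [xs]
      (->> xs
           (map #(+ 2 %))
           (filter odd?)))

    (reduce + (xform (range 0 10)))    ;; => 35

    ;; Transducer version: comp fuses both steps into one reducing fn
    (def xform2 (comp (map #(+ 2 %)) (filter odd?)))

    (transduce xform2 + (range 0 10))  ;; => 35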
  8. Why? Faster. XML parsing example, threading-macro version:

    (defn dothread [doc]
      (->> [doc]
           (mapcat :content)
           (filter #(= :chapter (:tag %)))
           (filter #(= "Introduction" (get-in % [:attrs :name])))
           (mapcat :content)
           (filter #(= :para (:tag %)))
           (mapcat :content)
           (filter string?)))

    Threading benchmark
      Evaluation count : 22646520 in 60 samples of 377442 calls.
      Execution time mean : 2.685036 µs

    Transducer benchmark
      Evaluation count : 50155860 in 60 samples of 835931 calls.
      Execution time mean : 1.276018 µs

    The transducer version's mean time is about half the threading version's (roughly 2.1x faster).
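Slide 3 shows a CoAP URI. As a rough illustration of how such a URI breaks down, here is a sketch using `java.net.URI` from Clojure. `parse-coap-uri` is a hypothetical helper, not part of any CoAP library; it assumes the RFC 7252 conventions that `coap://` defaults to UDP port 5683 and that each path segment is carried as its own Uri-Path option.

```clojure
(import '(java.net URI))
(require '[clojure.string :as str])

;; Default UDP port for coap:// URIs (RFC 7252).
(def coap-default-port 5683)

(defn parse-coap-uri
  "Hypothetical helper: split a coap:// URI into scheme, host, port and
  path segments (CoAP sends each path segment as a separate Uri-Path option)."
  [^String s]
  (let [u    (URI. s)
        port (.getPort u)]                      ; -1 when the URI has no port
    {:scheme   (.getScheme u)
     :host     (.getHost u)
     :port     (if (neg? port) coap-default-port port)
     :segments (vec (remove empty? (str/split (.getPath u) #"/")))}))

(parse-coap-uri "coap://example.se/~sensors/temp1.xml")
;; => {:scheme "coap", :host "example.se", :port 5683,
;;     :segments ["~sensors" "temp1.xml"]}
```

Writing the port explicitly, as the slide does, gives the same result, since 5683 is already the default.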
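Slide 4's "tiny packets on the wire" can be made concrete with a little arithmetic. The sketch below assumes MQTT 3.1.1 and a QoS 0 PUBLISH, which carries no packet identifier. `encode-remaining-length` follows the spec's variable-length encoding (7 bits per byte, high bit set when another byte follows); `publish-size` and the example topic are hypothetical, and the sketch ignores CONNECT, keep-alive and other session traffic.

```clojure
(defn encode-remaining-length
  "Encode n (0 to 268435455) using MQTT's variable-length scheme.
  Returns the encoded bytes as ints in the range 0-255."
  [n]
  (loop [x n, out []]
    (let [digit (mod x 128)
          x     (quot x 128)
          b     (if (pos? x) (bit-or digit 128) digit)] ; continuation bit
      (if (pos? x)
        (recur x (conj out b))
        (conj out b)))))

(defn publish-size
  "Hypothetical helper: bytes on the wire for an MQTT 3.1.1 QoS 0 PUBLISH.
  1 header byte + remaining-length bytes + 2-byte topic length + topic + payload."
  [^String topic ^String payload]
  (let [remaining (+ 2
                     (count (.getBytes topic "UTF-8"))
                     (count (.getBytes payload "UTF-8")))]
    (+ 1 (count (encode-remaining-length remaining)) remaining)))

(encode-remaining-length 128)                              ;; => [128 1]
(publish-size "a/b" "21.5")                                ;; => 11
(publish-size "/users/yods/myoffice/desk/lights" "on")     ;; => 38
```

In the last example, 32 of the 38 bytes are the topic string, so on constrained links the topic name, not the protocol framing, tends to dominate the message size.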
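Slide 5's filters rely on topic wildcards. Below is a sketch of MQTT-style matching under the standard MQTT rules: `+` matches exactly one level and `#` matches all remaining levels. Standard MQTT spells the multi-level wildcard `#`, so a standard broker would expect the slide's first filter as `/users/yods/myoffice/#`. `levels` and `topic-matches?` are hypothetical names, not an OpenSensors or MQTT client API; in practice the broker does this matching.

```clojure
(require '[clojure.string :as str])

(defn- levels
  "Split a topic or filter into levels. The -1 limit keeps empty levels,
  so a leading '/' yields an empty first level, as MQTT specifies."
  [s]
  (str/split s #"/" -1))

(defn topic-matches?
  "True when topic is matched by topic-filter: + matches exactly one level,
  # (valid only as the last level) matches any remaining levels, including none."
  [topic-filter topic]
  (loop [[f & fs :as fl] (levels topic-filter)
         [t & ts :as tl] (levels topic)]
    (cond
      (empty? fl)            (empty? tl)   ; filter used up: topic must be too
      (= f "#")              true          ; multi-level wildcard swallows the rest
      (empty? tl)            false         ; topic shorter than filter
      (or (= f "+") (= f t)) (recur fs ts) ; single-level wildcard or exact level
      :else                  false)))

(topic-matches? "/users/yods/myoffice/#" "/users/yods/myoffice/desk/temp")          ;; => true
(topic-matches? "/users/yods/myoffice/+/lights" "/users/yods/myoffice/desk/lights") ;; => true
(topic-matches? "/users/yods/myoffice/+/lights" "/users/yods/myoffice/a/b/lights")  ;; => false
```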
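Slide 6 describes a transducer as something that takes one reducing fn and returns another. A hand-rolled `mapping` shows that shape directly; it is a hypothetical name, since `clojure.core/map` called with one argument already returns the equivalent transducer. The same transducer is then reused with a collection, a reduction and a lazy sequence. The sensor readings are made-up sample data.

```clojure
(defn mapping
  "Hypothetical hand-rolled equivalent of (map f): takes a reducing fn rf
  and returns a new reducing fn that applies f to each input before rf."
  [f]
  (fn [rf]
    (fn
      ([] (rf))                    ; init: delegate to rf
      ([acc] (rf acc))             ; completion: delegate to rf
      ([acc x] (rf acc (f x))))))  ; step: transform x, then hand it to rf

;; Made-up readings; the transducer does not care where they come from.
(def readings [{:sensor "desk" :temp 21} {:sensor "window" :temp 19}])

(into [] (mapping :temp) readings)                     ;; => [21 19]
(transduce (mapping :temp) + 0 readings)               ;; => 40
(sequence (mapping :temp) (list {:temp 5} {:temp 7}))  ;; => (5 7)
```

The transformation is defined once and knows nothing about vectors, sums or lazy seqs; `into`, `transduce` and `sequence` each supply their own reducing fn.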
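Slide 7 builds `xform2` with `comp`, yet the map step runs before the filter step, the reverse of how `comp` orders plain functions. The reason is that `((comp xf1 xf2) rf)` is `(xf1 (xf2 rf))`, so the outermost reducing fn, from `xf1`, sees each input first. A small sketch with hypothetical names shows that the order matters:

```clojure
;; With transducers, comp lists steps in data-flow order:
;; the leftmost transducer handles each input first.
(def double-then-odd (comp (map #(* 2 %)) (filter odd?)))
(def odd-then-double (comp (filter odd?) (map #(* 2 %))))

(into [] double-then-odd (range 10))  ;; => []  (every doubled number is even)
(into [] odd-then-double (range 10))  ;; => [2 6 10 14 18]
```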
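Slide 8's transcript includes only the threading-macro version of the XML query, not the transducer version that was benchmarked. The sketch below is one plausible transducer rewrite, not necessarily the code from the talk: the same seven steps fused with `comp` and run once with `into`. `intro-text-xf`, `dotransduce` and the sample document are hypothetical; the document uses the `{:tag :attrs :content}` map shape that `clojure.xml/parse` produces.

```clojure
;; The seven steps of dothread, fused into a single transducer.
(def intro-text-xf
  (comp (mapcat :content)
        (filter #(= :chapter (:tag %)))
        (filter #(= "Introduction" (get-in % [:attrs :name])))
        (mapcat :content)
        (filter #(= :para (:tag %)))
        (mapcat :content)
        (filter string?)))

(defn dotransduce
  "Hypothetical transducer counterpart of dothread: the text of every
  :para in the chapter named \"Introduction\"."
  [doc]
  (into [] intro-text-xf [doc]))

;; A tiny sample document in clojure.xml's {:tag :attrs :content} shape.
(def doc
  {:tag :book :attrs nil
   :content [{:tag :chapter :attrs {:name "Introduction"}
              :content [{:tag :para :attrs nil :content ["Hello"]}
                        {:tag :para :attrs nil :content ["world"]}]}
             {:tag :chapter :attrs {:name "Usage"}
              :content [{:tag :para :attrs nil :content ["Skipped"]}]}]})

(dotransduce doc)  ;; => ["Hello" "world"]
```

Because `into` runs the fused steps in one pass, no lazy seq is built between steps, which is consistent with, though not by itself proof of, the speedup reported on the slide.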