Eugene Siow
September 28, 2017

Interoperable & Efficient: Linked Data for the Internet of Things

Two requirements for utilising the large source of time-series sensor data from the Internet of Things are interoperability and efficient access. We present a Linked Data solution that increases interoperability through the use and referencing of common identifiers and ontologies for integration. From our study of the shape of Internet of Things data, we show how we can improve access within the resource constraints of Lightweight Computers, compact machines deployed in close proximity to sensors, by storing time-series data succinctly as rows and producing Linked Data ‘just-in-time’. We examine our approach within two scenarios: a distributed meteorological analytics system and a smart home hub. We show with established benchmarks that, compared with storing the data in a traditional Linked Data store, our approach provides gains in both storage efficiency and query performance ranging from over 3 times to over three orders of magnitude on Lightweight Computers. Finally, we reflect on how pushing computing to edge networks with our infrastructure can affect privacy, data ownership and data locality.

Presented at the 3rd International Conference on Internet Science (INSCI16) in Florence, Italy.

Transcript

  1. CURRENT STATE OF THE INTERNET OF THINGS
    PRODUCT & DATA SILOS
    “The Internet of Things is currently beset by product silos.” W3C Web of Things Interest Group
    DEPENDENCY ON THE CLOUD
    DEVICES & DATA: SECURITY, PRIVACY, LOCALITY?
    PERFORMANCE OF APPLICATIONS & ANALYTICS
  2. DATA OWNERSHIP & PRIVACY WITH LIGHTWEIGHT COMPUTERS
    A Smart Home Scenario implementing a Personal IoT Repository
    Diagram labels: Smart Home Dashboard; Personal IoT Repository; Environmental Sensors; Energy Meters; Motion Sensors; Data Stream; Energy Saving Analytics; Stream & Historical Queries
    Data ownership: Own and store your data at home
    Less Cloud; ENCRYPTION; BETTER PERFORMANCE; SPECIFIC POLICIES/CONTROL; ONLINE/OFFLINE, TRUST, ACCESS CONTROL
  3. DATA LOCALITY WITH LIGHTWEIGHT COMPUTERS
    A Distributed Meteorological Scenario, minimising cloud dependency for Storage and Processing
    Diagram labels: Irrigation Application; Soil Moisture Analytics; Environmental Sensors; Lightweight Computer Hub; Data Stream; Weather Data; State Inclement Weather Planning Application; National Disaster Monitoring Application; Cloud
  4. INTRODUCING LINKED DATA FOR INTEROPERABILITY
    URI and ontologies: Establish common data structures & References
    http://thing.io/1 is a http://ont/weather_sensor (CLASS)
    http://thing.io/1 produces http://thing.io/obs/1
    http://thing.io/obs/1 is a http://ont/temp_observation (CLASS)
    http://thing.io/obs/1 has value 13.0, unit ℃
    http://thing.io/1 located at http://thing.io/loc/1, which has latitude and longitude edges to the values -1.41 and 50.9
    ENABLES RICH METADATA: what, where, WHEN, HOW of DATA
    PERFORMANCE CHALLENGES: STORES DON’T SCALE & PERFORM WELL ON WEB YET
    Buil-Aranda, C., Hogan, A.: SPARQL Web-Querying Infrastructure: Ready for Action? ISWC 2013
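The graph on slide 4 can be written down as subject-predicate-object triples. The following is a minimal Python sketch of that idea, not code from the deck: the predicate names (`produces`, `located_at`, `has_value`, `unit`) are hypothetical, since the slide only labels the edges in words, and the latitude and longitude edges are left out.

```python
# Hypothetical predicate names: the slide labels the edges only as
# "is a", "produces", "has value", "unit" and "located at".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONT = "http://ont/"

THING = "http://thing.io/1"
OBS = "http://thing.io/obs/1"
LOC = "http://thing.io/loc/1"

TRIPLES = [
    (THING, RDF_TYPE, ONT + "weather_sensor"),
    (THING, ONT + "produces", OBS),
    (THING, ONT + "located_at", LOC),
    (OBS, RDF_TYPE, ONT + "temp_observation"),
    (OBS, ONT + "has_value", 13.0),
    (OBS, ONT + "unit", "℃"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def readings_of(thing, triples=TRIPLES):
    """Follow thing -> observation -> (value, unit) links."""
    readings = []
    for obs in objects(thing, ONT + "produces", triples):
        units = objects(obs, ONT + "unit", triples)
        for value in objects(obs, ONT + "has_value", triples):
            readings.append((value, units[0] if units else None))
    return readings


if __name__ == "__main__":
    print(readings_of(THING))
```

Because every node is a URI, a second dataset that reuses `http://ont/temp_observation` can be queried with the same `readings_of` function, which is the interoperability argument the slide makes.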
  5. THE SHAPE OF IOT TIME-SERIES DATA
    FLAT: { timestamp : 1467673132, temperature : 32.0, wind_speed : 10.5, pressure : 1016 }
    COMPLEX: { timestamp : 1467673132, temperature : { max: 22.0, min: 15.0, current: 17.0, error: { percentage: 5.0 } } }
    WIDTH (1 2 3 4 5): { timestamp : 1467673132, temperature : 32.0, wind_speed : 10.5, pressure : 1016, precipitation: 0, humidity: 93.0 }
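The flat/complex distinction and the notion of width on slide 5 can be checked mechanically. This is a small Python sketch under two assumptions that are not stated on the slide: a reading counts as complex if any value is a nested object or array, and width counts the fields other than `timestamp` (which matches the 1 to 5 labels on the widest example).

```python
def is_flat(reading):
    """A reading is flat when no value is a nested object or array (assumption)."""
    return not any(isinstance(value, (dict, list)) for value in reading.values())


def width(reading, time_key="timestamp"):
    """Count the fields besides the timestamp (assumed definition of width)."""
    return sum(1 for key in reading if key != time_key)


# The three example readings shown on the slide.
FLAT = {"timestamp": 1467673132, "temperature": 32.0,
        "wind_speed": 10.5, "pressure": 1016}
COMPLEX = {"timestamp": 1467673132,
           "temperature": {"max": 22.0, "min": 15.0, "current": 17.0,
                           "error": {"percentage": 5.0}}}
WIDE = {"timestamp": 1467673132, "temperature": 32.0, "wind_speed": 10.5,
        "pressure": 1016, "precipitation": 0, "humidity": 93.0}

if __name__ == "__main__":
    for name, reading in [("FLAT", FLAT), ("COMPLEX", COMPLEX), ("WIDE", WIDE)]:
        print(name, is_flat(reading), width(reading))
```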
  6. THE SHAPE OF IOT TIME-SERIES DATA
    dweet.io: 20k UNIQUE DEVICES
    18.5k NON-EMPTY SCHEMATA (92.3%)
    18k FLAT SCHEMATA (99.5%)
    92 COMPLEX SCHEMATA (0.5%)
    Chart: Width (1; 2,3; 4; 5; 6+)
  7. OPTIMISING FOR TIME-SERIES DATA
    RDF GRAPH:
    THING located LOCATION
    THING produces TEMPERATURE OBS (has value 13.0; time 2016-01-01 06:00:00; unit CELSIUS)
    THING produces HUMIDITY OBS (has value 93.0; time 2016-01-01 06:00:00; unit PERCENT)
    THING produces WIND SPEED OBS (has value 10.5; time 2016-01-01 06:00:00; unit MPH)
  8. OPTIMISING FOR TIME-SERIES DATA
    RDF TRIPLES:
    THING produces TEMPERATURE OBS
    THING produces HUMIDITY OBS
    THING produces WIND SPEED OBS
    THING located LOCATION
    TEMPERATURE OBS has value 13.0
    TEMPERATURE OBS time 2016-01-01 06:00:00
    TEMPERATURE OBS unit CELSIUS
    HUMIDITY OBS has value 93.0
    HUMIDITY OBS time 2016-01-01 06:00:00
    HUMIDITY OBS unit PERCENT
    WIND SPEED OBS has value 10.5
    WIND SPEED OBS time 2016-01-01 06:00:00
    WIND SPEED OBS unit MPH
  9. OPTIMISING FOR TIME-SERIES DATA
    OUR APPROACH: SHARE COLUMN HEADERS; NO JOINS WITHIN ROWS; ‘JUST IN TIME’ METADATA
    Metadata graph: THING located LOCATION; THING produces TEMPERATURE OBS, HUMIDITY OBS, WIND SPEED OBS; unit CELSIUS, PERCENT, MPH
    Row columns: TIME | TEMPERATURE | HUMIDITY | WIND SPEED
    Row values: 2016-01-01 06:00:00 | 13.0 | 93.0 | 10.5
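To make the ‘just in time’ idea on slide 9 concrete, here is a minimal Python sketch that keeps the observation as a single row and expands it into triples only when asked. It is an illustration under assumptions, not the deck's engine: the `MAPPING` structure and the `row_to_triples` function are hypothetical names, and the node and edge labels are copied from the slides.

```python
# One stored row, as on the slide: the column headers are shared by all rows.
ROW = {"TIME": "2016-01-01 06:00:00", "TEMPERATURE": 13.0,
       "HUMIDITY": 93.0, "WIND SPEED": 10.5}

# Hypothetical mapping: column name -> (observation node, unit).
MAPPING = {
    "TEMPERATURE": ("TEMPERATURE OBS", "CELSIUS"),
    "HUMIDITY": ("HUMIDITY OBS", "PERCENT"),
    "WIND SPEED": ("WIND SPEED OBS", "MPH"),
}


def row_to_triples(row, mapping, thing="THING", location="LOCATION"):
    """Expand one row into triples on demand instead of storing them."""
    triples = [(thing, "located", location)]
    for column, (obs, unit) in mapping.items():
        triples.append((thing, "produces", obs))
        triples.append((obs, "has value", row[column]))
        triples.append((obs, "time", row["TIME"]))
        triples.append((obs, "unit", unit))
    return triples


if __name__ == "__main__":
    for triple in row_to_triples(ROW, MAPPING):
        print(triple)
```

With this sketch, the four stored cells of the row expand to the same 13 triples listed on slide 8, and the metadata in `MAPPING` is held once rather than repeated for every reading.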
  10. DESIGNING OUR ENGINE
    Metadata graph: THING located LOCATION; THING produces TEMPERATURE OBS, HUMIDITY OBS, WIND SPEED OBS; unit CELSIUS, PERCENT, MPH
    Table1 columns: TIME | TEMPERATURE | HUMIDITY | WINDSPEED
    Table1 values: 2016-01-01 06:00:00 | 13.0 | 93.0 | 10.5
    Mapping: TEMPERATURE OBS has value TABLE1.TEMPERATURE; HUMIDITY OBS has value TABLE1.HUMIDITY; WIND SPEED OBS has value TABLE1.WINDSPEED
  11. DESIGNING OUR ENGINE
    Metadata graph: THING located LOCATION; THING produces TEMPERATURE OBS, HUMIDITY OBS, WIND SPEED OBS; unit CELSIUS, PERCENT, MPH
    Table1 columns: TIME | TEMPERATURE | HUMIDITY | WINDSPEED
    Table1 values: 2016-01-01 06:00:00 | 13.0 | 93.0 | 10.5
    Mapping: TEMPERATURE OBS has value TABLE1.TEMPERATURE; HUMIDITY OBS has value TABLE1.HUMIDITY; WIND SPEED OBS has value TABLE1.WINDSPEED
  12. DESIGNING OUR ENGINE
    Mapping (cropped view): THING produces TEMPERATURE OBS and HUMIDITY OBS; unit CELSIUS, PERCENT; TEMPERATURE OBS has value TABLE1.TEMPERATURE; row values TIME 2016-01-01 06:00:00, TEMPERATURE 13.0, 93.0
    SPARQL: SELECT MAX(?TEMPERATURE) { ?OBS a TEMPERATURE OBS . ?OBS has value ?TEMPERATURE . ?OBS has unit ?uom }
  13. DESIGNING OUR ENGINE
    Mapping (cropped view): TEMPERATURE OBS unit CELSIUS; TEMPERATURE OBS has value TABLE1.TEMPERATURE; TEMPERATURE 13.0
    SPARQL: SELECT MAX(?TEMPERATURE) { ?OBS a TEMPERATURE OBS . ?OBS has value ?TEMPERATURE . ?OBS has unit ?uom }
  14. DESIGNING OUR ENGINE
    SPARQL: SELECT MAX(?TEMPERATURE) { ?OBS a TEMPERATURE OBS . ?OBS has value ?TEMPERATURE . ?OBS has unit ?uom }
    Bindings: ?OBS = NODE_TEMP; ?TEMPERATURE = TABLE1.TEMPERATURE; ?uom = CELSIUS
    SQL: SELECT MAX(TEMPERATURE) FROM TABLE1
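The translation on slide 14 can be tried out with Python's built-in `sqlite3` module. The sketch below is a deliberately tiny stand-in and not the sparql2sql implementation: `translate_max` and `BINDINGS` are hypothetical names, only a single `MAX` aggregate over one bound variable is handled, and the second and third table rows are invented for illustration (only the first row appears on the slides).

```python
import sqlite3

# Hypothetical binding table: SPARQL variable -> (table, column), following
# the ?TEMPERATURE = TABLE1.TEMPERATURE binding shown on the slide.
BINDINGS = {"?TEMPERATURE": ("TABLE1", "TEMPERATURE")}


def translate_max(variable, bindings):
    """Build the SQL for SELECT MAX(variable) when the variable maps to a column."""
    table, column = bindings[variable]
    return f"SELECT MAX({column}) FROM {table}"


def build_table():
    """Create an in-memory TABLE1 holding one row per timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TABLE1 (TIME TEXT, TEMPERATURE REAL, HUMIDITY REAL, WINDSPEED REAL)"
    )
    rows = [
        ("2016-01-01 06:00:00", 13.0, 93.0, 10.5),  # row from the slides
        ("2016-01-01 07:00:00", 14.5, 90.0, 9.0),   # invented sample row
        ("2016-01-01 08:00:00", 15.5, 88.0, 11.0),  # invented sample row
    ]
    conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?, ?)", rows)
    return conn


SQL = translate_max("?TEMPERATURE", BINDINGS)
MAX_TEMPERATURE = build_table().execute(SQL).fetchone()[0]

if __name__ == "__main__":
    print(SQL)
    print(MAX_TEMPERATURE)
```

The point of the sketch is that the `?OBS` and `?uom` patterns never touch the table: they are resolved against the mapping, so the generated SQL reads one column and needs no joins.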
  15. BENCHMARKS & IOT Scenarios
    Meteorological SYSTEM: ~20,000 Stations; 100k to 300k triples; Wind, Rainfall, etc.; 10 SRBench Queries
    SMART HOME ANALYTICS: 3 months, 1 home; ~30k triples; Motion, energy, env; 4 Analytics Queries
    Diagram labels: ANALYTICS HUB; STATION HUB; Weather SENSORS; PERSONAL STORE; DEVICES W/ SENSORS
    Axes: Compute & Storage (LIGHTWEIGHT COMPUTER; COMPUTER/SERVER CLUSTER; DEVICE; SENSOR); Level of Distribution
    github.com/eugenesiow/sparql2sql
  16. STORAGE SIZE
    OUR APPROACH (S2S) vs NATIVE STORE (TDB)
    Datasets: 300k Hurricane Ike; 100k NEVADA BLIZZARD; 30k SMART HOME
    Ratios shown: x15, x68, x112
  17. SRBENCH QUERY RESULTS
    OUR APPROACH (S2S) vs NATIVE STORE (TDB)
    01: Get the rainfall observed in a particular hour from all stations
    02: Q01 with an optional clause on unit of measure
    Ratios shown: x4.6, x4
  18. SRBENCH QUERY RESULTS
    03: Detect if a hurricane has been observed (x3.4)
    04: Get the average wind speed at the stations where the air temperature is >32 (x88). Join between wind observation and temperature observation subtrees is time-consuming in low resource environment (Raspberry Pi)
    05: Detect if a station is observing a blizzard (x2.7)
  19. SRBENCH QUERY RESULTS
    06: Get the stations with extremely low visibility (x6)
    07: Detect stations that are recently broken (x14)
    08: Get the daily minimal and maximal air temperature observed by the sensor at a given location (x5.6)
  20. SRBENCH QUERY RESULTS
    09: Get the daily average wind force and direction observed by the sensor at a given location. Join between wind force and wind direction observation subtrees is time-consuming in low resource environment (Raspberry Pi)
    10: Get the locations where a heavy snowfall has been observed
    Ratios shown: x305, x7
    Our Approach (S2S) is shown to be faster on all queries in the Distributed Meteorological System
  21. SMART HOME QUERY RESULTS
    OUR APPROACH (S2S) vs NATIVE STORE (TDB)
    01: Temperature aggregated by hour on a specified day
    02: Minimum and maximum temperature each day for a particular month
    Ratios shown: x29, x9
  22. SMART HOME QUERY RESULTS
    03: Energy Usage Per Room By Day. Involves meter data (larger set), with space-time aggregations.
    04: Diagnose unattended appliances consuming energy with no motion in room. Involves motion and meter data (much larger set), with space-time aggregations and joins between motion and meter tables/subgraphs.
    Ratios shown: x69, x3.6
    Our Approach (S2S) is shown, once again, to be faster on all queries for Smart Home Analytics
  23. WHY IS OUR APPROACH FASTER THAN NATIVE RDF?
    FASTER AGGREGATIONS ON LESS RESOURCES
    CAN SPECIFICALLY BUILD INDEXES FOR FAST RANGE QUERIES
    EFFICIENT SQL QUERIES
    OPTIMISE FLAT & WIDE DATA ACCESS
    REDUCE JOINS BETWEEN SUBGRAPHS ON THE SAME ROW
    COLLAPSE INTERMEDIATE NODES
    REDUCE JOINS W/ BLANK OR FAUX NODES IN MAPPINGS
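The point on slide 23 about building indexes for range queries can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions, not the benchmark setup from the talk: the index name `idx_table1_time` is hypothetical, the table holds only two columns, and every row after the first is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (TIME TEXT, TEMPERATURE REAL)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?)",
    [
        ("2016-01-01 06:00:00", 13.0),  # row from the slides
        ("2016-01-01 07:00:00", 14.0),  # invented sample rows below
        ("2016-01-01 08:00:00", 15.0),
        ("2016-01-01 09:00:00", 16.0),
    ],
)

# An index on the time column lets the database seek to the start of a
# time range instead of scanning every row.
conn.execute("CREATE INDEX idx_table1_time ON TABLE1 (TIME)")

RANGE_QUERY = "SELECT AVG(TEMPERATURE) FROM TABLE1 WHERE TIME >= ? AND TIME <= ?"
BOUNDS = ("2016-01-01 07:00:00", "2016-01-01 08:00:00")

# ISO-style timestamps compare correctly as plain text.
AVERAGE = conn.execute(RANGE_QUERY, BOUNDS).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row describes the access path.
PLAN = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + RANGE_QUERY, BOUNDS)]

if __name__ == "__main__":
    print(AVERAGE)
    print(PLAN)
```

Printing `PLAN` shows which access path SQLite chose for the time-range aggregation; the exact wording of that output varies between SQLite versions.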
  24. RELATED WORK
    GENERAL ONTOLOGY BASED DATA ACCESS ENGINES: -ontop-, morph, sparql2stream
    Rodriguez-Muro, M., Rezk, M. (2014) Efficient SPARQL-to-SQL with R2RML mappings. Web Semantics: Science, Services and Agents on the World Wide Web 33, pp. 141–169
    Priyatna, F., Corcho, O., Sequeda, J. (2014) Formalisation and Experiences of R2RML-based SPARQL to SQL Query Translation using Morph. Proceedings of the 23rd International Conference on World Wide Web, pp. 479–489
    sparql2sql (github.com/eugenesiow/sparql2sql): Siow, Eugene, Tiropanis, Thanassis and Hall, Wendy (2016) SPARQL-to-SQL on internet of things databases and streams. Proceedings of the 15th International Semantic Web Conference (accepted, to be published)
    piotre (github.com/eugenesiow/piotre): Siow, Eugene, Tiropanis, Thanassis and Hall, Wendy (2016) PIOTRe: Personal IoT Repository. Proceedings of the 15th International Semantic Web Conference P&D (accepted, to be published)
  25. “Until they become conscious they will never rebel and until after they have rebelled they cannot become conscious.” 1984 by George Orwell
    DATA OWNERSHIP & DATA LOCALITY: DISTRIBUTED LIGHTWEIGHT COMPUTERS FOR STORAGE AND PROCESSING IN THE IOT
    LINKED DATA FOR INTEROPERABILITY: A rich model to describe things and integrate connected things’ data
    NOVEL TIERED LINKED DATA STORE: FROM 3 TIMES TO 3 ORDERS OF MAGNITUDE performance improvement
    @eugene_siow