A Scalable Distributed Spatial Index for the Internet-of-Things

Anand Iyer
September 26, 2017

Transcript

  1. 2.

  2. 5.

    Big Data Analytics • From batch data to advanced analytics • From live data to real-time decisions
  3. 6.

    IoT Data Challenge #1: Inherently geospatial data • Complex polygons • Existing spatial indices not designed for dynamic data
  5. 8.

    IoT Data Challenge #1: Inherently geospatial data • Complex polygons • Existing spatial indices not designed for dynamic data
    Need robust dynamic spatial indexing
  6. 9.
  7. 10.

    IoT Data Challenge #2: Human generated → Machine generated • Location Based Services (LBS) → Spatial analytics
  8. 11.

    IoT Data Challenge #2: Human generated → Machine generated • Location Based Services (LBS) → Spatial analytics
    Need online ingestion at massive rates
  9. 12.

    IoT Data Challenge #3: Heavily skewed • Operating on fresh data better than using stale data at all • Post-ingestion load-balancing not sufficient
  10. 13.

    IoT Data Challenge #3: Heavily skewed • Operating on fresh data better than using stale data at all • Post-ingestion load-balancing not sufficient
    Need good performance under skews
  11. 15.
  12. 17.

    Problem: Ingest, index & query dynamic spatial data having unpredictable skews at unprecedented rates
    SIFT: Robust, skew-resistant, massively parallel spatial index
  13. 24.

    SIFT Design • Distributing data • When/how to create children • Skew-resistant design
    Reference: "The Grid File: An Adaptable, Symmetric Multikey File Structure", TODS 84
  14. 27.

    SIFT Design: What to use for distribution?
    [Figure: probability vs. area (× 1 million m²)]
  15. 32.

    Cloud Network Latency
    [Figure: average query time (ms) vs. number of machines; series: No Locality, With Locality, No Locality (Batched)]
  16. 34.

    SIFT Design: How to parallelize? Need to address node
    [Figure: hierarchy of numbered grid cells (0; 0 to 3; 0 to 15)]
  18. 36.

    SIFT Design
    [Figure: hierarchy of numbered grid cells (0; 0 to 3; 0 to 15), labeled with node addresses 0, 00*, 11*, 01*, 10*]
  19. 38.

    Evaluations • Amazon EC2: 20 r4.xlarge instances, 30.5 GB memory • Performance compared against PostGIS & MongoDB

    Table 2: Real-world datasets used in evaluations (from [27, 45, 49]).

    Dataset                          Records        Size
    All landmarks in USA (Tiger)     122K           406 MB
    All cities on Earth (OSM)        542K           844 MB
    All parks on Earth (OSM)         234K           102 MB
    All rivers on Earth (OSM)        555K           945 MB
    Taxi trip records                1.1 billion    280 GB
    Cellular network (partial)       500 million    2 TB
  21. 41.

    Evaluations: Indexing
    [Figure: indexing time (s) vs. number of objects stored (billions); MongoDB vs. SIFT]
  23. 43.

    Evaluations: Indexing
    [Figure: index size (GB) vs. number of objects stored (billions); MongoDB vs. SIFT]
    [Figure: indexing time (s) vs. number of objects stored (billions); MongoDB vs. SIFT]
  25. 46.

    Evaluations: Querying
    [Figure: query time (ms) vs. number of objects stored (billions); MongoDB vs. SIFT]
  27. 48.

    Evaluations: Querying
    [Figure: query time (ms) vs. number of objects stored (billions); MongoDB vs. SIFT]
    [Figure: probability vs. query time (ms); SIFT vs. MongoDB]
  28. 50.

    Evaluations: Skew Handling
    [Figure: machines used vs. number of objects stored (billions); MongoDB vs. SIFT]
  29. 51.

    Evaluations: Skew Handling
    [Figure: chunks/partition vs. number of objects stored (billions)]
    [Figure: machines used vs. number of objects stored (billions); MongoDB vs. SIFT]
  30. 52.

    Summary
    Emerging IoT workloads are challenging • Inherently geospatial, heavy skews, unprecedented volume • Need efficient support for storing & querying
    Our solution, SIFT: • Robust, skew-resistant, massively parallel • Performs well
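
The SIFT Design slides raise two questions: when and how a node should create children, and how nodes can be addressed so that work can be parallelized. The slides show address prefixes such as 00*, 01*, 10* and 11* but not the mechanism behind them, so the Python sketch below is only one plausible reading and not SIFT's actual implementation. It assumes a quadtree-style split into four children, a hypothetical leaf capacity (CAPACITY), a depth limit (MAX_DEPTH), and hash-based placement of node addresses onto machines (machine_for). None of these names or policies come from the talk.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical parameters, not taken from the talk.
CAPACITY = 4        # points a leaf may hold before it creates children
MAX_DEPTH = 16      # guard against endless splitting on identical points
NUM_MACHINES = 20   # same count as the 20 EC2 instances in the evaluation


def machine_for(address: str) -> int:
    """Map a node address to a machine by hashing it (an assumed policy)."""
    digest = hashlib.sha256(address.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_MACHINES


@dataclass
class Node:
    address: str                 # bit prefix: "" is the root, "01" a child
    x0: float
    y0: float
    x1: float
    y1: float
    points: list = field(default_factory=list)
    children: dict = field(default_factory=dict)  # 2-bit code -> Node

    def quadrant(self, x: float, y: float) -> str:
        """Return the 2-bit code (y bit, then x bit) of the child holding (x, y)."""
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return ("1" if y >= my else "0") + ("1" if x >= mx else "0")

    def insert(self, x: float, y: float) -> None:
        """Store a point in the leaf that covers it, splitting on overflow."""
        if self.children:
            self.children[self.quadrant(x, y)].insert(x, y)
            return
        self.points.append((x, y))
        if len(self.points) > CAPACITY and len(self.address) < 2 * MAX_DEPTH:
            self._split()

    def _split(self) -> None:
        """Create four children and push this node's points down to them."""
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        self.children = {
            "00": Node(self.address + "00", self.x0, self.y0, mx, my),
            "01": Node(self.address + "01", mx, self.y0, self.x1, my),
            "10": Node(self.address + "10", self.x0, my, mx, self.y1),
            "11": Node(self.address + "11", mx, my, self.x1, self.y1),
        }
        pending, self.points = self.points, []
        for px, py in pending:
            self.insert(px, py)  # children exist now, so this delegates

    def locate(self, x: float, y: float) -> str:
        """Return the address of the leaf that covers (x, y)."""
        node = self
        while node.children:
            node = node.children[node.quadrant(x, y)]
        return node.address


if __name__ == "__main__":
    root = Node("", 0.0, 0.0, 100.0, 100.0)
    for p in [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (90, 90)]:
        root.insert(*p)
    for p in [(1, 1), (5, 5), (90, 90)]:
        addr = root.locate(*p)
        print(p, "-> leaf", repr(addr), "on machine", machine_for(addr))
```

In this sketch a prefix such as 00* names every node beneath one quadrant, so a lookup could in principle be routed by address rather than by walking the tree from a single root machine. The clustered inserts also hint at why skew matters: five nearby points are enough to push their leaf five levels below the root, while the lone point at (90, 90) stays one level down.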