A Scalable Distributed Spatial Index for the Internet-of-Things

Anand Iyer
September 26, 2017

Transcript

  2. Big Data Analytics
    • From batch data to advanced analytics
    • From live data to real-time decisions
  5. IoT Data Challenge #1: Inherently geospatial data
    • Complex polygons
    • Existing spatial indices not designed for dynamic data
    Need robust dynamic spatial indexing
  7. IoT Data Challenge #2: Human-generated → Machine-generated
    • Location Based Services (LBS) → Spatial analytics
    Need online ingestion at massive rates
  9. IoT Data Challenge #3: Heavily skewed
    • Operating on fresh data better than using stale data at all
    • Post-ingestion load-balancing not sufficient
    Need good performance under skews
  10. Problem: Ingest, index & query dynamic spatial data having unpredictable skews at unprecedented rates
    SIFT: Robust, skew-resistant, massively parallel spatial index
  11. SIFT Design
    • Distributing data
    • When/how to create children
    • Skew-resistant design
    Reference: The Grid File: An Adaptable, Symmetric Multikey File Structure, TODS 84
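The "when/how to create children" step can be illustrated with a small sketch. It assumes point data in a bounded square and a simple capacity rule: a cell holds points until it exceeds a threshold, then splits into four equal quadrants and pushes its points down. This is a region-quadtree-style split chosen for illustration; it is neither the Grid File's scale-based split nor a claim about SIFT's actual policy, and `Node`, `CAPACITY`, and `MAX_DEPTH` are hypothetical names.

```python
from dataclasses import dataclass, field

CAPACITY = 4    # hypothetical: max points a leaf holds before it splits
MAX_DEPTH = 16  # hypothetical: stop splitting so duplicate points cannot recurse forever


@dataclass
class Node:
    """A cell [x0, x1) x [y0, y1) in a region-quadtree-style index (illustrative)."""
    x0: float
    y0: float
    x1: float
    y1: float
    depth: int = 0
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty list means this is a leaf

    def insert(self, x: float, y: float) -> None:
        if self.children:
            self._child_for(x, y).insert(x, y)
            return
        self.points.append((x, y))
        # Children are created lazily, only once this cell is over capacity.
        if len(self.points) > CAPACITY and self.depth < MAX_DEPTH:
            self._split()

    def _mid(self):
        return (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2

    def _split(self) -> None:
        mx, my = self._mid()
        d = self.depth + 1
        self.children = [
            Node(self.x0, self.y0, mx, my, d),  # index 0: lower-left
            Node(mx, self.y0, self.x1, my, d),  # index 1: lower-right
            Node(self.x0, my, mx, self.y1, d),  # index 2: upper-left
            Node(mx, my, self.x1, self.y1, d),  # index 3: upper-right
        ]
        # Move this cell's points down into the new children.
        pending, self.points = self.points, []
        for x, y in pending:
            self._child_for(x, y).insert(x, y)

    def _child_for(self, x: float, y: float) -> "Node":
        mx, my = self._mid()
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def count(self) -> int:
        return len(self.points) + sum(c.count() for c in self.children)


if __name__ == "__main__":
    root = Node(0.0, 0.0, 1.0, 1.0)
    for p in [(0.1, 0.1), (0.2, 0.2), (0.6, 0.1), (0.7, 0.8), (0.3, 0.9)]:
        root.insert(*p)
    print([c.count() for c in root.children])  # [2, 1, 1, 1]
```

Inserting five points with CAPACITY = 4 splits the root once. The MAX_DEPTH guard matters under skew: many identical coordinates would otherwise keep triggering splits.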
  12. SIFT Design: What to use for distribution?
    [Figure: Probability vs. Area (× 1 million m²)]
  13. Cloud Network Latency
    [Figure: Avg. Query Time (ms) vs. Number of Machines; series: No Locality, With Locality, No Locality (Batched)]
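One way to read the locality result is in terms of network round trips per query. The cost model below is a hypothetical illustration (the function `round_trips` and the model itself are assumptions, not taken from the talk): it counts one round trip per remote index node visited, or one per distinct remote machine when lookups are batched, and treats nodes on the issuing machine as free.

```python
from typing import Sequence


def round_trips(owners: Sequence[int], local: int, batched: bool) -> int:
    """Hypothetical cost model: network round trips needed by one query.

    owners:  id of the machine holding each index node the query touches
    local:   id of the machine that issued the query (its nodes cost nothing)
    batched: if True, all lookups bound for one machine share a single request
    """
    remote = [m for m in owners if m != local]
    if batched:
        return len(set(remote))  # one request per distinct remote machine
    return len(remote)  # one request per remote node lookup


if __name__ == "__main__":
    scattered = [3, 7, 3, 3, 7, 0]  # nodes spread across machines (no locality)
    clustered = [0, 0, 0, 0, 1, 0]  # most nodes co-located with the caller
    print(round_trips(scattered, local=0, batched=False))  # 5
    print(round_trips(scattered, local=0, batched=True))   # 2
    print(round_trips(clustered, local=0, batched=False))  # 1
```

Under this model, both batching and co-locating nodes reduce the number of network hops a query pays for; real latency also depends on payload size and per-machine work, which the sketch ignores.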
  14. SIFT Design
    [Figure: recursively subdivided grid, cells numbered 0 to 15]
    How to parallelize? Need to address node
  16. SIFT Design
    [Figure: the same grid, with nodes labeled by address prefixes 0, 00*, 11*, 01*, 10*]
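The prefix labels (00*, 01*, 10*, 11*) suggest nodes identified by bit strings, where a parent's label is a prefix of its descendants' labels. A minimal sketch of that idea follows, assuming the unit square, two bits per level in a Z-order (Morton-style) convention, and a stable hash that picks the machine owning each node so any machine can locate a node without a central directory. The bit convention, the hash-based placement, and the names `cell_address`, `is_ancestor`, and `owner` are illustrative assumptions; the slides do not specify how SIFT maps addresses to machines.

```python
import hashlib


def cell_address(x: float, y: float, depth: int) -> str:
    """Bit-string address of the depth-`depth` cell containing (x, y) in [0, 1)^2.

    Each level appends two bits, y-half then x-half (an illustrative
    Z-order convention), so an ancestor's address is a prefix of its
    descendants' addresses.
    """
    x0, y0, size = 0.0, 0.0, 1.0
    bits = []
    for _ in range(depth):
        size /= 2
        ybit = 1 if y >= y0 + size else 0
        xbit = 1 if x >= x0 + size else 0
        bits.append(f"{ybit}{xbit}")
        x0 += xbit * size
        y0 += ybit * size
    return "".join(bits)


def is_ancestor(prefix: str, address: str) -> bool:
    """A node labeled `prefix*` covers every cell whose address starts with prefix."""
    return address.startswith(prefix)


def owner(address: str, num_machines: int) -> int:
    """Pick a machine for a node with a stable hash of its address.

    hashlib is used because Python's built-in hash() of a str is salted
    per process, so different machines would disagree on the owner.
    """
    digest = hashlib.sha1(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


if __name__ == "__main__":
    addr = cell_address(0.6, 0.3, depth=2)
    print(addr)                     # 0110
    print(is_ancestor("01", addr))  # True
    print(owner(addr, 20))          # a machine id in the range 0..19
```

Hashing spreads neighboring cells across machines, which balances load but gives up the locality discussed on the latency slide; a real design has to weigh the two.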
  17. Evaluations
    • Amazon EC2: 20 r4.xlarge instances, 30.5 GB memory
    • Performance compared against PostGIS & MongoDB
    Table 2: Real-world datasets used in evaluations (from [27, 45, 49]).
    Dataset                         Records        Size
    All landmarks in USA (Tiger)    122K           406 MB
    All cities on Earth (OSM)       542K           844 MB
    All parks on Earth (OSM)        234K           102 MB
    All rivers on Earth (OSM)       555K           945 MB
    Taxi trip records               1.1 billion    280 GB
    Cellular network (partial)      500 million    2 TB
  21. Evaluations: Indexing
    [Figure, two panels, MongoDB vs. SIFT: (a) Index Size (GB) vs. Number of Objects Stored (Billions); (b) Indexing Time (s) vs. Number of Objects Stored (Billions)]
  25. Evaluations: Querying
    [Figure, two panels, SIFT vs. MongoDB: (a) Query Time (ms) vs. Number of Objects Stored (Billions); (b) Probability vs. Query Time (ms)]
  27. Evaluations: Skew Handling
    [Figure, two panels: (a) Chunks/Partition vs. Number of Objects Stored (Billions); (b) Machines Used vs. Number of Objects Stored (Billions), MongoDB vs. SIFT]
  28. Summary
    • Emerging IoT workloads challenging
      • Inherently geospatial, heavy skews, unprecedented volume
      • Need efficient support for storing & querying
    • Our solution, SIFT:
      • Robust, skew-resistant, massively parallel
      • Performs well