IoT Data Challenge #1: Inherently geospatial data
• Complex polygons
• Existing spatial indices not designed for dynamic data
Need robust dynamic spatial indexing
IoT Data Challenge #2: Human generated → Machine generated
• Location Based Services (LBS) → Spatial analytics
Need online ingestion at massive rates
IoT Data Challenge #3: Heavily skewed
• Operating on fresh data is better than using stale data
• Post-ingestion load-balancing not sufficient
Need good performance under skews
Problem: Ingest, index & query dynamic spatial data having unpredictable skews at unprecedented rates
SIFT: Robust, skew-resistant, massively parallel spatial index
SIFT Design
• Distributing data
• When/how to create children
• Skew-resistant design
Reference: The Grid File: An Adaptable, Symmetric Multikey File Structure, TODS 1984
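The slide names the design questions but not their answers, so the following is only an illustrative sketch of the general idea of a grid cell that creates children when it fills up. It is not SIFT's actual algorithm. The `Cell` class, its `capacity`, `fanout`, and `max_depth` parameters, and the split-on-overflow rule are all hypothetical choices made for this example, and points are assumed to lie inside the root cell.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Cell:
    """A square region that stores points until it overflows, then splits.

    Hypothetical illustration only; all thresholds are assumptions.
    """
    x0: float
    y0: float
    size: float
    capacity: int = 4    # assumed split threshold
    fanout: int = 2      # children per axis (fanout x fanout grid)
    max_depth: int = 8   # stops endless splitting on duplicate points
    depth: int = 0
    points: List[Point] = field(default_factory=list)
    children: Optional[List["Cell"]] = None

    def _child_for(self, p: Point) -> "Cell":
        # Map a point to the child grid cell that owns it (clamped to bounds).
        step = self.size / self.fanout
        col = max(0, min(int((p[0] - self.x0) / step), self.fanout - 1))
        row = max(0, min(int((p[1] - self.y0) / step), self.fanout - 1))
        return self.children[row * self.fanout + col]

    def insert(self, p: Point) -> None:
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Create children only where data piles up, so dense (skewed)
        # regions are subdivided more finely than sparse ones.
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self) -> None:
        step = self.size / self.fanout
        self.children = [
            Cell(self.x0 + c * step, self.y0 + r * step, step,
                 self.capacity, self.fanout, self.max_depth, self.depth + 1)
            for r in range(self.fanout)
            for c in range(self.fanout)
        ]
        pending, self.points = self.points, []
        for q in pending:
            self._child_for(q).insert(q)

    def query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[Point]:
        # Skip cells whose region does not intersect the query rectangle.
        if (qx1 < self.x0 or qx0 > self.x0 + self.size
                or qy1 < self.y0 or qy0 > self.y0 + self.size):
            return []
        if self.children is None:
            return [p for p in self.points
                    if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        found: List[Point] = []
        for child in self.children:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found
```

In this sketch a hot spot triggers repeated splits in one corner of the space while the remaining cells stay coarse. In a distributed setting, each child could in principle be placed on a different machine, but that placement logic is not shown here.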
Cloud Network Latency
[Figure: Avg. Query Time (ms) vs. Number of Machines; series: No Locality, With Locality, No Locality (Batched)]
Evaluations
• Amazon EC2: 20 r4.xlarge instances, 30.5 GB memory
• Performance compared against PostGIS & MongoDB

Dataset                         Records        Size
All landmarks in USA (Tiger)    122K           406 MB
All cities on Earth (OSM)       542K           844 MB
All parks on Earth (OSM)        234K           102 MB
All rivers on Earth (OSM)       555K           945 MB
Taxi trip records               1.1 billion    280 GB
Cellular network (partial)      500 million    2 TB

Table 2: Real-world datasets used in evaluations (from [27, 45, 49]).