Succinct Trit-array Trie for Scalable Trajectory Similarity Search
Shunsuke Kanda (1), Koh Takeuchi (2,1), Keisuke Fujii (3,1), Yasuo Tabei (1)
(1) RIKEN AIP, (2) Kyoto Univ., (3) Nagoya Univ.
28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, November 3–6, 2020, Seattle, Washington, USA (virtual)

Background & Contribution
• Background
  ▹ Massive datasets of spatial trajectories are ubiquitous in research and industry
  ▹ Similarity search over huge collections of trajectories is indispensable for turning these datasets into knowledge
• Our contribution
  ▹ Develop an efficient trajectory similarity search method
    – Powerful measure: Fréchet distance
    – Fast search: Locality sensitive hashing (LSH) + Trie search algorithm
    – Scalability: Compressed trie implementation using succinct data structures
  ▹ Experiments using huge real-world datasets
    – Demonstrate that our method outperforms state-of-the-art ones

(Discrete) Fréchet Distance
• Often explained with the metaphor of an owner walking a dog on a leash
  ▹ Both walk along their own trajectories at their own speeds, but neither can go backward
  ▹ The Fréchet distance is the minimum leash length needed to complete the walk
• The computation time is O(traj-length^2) by dynamic programming
• Figure: the maximum leash length over the walk = Fréchet(owner, dog)
The computational cost makes it difficult to design an efficient exact solution :(
LSH enables us to solve such difficult search problems quickly :)
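
The quadratic dynamic program mentioned on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes trajectories are non-empty sequences of points and uses Euclidean distance between points; the name `discrete_frechet` is chosen here for illustration.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between point sequences p and q.

    dp[i][j] holds the shortest leash that lets the two walkers reach
    p[i] and q[j] without either of them ever stepping backward.
    Runs in O(len(p) * len(q)) time and space.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # leash needed at this pair of positions
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                # Best of: owner steps, dog steps, or both step together.
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]


owner = [(0, 0), (1, 0), (2, 0)]
dog = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet(owner, dog))  # 1.0: two parallel lines one unit apart
```

Each table cell costs one point-to-point distance, so comparing a query against millions of stored trajectories this way is what motivates the LSH-based filtering.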

Approximate Approach
1. Map trajectories in the Fréchet space to integer vectors (i.e., sketches) in the Hamming space, using LSH
2. Retrieve candidate solutions for the query sketch using Hamming distance (main problem)
3. Remove false positives from the candidate solutions
Figure: trajectories P1, P2, P3, P4 and query Q are mapped by LSH to sketches S1, S2, S3, S4 and T; sketches are ordered from similar to dissimilar
  T  = 10230332
  S1 = 10230122   Ham(S1, T) = 2
  S2 = 10220132   Ham(S2, T) = 2
  S3 = 22030132   Ham(S3, T) = 4
  S4 = 12031123   Ham(S4, T) = 6
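
The Hamming distances in the figure can be reproduced with a few lines of Python. This is only an illustrative sketch: sketches are stored as digit strings for readability, and the helper name `hamming` is an assumption, not part of tSTAT.

```python
def hamming(s, t):
    """Number of positions at which two equal-length sketches differ."""
    if len(s) != len(t):
        raise ValueError("sketches must have the same length")
    return sum(a != b for a, b in zip(s, t))


# Sketches from the slide, written as digit strings.
sketches = {
    "S1": "10230122",
    "S2": "10220132",
    "S3": "22030132",
    "S4": "12031123",
}
query = "10230332"  # sketch T

for name, sketch in sketches.items():
    print(name, hamming(sketch, query))
# S1 2
# S2 2
# S3 4
# S4 6
```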

Approximate Trajectory Similarity Search Problem
• Input
  ▹ Database of n sketches S1, S2, …, Sn
  ▹ Query sketch T
  ▹ Hamming distance threshold K
• Output
  ▹ All sketches Si whose Hamming distance to T is at most K
    – i.e., { Si : Ham(Si, T) ≤ K }
• Issues :(
  ▹ Most existing methods are designed for binary sketches and are inefficient for integer ones
  ▹ Existing methods for integer sketches are memory-inefficient
We develop a novel similarity search method called tSTAT
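
To make the problem concrete, here is a hedged sketch of threshold search on a plain pointer-based trie. It is not tSTAT itself, which uses a compressed trie; the class `Node` and the functions `build_trie` and `search` are hypothetical names, and all sketches are assumed to have the same length as the query. The idea shown is pruning: a branch is abandoned as soon as its mismatch count exceeds K.

```python
class Node:
    """Trie node: children keyed by sketch symbol, ids of sketches ending here."""
    __slots__ = ("children", "ids")

    def __init__(self):
        self.children = {}
        self.ids = []


def build_trie(sketches):
    """Insert every sketch into a trie; leaves remember the sketch indices."""
    root = Node()
    for i, sketch in enumerate(sketches):
        node = root
        for symbol in sketch:
            node = node.children.setdefault(symbol, Node())
        node.ids.append(i)
    return root


def search(root, query, k):
    """Return indices i with Ham(sketches[i], query) <= k, in sorted order."""
    results = []
    stack = [(root, 0, 0)]  # (node, depth, mismatches so far)
    while stack:
        node, depth, errors = stack.pop()
        if depth == len(query):
            results.extend(node.ids)
            continue
        for symbol, child in node.children.items():
            new_errors = errors + (symbol != query[depth])
            if new_errors <= k:  # prune subtrees that already exceed k
                stack.append((child, depth + 1, new_errors))
    return sorted(results)


database = ["10230122", "10220132", "22030132", "12031123"]  # S1..S4
trie = build_trie(database)
print(search(trie, "10230332", 2))  # [0, 1], i.e. S1 and S2
```

Sketches that share a prefix share the mismatch counting for that prefix, which is why a trie can beat a linear scan when K is small.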

Proposed Method
Compressed Data Structure: STAT (Succinct Trit-Array Trie)
• Trie nodes are represented using directly addressable tables H
• Tree navigation can be performed in O(1) time by Rank/Select queries over H
• H can be implemented as a succinct trit array in σ·N_in·log2(3) + o(N_in) bits (σ: number of distinct integers, N_in: number of inner nodes)
  ▹ Close to the theoretical lower bound on space
• We developed an efficient implementation of the succinct trit array supporting Rank/Select
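
The following is a rough sketch of a trit array with Rank/Select, written to illustrate the idea rather than to reproduce the authors' data structure. Several things here are assumptions not stated on the slide: trits are packed five per byte (3^5 = 243 fits in one byte, giving 1.6 bits per trit against log2(3) ≈ 1.585); the encoding 0 = no child, 1 = inner-node child, 2 = leaf child is hypothetical; and `rank` and `select` are linear scans, whereas a succinct implementation would answer them in O(1) with small auxiliary indexes.

```python
class TritArray:
    """Array of trits (0, 1, 2) packed five per byte."""

    def __init__(self, trits):
        self.n = len(trits)
        self.data = bytearray()
        for start in range(0, self.n, 5):
            value = 0
            # The first trit of each chunk becomes the least significant digit.
            for t in reversed(trits[start:start + 5]):
                if t not in (0, 1, 2):
                    raise ValueError("trits must be 0, 1, or 2")
                value = value * 3 + t
            self.data.append(value)

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return (self.data[i // 5] // 3 ** (i % 5)) % 3

    def rank(self, t, i):
        """Occurrences of trit t in positions [0, i). Naive linear scan."""
        return sum(1 for j in range(i) if self[j] == t)

    def select(self, t, k):
        """Position of the k-th (0-based) occurrence of t, or -1 if absent."""
        for j in range(self.n):
            if self[j] == t:
                if k == 0:
                    return j
                k -= 1
        return -1


def child(h, sigma, u, c):
    """Follow symbol c from node u of one trie level (hypothetical encoding).

    Returns None if there is no such child; otherwise (kind, index), where
    kind is 1 (inner) or 2 (leaf) and index counts nodes of that kind on
    the next level.
    """
    pos = u * sigma + c  # direct addressing: sigma slots per node
    kind = h[pos]
    if kind == 0:
        return None
    return kind, h.rank(kind, pos)


# One level with two nodes and sigma = 4 symbols.
h = TritArray([1, 0, 2, 0,   0, 1, 1, 2])
print(child(h, 4, 1, 2))  # (1, 2): the third inner node on the next level
print(len(h.data))        # 2 bytes store 8 trits
```

In this toy layout, `select` gives the reverse direction: `h.select(1, 2) // 4` recovers the parent node of the third inner child.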

Experiments
• Dataset: 3.3 million NBA player trajectories from 636 games in the 2015/16 season
• Queryset: 1000 trajectories randomly extracted from the dataset
• Competitors
  ▹ LS: Strawman baseline with linear search (without any auxiliary data structure)
  ▹ HmSearch: State-of-the-art similarity search method for integer sketches [SSDBM13]
  ▹ FRESH: State-of-the-art approximate trajectory similarity search method [WADS19]
• Memory usage (GiB): 17x smaller than FRESH, 10x smaller than HmSearch
Figure: memory usage (GiB) for Fréchet radii R that find 1, 10, and 100 solutions on average per query

Experiments
• Average search time (ms/query): 34x faster than FRESH, 12x faster than HmSearch
Figure: average search time (ms/query) for Fréchet radii R that find 1, 10, and 100 solutions on average per query

Example of querying similar movements using tSTAT
• For a short movement of Rajon Rondo in Kings vs. Thunder on Dec. 6, 2015, searched in a database of 3.3 million trajectories
  ▹ Query: Date: 12/06/2015, Match: SAC vs OKC, PlayerName: Rajon Rondo (No 9), Q4 07:09.74 to Q4 07:00.29
  ▹ Result 1: Date: 10/31/2015, Match: NOP vs GSW, PlayerName: Toney Douglas (No 16), Distance: 0.363737, Q3 00:36.15 to Q3 00:31.75
  ▹ Result 2: Date: 12/09/2015, Match: SAS vs TOR, PlayerName: Tim Duncan (No 21), Distance: 0.423995, Q1 09:48.32 to Q1 09:43.59
  ▹ Result 3: Date: 01/12/2016, Match: PHX vs IND, PlayerName: P. J. Tucker (No 17), Distance: 0.395999, Q4 06:20.51 to Q4 06:17.35
• Conclusion
  ▹ Proposed a novel similarity search method, tSTAT
  ▹ Showed its efficiency through experiments using real-world datasets