SIGSPATIAL20

Succinct Trit-array Trie for Scalable Trajectory Similarity Search
Shunsuke Kanda¹, Koh Takeuchi²,¹, Keisuke Fujii³,¹, Yasuo Tabei¹
¹RIKEN AIP, ²Kyoto Univ., ³Nagoya Univ.
28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 3–6, 2020, Seattle, Washington, USA (virtual)

Background & Contribution
• Background
  ▹ Massive datasets of spatial trajectories are ubiquitous in research and industry
  ▹ Similarity search over huge collections of trajectories is indispensable for turning these datasets into knowledge
• Our contribution
  ▹ Develop an efficient trajectory similarity search method
    – Powerful measure: Fréchet distance
    – Fast search: locality-sensitive hashing (LSH) + trie search algorithm
    – Scalability: compressed trie implementation using succinct data structures
  ▹ Experiments on huge real-world datasets
    – Demonstrate that our method outperforms state-of-the-art ones

(Discrete) Fréchet Distance
• Often explained with the metaphor of an owner walking a dog on a leash
  ▹ Both walk along their own trajectories at their own speeds, but neither can go backward
  ▹ The Fréchet distance is the minimum leash length needed to complete the walk
• Computation takes O(traj-length²) time by dynamic programming
(Figure: the owner's and the dog's trajectories, with the longest leash span marked as max = Fréchet(owner, dog).)
This computational cost makes it difficult to design an efficient exact solution :(
LSH enables us to quickly solve such difficult search problems :)
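The quadratic dynamic program mentioned above can be sketched as follows. This is a minimal textbook version of the discrete Fréchet distance, assuming trajectories are given as lists of 2D points; the function name `discrete_frechet` is chosen here for illustration and is not taken from the slides.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between point sequences p and q.

    Runs in O(len(p) * len(q)) time, matching the quadratic bound
    stated on the slide.
    """
    if not p or not q:
        raise ValueError("trajectories must be non-empty")
    n, m = len(p), len(q)
    # dp[i][j] = shortest leash that lets both walkers reach p[i] and q[j]
    # when neither is allowed to step backward.
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]


# Two parallel straight walks one unit apart need a leash of length 1.
owner = [(0, 0), (1, 0), (2, 0)]
dog = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet(owner, dog))  # 1.0
```

Running this for every database trajectory at query time is what the approximate approach avoids.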

Approximate Approach
1. Map trajectories in the Fréchet space to integer vectors (i.e., sketches) in the Hamming space
2. Retrieve candidate solutions for the sketches using the Hamming distance
3. Remove false positives from the candidate solutions
(Figure: LSH maps trajectories P1–P4 and query Q to sketches S1–S4 and T; the Hamming-distance search over the sketches is marked as the main problem, and results range from similar to dissimilar.)
  S1 = 10230122   Ham(S1, T) = 2
  S2 = 10220132   Ham(S2, T) = 2
  S3 = 22030132   Ham(S3, T) = 4
  S4 = 12031123   Ham(S4, T) = 6
  T  = 10230332

Approximate Trajectory Similarity Search Problem
• Input
  ▹ Database of n sketches S1, S2, …, Sn
  ▹ Query sketch T
  ▹ Hamming distance threshold K
• Output
  ▹ All sketches Si whose Hamming distance to T is within K
    – i.e., { Si : Ham(Si, T) ≤ K }
• Issues :(
  ▹ Most existing methods are designed for binary sketches and are inefficient for integer ones
  ▹ Existing methods for integer sketches are memory-inefficient
We develop a novel similarity search method called tSTAT
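For reference, the problem statement can be written as a brute-force scan over all sketches. The sketch values below are the ones shown in the Approximate Approach figure; the threshold values and the function names are assumptions made for illustration. This is a linear search, not tSTAT.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sketches differ."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same length")
    return sum(x != y for x, y in zip(a, b))


def linear_search(sketches, query, k):
    """Return the indices i with Ham(sketches[i], query) <= k."""
    return [i for i, s in enumerate(sketches) if hamming(s, query) <= k]


# Sketches S1..S4 and query T from the figure (digits are integer symbols).
SKETCHES = ["10230122", "10220132", "22030132", "12031123"]
QUERY = "10230332"

print([hamming(s, QUERY) for s in SKETCHES])  # [2, 2, 4, 6]
print(linear_search(SKETCHES, QUERY, k=2))    # [0, 1], i.e. S1 and S2
```

The scan touches every sketch for every query, which is the cost an index is meant to avoid.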

tSTAT (trajectory-indexing Succinct Trit-Array Trie)
Proposed Method
Step 1: Partition each sketch into B blocks based on the multi-index approach
• This divides the Hamming distance problem with a large threshold K into B sub-problems with the small threshold ⌊K/B⌋
• Although false positives can arise, they can be safely removed in the final verification
• Example: with two blocks (Block1, Block2), threshold K = 3 becomes sub-thresholds ⌊3/2⌋ = 1
(Figure: overview of tSTAT showing the partitioned sketches, the per-block tries, and the arrays labeled H, G, and V.)
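The multi-index step rests on a pigeonhole argument: if every block differed from the query in at least ⌊K/B⌋ + 1 positions, the total distance would be at least B(⌊K/B⌋ + 1) > K, so any true solution must match within ⌊K/B⌋ errors in at least one block. A minimal sketch of that filter-then-verify scheme follows. The function names are hypothetical, the sketches are the ones from the Approximate Approach figure, and each block is scanned linearly here where tSTAT uses a trie.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def split_blocks(sketch, b):
    """Split a sketch into b contiguous blocks of near-equal length."""
    base, extra = divmod(len(sketch), b)
    blocks, start = [], 0
    for i in range(b):
        end = start + base + (1 if i < extra else 0)
        blocks.append(sketch[start:end])
        start = end
    return blocks


def multi_index_search(sketches, query, k, b):
    """Return (candidates, verified) index lists for threshold k and b blocks."""
    sub_k = k // b  # sub-threshold floor(K / B)
    query_blocks = split_blocks(query, b)
    candidates = set()
    for i, s in enumerate(sketches):
        for sb, qb in zip(split_blocks(s, b), query_blocks):
            if hamming(sb, qb) <= sub_k:
                candidates.add(i)
                break
    # Final verification removes the false positives.
    verified = sorted(i for i in candidates if hamming(sketches[i], query) <= k)
    return sorted(candidates), verified


SKETCHES = ["10230122", "10220132", "22030132", "12031123"]
QUERY = "10230332"

# K = 3 and B = 2 give sub-threshold 1, as in the slide's example.
print(multi_index_search(SKETCHES, QUERY, k=3, b=2))  # ([0, 1, 2], [0, 1])
```

Here S3 (index 2) passes the block filter through its second block but has full distance 4, so verification discards it.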

Step 2: Index each block using a trie in which redundant nodes are eliminated
• The Hamming distance problem can be solved by traversing trie nodes while counting #errors against the query
• The search takes O(B·(L/B)^(K/B+2)) time, where B is #blocks and L is the sketch length
• Example: threshold K = 3 becomes sub-thresholds ⌊3/2⌋ = 1, so traversal stops descending once #errors > 1
(Figure: tries for Block1 and Block2 with a query traversal; blocks reached within the sub-threshold are marked Similar.)
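The error-counting traversal can be illustrated with an ordinary pointer-based trie. This is only a sketch of the search idea: it assumes all blocks have the same length, it does not eliminate redundant nodes, and it is not compressed. The class and function names are hypothetical.

```python
class Node:
    """Plain trie node: child links keyed by symbol, plus ids stored at leaves."""

    def __init__(self):
        self.children = {}
        self.ids = []


def build_trie(blocks):
    """Insert equal-length blocks; block i is recorded under id i."""
    root = Node()
    for idx, block in enumerate(blocks):
        node = root
        for symbol in block:
            node = node.children.setdefault(symbol, Node())
        node.ids.append(idx)
    return root


def search(root, query, max_errors):
    """Return ids of blocks within max_errors mismatches of query."""
    results = []
    stack = [(root, 0, 0)]  # (node, depth, errors so far)
    while stack:
        node, depth, errors = stack.pop()
        if depth == len(query):
            results.extend(node.ids)
            continue
        for symbol, child in node.children.items():
            new_errors = errors + (symbol != query[depth])
            # Prune: stop descending once the error budget is exceeded.
            if new_errors <= max_errors:
                stack.append((child, depth + 1, new_errors))
    return sorted(results)


# First blocks of S1..S4 from the earlier figure, queried with T's first block.
trie = build_trie(["1023", "1022", "2203", "1203"])
print(search(trie, "1023", max_errors=1))  # [0, 1]
```

Pruning on the error count is what keeps the number of visited nodes small when the sub-threshold is small.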

Step 3: Implement the trie index in compressed space using the novel data structure STAT
• Leverages succinct data structures (compressed representations that still support various data operations)
(Figure: the tries for Block1 and Block2 encoded as the arrays labeled H, G, and V.)

Compressed Data Structure: STAT (Succinct Trit-Array Trie)
• Trie nodes are represented using directly addressable tables H
• Tree navigation can be performed in O(1) time by Rank/Select queries over H
• H can be implemented as a succinct trit array in σ·N_in·log₂ 3 + o(N_in) bits of compressed space (σ: #kinds of integers, N_in: #inner nodes)
  ▹ Close to the theoretical lower bound on space
• We developed an efficient implementation of the succinct trit array supporting Rank/Select
(Figure: Rank and Select operations over H.)
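To give a feel for why a trit array can approach log₂ 3 ≈ 1.585 bits per element, the sketch below packs five trits into one byte (3⁵ = 243 ≤ 256), which costs 1.6 bits per trit. This packing and the per-byte count table used for Rank are assumptions made for illustration, not a description of the authors' implementation; a truly succinct structure would replace the count table with sampled directories of o(n) bits.

```python
class TritArray:
    """Array over {0, 1, 2} packed five trits per byte, with a simple rank."""

    TRITS_PER_BYTE = 5

    def __init__(self, trits):
        trits = list(trits)
        if any(t not in (0, 1, 2) for t in trits):
            raise ValueError("values must be 0, 1, or 2")
        self.n = len(trits)
        self.data = bytearray()
        # counts[c][k] = occurrences of trit c in the first k bytes.
        # NOTE: one counter per byte is NOT succinct; it keeps the sketch short.
        self.counts = [[0], [0], [0]]
        for start in range(0, self.n, self.TRITS_PER_BYTE):
            chunk = trits[start:start + self.TRITS_PER_BYTE]
            value = 0
            for t in reversed(chunk):  # chunk[j] becomes the 3**j digit
                value = value * 3 + t
            self.data.append(value)
            for c in range(3):
                self.counts[c].append(self.counts[c][-1] + chunk.count(c))

    def get(self, i):
        """Return the trit at position i."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        byte, offset = divmod(i, self.TRITS_PER_BYTE)
        return (self.data[byte] // 3 ** offset) % 3

    def rank(self, c, i):
        """Number of occurrences of trit c among positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        full_bytes = i // self.TRITS_PER_BYTE
        total = self.counts[c][full_bytes]
        for j in range(full_bytes * self.TRITS_PER_BYTE, i):
            total += self.get(j) == c
        return total


ta = TritArray([0, 2, 1, 0, 0, 0, 1, 0, 0, 1, 2, 2])
print(len(ta.data))   # 3 bytes for 12 trits
print(ta.rank(1, 7))  # 2
```

Select, the inverse of Rank, is omitted here; the slide uses both for constant-time navigation over H.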

Proposed Method
Step 1: Partition each sketch into B blocks based on the multi-index approach
Step 2: Index each block using a trie in which redundant nodes are eliminated
Step 3: Implement the trie index in compressed space using the novel data structure STAT
(Figure: overview of tSTAT for Block1 and Block2.)

Experiments
• Dataset: 3.3 million NBA player trajectories from 636 games in the 2015/16 season
• Queryset: 1,000 trajectories randomly extracted from the dataset
• Competitors
  ▹ LS: Strawman baseline with linear search (without any auxiliary data structure)
  ▹ HmSearch: State-of-the-art similarity search method for integer sketches [SSDBM13]
  ▹ FRESH: State-of-the-art approximate trajectory similarity search method [WADS19]
(Figure: memory usage (GiB) for Fréchet radii R chosen to find 1, 10, and 100 solutions on average per query.)
• tSTAT is 17x smaller than FRESH and 10x smaller than HmSearch

Experiments (same dataset, queryset, and competitors as on the previous slide)
(Figure: average search time (ms/query) for Fréchet radii R chosen to find 1, 10, and 100 solutions on average per query.)
• tSTAT is 34x faster than FRESH and 12x faster than HmSearch

Example of querying similar movements using tSTAT
• For a short movement of Rajon Rondo in Kings vs. Thunder on Dec. 6, 2015, searched against a database of 3.3 million trajectories
  ▹ Query: Date: 12/06/2015, Match: SAC vs OKC, PlayerName: Rajon Rondo (No 9), Q4 – 07:09.74 to Q4 – 07:00.29
  ▹ Result 1: Date: 10/31/2015, Match: NOP vs GSW, PlayerName: Toney Douglas (No 16), Distance: 0.363737, Q3 – 00:36.15 to Q3 – 00:31.75
  ▹ Result 2: Date: 12/09/2015, Match: SAS vs TOR, PlayerName: Tim Duncan (No 21), Distance: 0.423995, Q1 – 09:48.32 to Q1 – 09:43.59
  ▹ Result 3: Date: 01/12/2016, Match: PHX vs IND, PlayerName: P. J. Tucker (No 17), Distance: 0.395999, Q4 – 06:20.51 to Q4 – 06:17.35
• Conclusion
  ▹ Proposed a novel similarity search method, tSTAT
  ▹ Showed its efficiency through experiments using real-world datasets
