
SIGSPATIAL20


Presentation slides for "Succinct Trit-array Trie for Scalable Trajectory Similarity Search" at SIGSPATIAL20


Shunsuke Kanda

November 05, 2020

Transcript

  1. Succinct Trit-array Trie for Scalable Trajectory Similarity Search

    Shunsuke Kanda (1), Koh Takeuchi (2,1), Keisuke Fujii (3,1), Yasuo Tabei (1)
    (1) RIKEN AIP, (2) Kyoto Univ., (3) Nagoya Univ.
    28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, November 3–6, 2020, Seattle, Washington, USA (virtual)
  2. Background & Contribution

    • Background
      ▹ Massive datasets of spatial trajectories are ubiquitous in research and industry
      ▹ Similarity search over huge collections of trajectories is indispensable for turning these datasets into knowledge
    • Our contribution
      ▹ Develop an efficient trajectory similarity search method
        – Powerful measure: Fréchet distance
        – Fast search: locality-sensitive hashing (LSH) + trie search algorithm
        – Scalability: compressed trie implementation using succinct data structures
      ▹ Experiments on huge real-world datasets
        – Demonstrate that our method outperforms state-of-the-art ones
  3. (Discrete) Fréchet Distance

    • Often explained by the metaphor of an owner walking a dog on a leash
      ▹ Both walk along their own trajectories at their own speeds, but neither can go backward
      ▹ The Fréchet distance is the minimum leash length that suffices for the walk
    • The computation time is O(traj-length^2) by dynamic programming
    [Figure: owner and dog walking along their trajectories, labeled max = Fréchet(owner, dog)]
    The computational cost makes it difficult to design an efficient exact solution :(
    LSH enables us to quickly solve such difficult search problems :)
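
    The quadratic dynamic program mentioned on this slide can be sketched as follows. This is a minimal textbook version of the discrete Fréchet distance, not code from the talk; the function name discrete_frechet and the toy trajectories are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def discrete_frechet(p, q):
    """Discrete Fréchet distance between point sequences p and q.

    Runs in O(len(p) * len(q)) time, matching the quadratic bound on the slide.
    """
    n, m = len(p), len(q)
    # dp[i][j]: shortest leash that lets the walkers reach p[i] and q[j]
    # without either of them ever stepping backward.
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                # Either walker, or both, advanced by one point.
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]


# Hypothetical toy input: two parallel walks one unit apart.
owner = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
dog = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(owner, dog))  # 1.0
```

    Running this for every database trajectory on every query is what the slide calls too demanding, which motivates the sketch-based approach that follows.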
  4. Approximate Approach

    1. Map trajectories in the Fréchet space to integer vectors (i.e., sketches) in the Hamming space
    2. Retrieve candidate solutions for the sketches using Hamming distance
    3. Remove false positives from the candidate solutions
    [Figure: LSH maps trajectories P1, P2, P3, P4 and query Q to sketches S1, S2, S3, S4 and T; the Hamming distance search over sketches is labeled "Main problem"]
    Sketches: S1 = 10230122, S2 = 10220132, S3 = 22030132, S4 = 12031123, T = 10230332
    Hamming distance, from similar to dissimilar: Ham(S1, T) = 2, Ham(S2, T) = 2, Ham(S3, T) = 4, Ham(S4, T) = 6
  5. Approximate Trajectory Similarity Search Problem

    • Input
      ▹ Database of n sketches S1, S2, …, Sn
      ▹ Query sketch T
      ▹ Hamming distance threshold K
    • Output
      ▹ All sketches Si whose Hamming distance to T is within K
        – i.e., { Si : Ham(Si, T) ≤ K }
    • Issues :(
      ▹ Most existing methods are designed for binary sketches and are inefficient for integer ones
      ▹ Existing methods for integer sketches are memory-inefficient
    We develop a novel similarity search method called tSTAT
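
    The problem statement can be made concrete with a brute-force scan, which is the idea behind the linear-search baseline: compare the query with every sketch and keep those within the threshold. The sketch values below are the ones shown on slide 4; the function names hamming and linear_search are assumptions made for illustration.

```python
def hamming(s, t):
    """Number of positions at which two equal-length sketches differ."""
    assert len(s) == len(t), "sketches must have the same length"
    return sum(a != b for a, b in zip(s, t))


def linear_search(database, query, k):
    """Indices i such that Ham(database[i], query) <= k, found by a full scan."""
    return [i for i, s in enumerate(database) if hamming(s, query) <= k]


# Sketches S1..S4 and query sketch T from the slide-4 example.
sketches = ["10230122", "10220132", "22030132", "12031123"]
query = "10230332"

print([hamming(s, query) for s in sketches])  # [2, 2, 4, 6]
print(linear_search(sketches, query, 2))      # [0, 1], i.e. S1 and S2
```

    The scan costs time proportional to the number of sketches times the sketch length for every query, which is why an index is needed for millions of sketches.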
  6. tSTAT (trajectory-indexing Succinct Trit-Array Trie)

    Proposed Method: Step1, Step2, Step3
    [Figure: running example with sketches partitioned into Block1 and Block2, one trie per block, and the per-level arrays H, G and V that encode each trie]
    Step1: Partition each sketch into B blocks based on the multi-index approach
    Enables us to divide the Hamming distance problem with a large threshold K into B sub-problems with a small threshold ⌊K/B⌋
    Example: threshold K = 3 becomes sub-thresholds ⌊3/2⌋ = 1
    Although false positives can arise, they can be safely removed in the final verification
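
    The multi-index step rests on a pigeonhole argument: if Ham(S, T) ≤ K, at least one of the B blocks of S must be within ⌊K/B⌋ of the matching block of T. The following is a minimal sketch of that filter-then-verify idea, assuming contiguous blocks of near-equal length and scanning every block instead of using an index; split_blocks and multi_index_search are hypothetical names, and the data is the slide-4 example.

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def split_blocks(sketch, b):
    """Split a sketch into b contiguous blocks of (nearly) equal length."""
    size, extra = divmod(len(sketch), b)
    blocks, start = [], 0
    for i in range(b):
        end = start + size + (1 if i < extra else 0)
        blocks.append(sketch[start:end])
        start = end
    return blocks


def multi_index_search(database, query, k, b):
    """Return (verified results, candidates) for threshold k using b blocks."""
    sub_k = k // b  # per-block threshold, floor(K / B)
    q_blocks = split_blocks(query, b)
    candidates = []
    for i, s in enumerate(database):
        s_blocks = split_blocks(s, b)
        # Keep i if any block is within the sub-threshold of the query block.
        if any(hamming(sb, qb) <= sub_k for sb, qb in zip(s_blocks, q_blocks)):
            candidates.append(i)
    # Final verification removes false positives.
    results = [i for i in candidates if hamming(database[i], query) <= k]
    return results, candidates


sketches = ["10230122", "10220132", "22030132", "12031123"]
query = "10230332"
results, candidates = multi_index_search(sketches, query, k=3, b=2)
print(candidates)  # [0, 1, 2]: S3 matches on its second block only
print(results)     # [0, 1]: S3 is dropped because Ham(S3, T) = 4 > 3
```

    In the method described in the talk, each block position gets its own trie index, so the per-block test is answered by a trie search and not by the scan used here.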
  7. tSTAT (trajectory-indexing Succinct Trit-Array Trie)

    Proposed Method: Step1, Step2, Step3
    [Figure: running example with sketches partitioned into Block1 and Block2, one trie per block with the similar entries highlighted, and the per-level arrays H, G and V that encode each trie]
    Step2: Index each block using a trie where redundant nodes are eliminated
    The Hamming distance problem can be solved by traversing trie nodes while counting #errors for the query
    Query: 0 0 2 3
    Threshold K = 3 becomes sub-thresholds ⌊3/2⌋ = 1
    Stop traversing down when #errors > ⌊3/2⌋ = 1
    The search takes O(B (L/B)^(K/B+2)) time, where B is #blocks and L is sketch length
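
    The error-counting traversal can be sketched on a plain pointer-based trie. This is only an illustration of the pruning rule, under the assumption that each block is indexed as a string of symbols; it does not eliminate redundant nodes or use the compressed layout from the talk. Node, build_trie and search are hypothetical names, and the blocks are the first halves of the slide-4 sketches.

```python
class Node:
    """Trie node: children keyed by symbol, plus ids of blocks ending here."""

    def __init__(self):
        self.children = {}
        self.ids = []


def build_trie(blocks):
    """Insert every block; block number idx is stored at the node it ends on."""
    root = Node()
    for idx, block in enumerate(blocks):
        node = root
        for symbol in block:
            node = node.children.setdefault(symbol, Node())
        node.ids.append(idx)
    return root


def search(node, query, k, depth=0, errors=0):
    """Ids of all indexed blocks within Hamming distance k of query."""
    if depth == len(query):
        return list(node.ids)
    found = []
    for symbol, child in node.children.items():
        e = errors + (symbol != query[depth])
        if e <= k:  # stop traversing down once #errors exceeds k
            found.extend(search(child, query, k, depth + 1, e))
    return found


blocks = ["1023", "1022", "2203", "1203"]  # first blocks of S1..S4
trie = build_trie(blocks)
print(sorted(search(trie, "1023", k=1)))  # [0, 1]
```

    Subtrees whose prefix already has more than k mismatches are never visited, which is where the saving over a full scan comes from.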
  8. tSTAT (trajectory-indexing Succinct Trit-Array Trie)

    Proposed Method: Step1, Step2, Step3
    [Figure: running example with sketches partitioned into Block1 and Block2, one trie per block, and the per-level arrays H, G and V that encode each trie]
    Step3: Implement the trie index using a novel data structure, STAT, in compressed space
    Leverage succinct data structures (compressed ones supporting various data operations)
  9. Compressed Data Structure: STAT (Succinct Trit-Array Trie)

    Trie nodes are represented using directly addressable tables H
    Tree navigation can be performed in O(1) time by Rank/Select queries over H
    H can be implemented by a succinct trit array in σ·Nin·log2 3 + o(Nin) bits of compressed space (σ: #kinds of integers, Nin: #inner nodes)
    Close to the theoretical lower bound on space
    We developed an efficient implementation of the succinct trit array supporting Rank/Select
    [Figure: Rank and Select operations over H]
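
    To give a feel for a trit array with Rank/Select, here is a small sketch. It is not the authors' implementation and its queries are not O(1). It assumes trits are packed five per byte (3^5 = 243 fits in one byte, so each trit costs 8/5 = 1.6 bits against the log2 3 ≈ 1.585 lower bound) and that rank counts are sampled every SAMPLE bytes. The class name, the packing and the sampling rate are all assumptions made for illustration.

```python
class TritArray:
    """Array over {0, 1, 2}, packed 5 trits per byte, with sampled rank counts."""

    PER_BYTE = 5  # 3**5 = 243 <= 256
    SAMPLE = 8    # bytes between stored rank samples (hypothetical choice)

    def __init__(self, trits):
        self.n = len(trits)
        self.data = bytearray()
        self.samples = []  # samples[j]: counts of 0/1/2 before byte j * SAMPLE
        counts = [0, 0, 0]
        for start in range(0, self.n, self.PER_BYTE):
            if len(self.data) % self.SAMPLE == 0:
                self.samples.append(tuple(counts))
            value = 0
            # The first trit of the chunk becomes the least significant digit.
            for t in reversed(trits[start:start + self.PER_BYTE]):
                value = value * 3 + t
                counts[t] += 1
            self.data.append(value)

    def get(self, i):
        """Trit stored at position i."""
        return (self.data[i // self.PER_BYTE] // 3 ** (i % self.PER_BYTE)) % 3

    def rank(self, t, i):
        """Number of occurrences of trit t in positions [0, i)."""
        if i == 0:
            return 0
        span = self.PER_BYTE * self.SAMPLE
        block = (i - 1) // span  # sampled block holding position i - 1
        start = block * span
        return self.samples[block][t] + sum(
            self.get(j) == t for j in range(start, i))

    def select(self, t, k):
        """Position of the k-th occurrence (k >= 1) of trit t, or -1 if none."""
        if k < 1 or self.rank(t, self.n) < k:
            return -1
        lo, hi = 0, self.n - 1
        while lo < hi:  # smallest i with rank(t, i + 1) >= k
            mid = (lo + hi) // 2
            if self.rank(t, mid + 1) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo


trits = [0, 2, 1, 0, 1, 1, 2, 0, 2, 2, 1, 0] * 10  # hypothetical data, 120 trits
ta = TritArray(trits)
print(len(ta.data))  # 24 bytes for 120 trits
```

    Here rank scans at most one sampled block and select binary-searches over rank; a practical succinct implementation would replace both with table lookups to reach constant time.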
  10. tSTAT (trajectory-indexing Succinct Trit-Array Trie)

    Proposed Method: Step1, Step2, Step3
    [Figure: running example with sketches partitioned into Block1 and Block2, one trie per block, and the per-level arrays H, G and V that encode each trie]
    Step1: Partition each sketch into B blocks based on the multi-index approach
    Step2: Index each block using a trie where redundant nodes are eliminated
    Step3: Implement the trie index using a novel data structure, STAT, in compressed space
  11. Experiments

    • Dataset: 3.3 million NBA player trajectories from 636 games in the 2015/16 season
    • Queryset: 1000 trajectories randomly extracted from the dataset
    • Competitors
      ▹ LS: Strawman baseline with linear search (without any auxiliary data structure)
      ▹ HmSearch: State-of-the-art similarity search for integer sketches [SSDBM13]
      ▹ FRESH: State-of-the-art approximate trajectory similarity search [WADS19]
    [Figure: memory usage (GiB) for Fréchet radii R chosen to find 1, 10, and 100 solutions on average per query]
    17x smaller than FRESH
    10x smaller than HmSearch
  12. Experiments

    • Dataset: 3.3 million NBA player trajectories from 636 games in the 2015/16 season
    • Queryset: 1000 trajectories randomly extracted from the dataset
    • Competitors
      ▹ LS: Strawman baseline with linear search (without any auxiliary data structure)
      ▹ HmSearch: State-of-the-art similarity search for integer sketches [SSDBM13]
      ▹ FRESH: State-of-the-art approximate trajectory similarity search [WADS19]
    [Figure: average search time (ms/query) for Fréchet radii R chosen to find 1, 10, and 100 solutions on average per query]
    34x faster than FRESH
    12x faster than HmSearch
  13. Example of querying similar movements using tSTAT

    • For a short movement of Rajon Rondo in Kings vs. Thunder on Dec. 6, 2015 (database of 3.3 million trajs)
    Query: Date: 12/06/2015, Match: SAC vs OKC, PlayerName: Rajon Rondo (No 9), Q4 – 07:09.74 to Q4 – 07:00.29
    Result 1: Date: 10/31/2015, Match: NOP vs GSW, PlayerName: Toney Douglas (No 16), Distance: 0.363737, Q3 – 00:36.15 to Q3 – 00:31.75
    Result 2: Date: 12/09/2015, Match: SAS vs TOR, PlayerName: Tim Duncan (No 21), Distance: 0.423995, Q1 – 09:48.32 to Q1 – 09:43.59
    Result 3: Date: 01/12/2016, Match: PHX vs IND, PlayerName: P. J. Tucker (No 17), Distance: 0.395999, Q4 – 06:20.51 to Q4 – 06:17.35
    • Conclusion
      ▹ Proposed a novel similarity search method tSTAT
      ▹ Showed the efficiency through experiments using real-world datasets