Slide 1

Slide 1 text

Efficient Spatial Sampling of Large Geographical Tables (SIGMOD ’12 / TODS ’13)
Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy
Google Research
Presented by Emaad Ahmed Manzoor
March 10, 2014

Slide 2

Slide 2 text

Thinning

Slide 3

Slide 3 text

Constraints, Objectives, Challenges

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Definitions

Slide 7

Slide 7 text

Definitions

Slide 8

Slide 8 text

Constraints: Visibility, Zoom Consistency, Adjacency

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

The Thinning Problem

Slide 11

Slide 11 text

K = 1
M1 = { 4, 4, 4, 4, 4 }
M2 = { 1, 3, 4, 4, 4 }
M3 = { 2, 3, 4, 4, 4 }
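To make the visibility constraint concrete, here is a minimal sketch of a checker for a min-level assignment such as M2. Everything about the layout is an assumption made for illustration, not the figure from the slide: three zoom levels, a value of 4 read as "never shown", point records on a 1-D line, zoom z splitting [0, 1) into 2^(z-1) cells, and the record positions themselves. The names `cell_of` and `satisfies_visibility` are hypothetical.

```python
from collections import Counter

NUM_LEVELS = 3          # assumed: zoom levels 1..3
NEVER = NUM_LEVELS + 1  # assumed: a min level of 4 means "never shown"


def cell_of(x, zoom):
    """Hypothetical 1-D grid: zoom z splits [0, 1) into 2**(z-1) cells."""
    return int(x * 2 ** (zoom - 1))


def satisfies_visibility(min_level, positions, k):
    """True if no cell shows more than k records at any zoom level.

    A record is visible at `zoom` when its min level is <= zoom, so zoom
    consistency holds by construction of the min-level representation.
    """
    for zoom in range(1, NUM_LEVELS + 1):
        load = Counter(
            cell_of(positions[r], zoom)
            for r, m in min_level.items()
            if m <= zoom
        )
        if any(count > k for count in load.values()):
            return False
    return True


# Illustrative positions for records 1..5 (not taken from the slides).
positions = {1: 0.1, 2: 0.3, 3: 0.6, 4: 0.7, 5: 0.9}
M1 = dict(zip(range(1, 6), [4, 4, 4, 4, 4]))
M2 = dict(zip(range(1, 6), [1, 3, 4, 4, 4]))
crowded = dict(zip(range(1, 6), [1, 1, 4, 4, 4]))  # two records at zoom 1

print(satisfies_visibility(M1, positions, k=1))
print(satisfies_visibility(M2, positions, k=1))
print(satisfies_visibility(crowded, positions, k=1))
```

The checker only covers visibility. Adjacency does not arise for point records, since a point never spans two cells.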

Slide 12

Slide 12 text

Objectives: Maximality, Fairness, Importance

Slide 13

Slide 13 text

K = 1
M1 = { 4, 4, 4, 4, 4 }
M2 = { 1, 3, 4, 4, 4 }
M3 = { 2, 3, 4, 4, 4 }

Slide 14

Slide 14 text

Problem
Constraints: Visibility, Zoom Consistency, Adjacency
Objectives: Maximality, Fairness, Importance

Slide 15

Slide 15 text

Problem
Constraints: Visibility, Zoom Consistency, Adjacency
Objectives: Maximality, Fairness, Importance
Optimization

Slide 16

Slide 16 text

Integer Programming

Slide 17

Slide 17 text

Variables

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Sampling Constraints

Slide 20

Slide 20 text

Zoom Consistency & Visibility Constraints

Slide 21

Slide 21 text

Thinning solution

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Program Size

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Critical nodes

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Bounded Cover

Slide 32

Slide 32 text

Critical nodes

Slide 33

Slide 33 text

Program Size

Slide 34

Slide 34 text

Relaxing Integer Constraints

Slide 35

Slide 35 text

Objectives

Slide 36

Slide 36 text

Maximality

Slide 37

Slide 37 text

Strong Maximality
There does not exist M’ such that:

Slide 38

Slide 38 text

K = 1
M1 = { 4, 4, 4, 4, 4 }
M2 = { 1, 3, 4, 4, 4 }
M3 = { 2, 3, 4, 4, 4 }
M4 = { 1, 4, 4, 4, 3 }

Slide 39

Slide 39 text

Strong Maximality is NP-Hard

Slide 40

Slide 40 text

Weak Maximality
There does not exist M’ such that:
for some 1 <= i <= n

Slide 41

Slide 41 text

K = 1
M1 = { 4, 4, 4, 4, 4 }
M2 = { 1, 3, 4, 4, 4 }
M3 = { 2, 3, 4, 4, 4 }
M4 = { 1, 4, 4, 4, 3 }

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

DFS
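As a rough illustration of a depth-first approach, the sketch below walks a quadtree of cells top-down and greedily fills each cell up to K records. It is a simplified version under stated assumptions, not the algorithm from the paper: records are points in the unit square (so adjacency never comes up), zoom z uses a 2^(z-1) by 2^(z-1) grid, and a min level of `num_levels + 1` means "never shown". The name `dfs_thin` is hypothetical.

```python
def dfs_thin(points, num_levels, k):
    """Greedy depth-first thinning for point records (illustrative only).

    points: dict mapping record id -> (x, y), with x and y in [0, 1).
    Returns a dict mapping record id -> min zoom level.
    """
    never = num_levels + 1
    min_level = {p: never for p in points}

    def visit(zoom, members):
        # Records already made visible by an ancestor cell stay visible.
        shown = [p for p in members if min_level[p] < zoom]
        # Fill the remaining capacity of this cell, in input order.
        for p in members:
            if len(shown) >= k:
                break
            if min_level[p] == never:
                min_level[p] = zoom
                shown.append(p)
        if zoom == num_levels:
            return
        # Split the members among the child cells at the next zoom level.
        side = 2 ** zoom
        children = {}
        for p in members:
            x, y = points[p]
            children.setdefault((int(x * side), int(y * side)), []).append(p)
        for sub in children.values():
            visit(zoom + 1, sub)

    visit(1, list(points))
    return min_level


demo = {"a": (0.1, 0.1), "b": (0.2, 0.2), "c": (0.8, 0.8), "d": (0.15, 0.6)}
print(dfs_thin(demo, num_levels=3, k=1))
```

Every cell ends up either full or with all of its points shown, which is the sense in which a greedy fill of this kind targets maximality. It makes no attempt at fairness or importance, since records are taken in whatever order they arrive.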

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

K = 1
M2 = { 1, 3, 4, 4, 4 }

Slide 47

Slide 47 text

Point-only Datasets
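One way a randomized scheme for point-only data can work is to give every point a random priority and show, in each cell, the K points with the smallest priority. The sketch below illustrates that idea under assumptions of my own: points in the unit square, a 2^(z-1) by 2^(z-1) grid at zoom z, and `num_levels + 1` meaning "never shown". The name `thin_points` is hypothetical, and the sketch keeps all points in memory, so it does not try to reproduce any memory behavior.

```python
import random


def thin_points(points, num_levels, k, seed=0):
    """Random-priority thinning for point records (illustrative only).

    points: dict mapping record id -> (x, y), with x and y in [0, 1).
    Returns a dict mapping record id -> min zoom level.
    """
    rng = random.Random(seed)
    priority = {p: rng.random() for p in points}
    min_level = {p: num_levels + 1 for p in points}

    for zoom in range(1, num_levels + 1):
        side = 2 ** (zoom - 1)
        cells = {}
        for p, (x, y) in points.items():
            cells.setdefault((int(x * side), int(y * side)), []).append(p)
        # In each cell, the k lowest-priority points are visible.
        for members in cells.values():
            for p in sorted(members, key=priority.get)[:k]:
                min_level[p] = min(min_level[p], zoom)
    return min_level


pts = {i: ((i * 0.137) % 1, (i * 0.291) % 1) for i in range(20)}
levels = thin_points(pts, num_levels=3, k=2)
print(sorted(levels.items()))
```

Zoom consistency follows from the nesting of cells: a point that ranks among the K best in a coarse cell also ranks among the K best in the smaller cell that contains it one level down.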

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

Experiments

Slide 50

Slide 50 text

2.67GHz quad-core
12GB (starting at 1GB, or 4GB for the scalability tests)
Java 1.6
Apache Simplex
K=500
“Some plots were too big, so we threw them out.”

Slide 51

Slide 51 text

Program Size

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

Integer Relaxation

Slide 54

Slide 54 text

Scalability

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

Objectives

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

Takeaways

Slide 60

Slide 60 text

Use DFS if you care only about maximality.
Otherwise, use the minimised LP.
The randomized points-only algorithm consumes constant memory and scales arbitrarily (not shown).
