1; : : : ; n arranged around it, player N starts at 1. I In each round: I Player N calls how many places he would like to move. I Player M determines clockwise or counterclockwise that N will move. 1 2 n
largest number of seats player N can sit in? I full knowledge of the table, inﬁnitely smart, inﬁnite time. I f (2) = 2; f (3) T= 3; f (4) = 4; : : : : Z. Nedev and S. Muthukrishnan: Theor. Comput. Sci. 393(1-3): 124-132 (2008).
web logs, internet packets. I At least 3 distinct algorithmic theories: I Sequential algorithms on 1 m/c (cellphone log data). I Streaming algorithms (internet packet data) I MapReduce algorithms on 10k ’s of m/c s (web data) I Many other examples. I Non examples of massive data: Paul Erdos.
probability at least 1 , ~ F[i] F[i] + " X j T=i F[j ] I Space used is O(1 " log 1 ). I Time per update is O(log 1 ). Indep of n. G. Cormode and S. Muthukrishnan: An improved data stream sum- mary: count-min sketch and its applications. Journal of Algorithms, 55(1): 58-75 (2005).
, ~ F[i] F[i] + " X j T=i F[j ]: I Xi;j is the expected contribution of F[j ] to the bucket containing i, for any h. E(Xi;j ) = " e X j T=i F[j ]: I Consider Pr( ~ F[i] > F[i] + " P j T=i F[j ]): Pr() = Pr( Vj ; F[i] + Xi;j > F[i] + " X j T=i F[j ]) = Pr( Vj ; Xi;j ! eE(Xi;j )) < e log(1=) =
long bitstring and sends messages to BOB who wishes to compute the ith bit. I Needs (n) bits of communication. I Reduction of estimating F[i] in data stream model. I I [1 ¡¡¡1=(2")] such that I I [i ] = 1 ! F [i ] = 2 I I [i ] = 0 ! F [i ] = 0 ;F [0 ] F [0 ]+2 I Observe that jjFjj = P i F[i] = 1="
long bitstring and sends messages to BOB who wishes to compute the ith bit. I Needs (n) bits of communication. I Reduction of estimating F[i] in data stream model. I I [1 ¡¡¡1=(2")] such that I I [i ] = 1 ! F [i ] = 2 I I [i ] = 0 ! F [i ] = 0 ;F [0 ] F [0 ]+2 I Observe that jjFjj = P i F[i] = 1=" I Estimating F[i] ~ F[i] F[i] + " jjFjj implies, I I [i ] = 0 ! F [i ] = 0 ! 0 ~ F [i ] 1 I I [i ] = 1 ! F [i ] = 2 ! 2 ~ F [i ] 3 and reveals I [i]. I Therefore, (1=") space lower bound for index problem.
are the same: I All prior work (1="2) space, via Johnson-Lindenstrauss I Not all hashing algorithms are the same: I Pairwise independence I Not all approximations are sampling. I Recovering F[i] to ¦0:1jFj accuracy will retrieve each item precisely.
I Keep the set S of heavy hitters ( ~ F[i] ! 2" jjFjj). I Guaranteed that S contains i such that F[i] ! 2" jjFjj and no F[i] " jjFjj I Extra log n factor for answering n queries Problem is of database interest.
I Keep the set S of heavy hitters ( ~ F[i] ! 2" jjFjj). I Guaranteed that S contains i such that F[i] ! 2" jjFjj and no F[i] " jjFjj I Extra log n factor for answering n queries Problem is of database interest. I Faster recovery: Hash into buckets such that in each bucket, recover majority i (F[i] > P j same bucket as i F[j ]=2)
I Keep the set S of heavy hitters ( ~ F[i] ! 2" jjFjj). I Guaranteed that S contains i such that F[i] ! 2" jjFjj and no F[i] " jjFjj I Extra log n factor for answering n queries Problem is of database interest. I Faster recovery: Hash into buckets such that in each bucket, recover majority i (F[i] > P j same bucket as i F[j ]=2) I Takes O(log n) extra time, space I Gives compressed sensing in L1 : jjF ~ Fk jj 1 jjF F£ k jj 1 + " jjFjj 1 Sparse recovery experiments: http://groups.csail.mit.edu/toc/sparse/ wiki/index.php?title=Sparse_Recovery_Experiments
(F + G)[i] = minj =1;:::;log(1=) cmF [hj (i)] + cmG[hj (i)] I Good estimate since cmF +G = cmF + cmG GS at AT& T (pure DSMS system): I Ex: For each src IP, ﬁnd heavy hitter destination IP. I Two level arch: fast lightweight low level; high level expensive. I Parallelize by hashing on distinct groupbys, heartbeats, load shedding. I http: //www.corp.att.com/attlabs/docs/ att_gigascope_factsheet_071405.pdf
I Field GS Application: I High speed memory is expensive, ns update times. I Large universe I 1="2 space is prohibitive I Extensions: skipping over stream (CMON at Sprint), distributed (Sawzall),.