V-optimal histograms
A dataset with n objects
Reduce the n objects to b bins, where b ≪ n
Formally, assume a set V of n (sorted) values $v_1, v_2, \ldots, v_n$ having
frequencies $f_1, f_2, \ldots, f_n$ respectively
The problem is to output another histogram H having b bins, i.e., b
non-overlapping intervals on V
Interval $I_i$ is of the form $[l_i, r_i]$ and has a value $h_i$
If value $v_j \in I_i$, estimate $e(v_j)$ of $f_j$ is $h_i$
Error in estimation is distance $d(f, e)$
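The definitions above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the slide leaves both the bin value h_i and the distance d unspecified, so the sketch assumes h_i is the mean frequency of the values in the bin and d is the sum of squared errors. The names build_bins, estimate and squared_error, and the sample data, are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]      # closed interval [l_i, r_i]
Bin = Tuple[float, float, float]    # (l_i, r_i, h_i)


def build_bins(values: Sequence[float], freqs: Sequence[float],
               intervals: Sequence[Interval]) -> List[Bin]:
    """Attach a value h_i to each interval.

    Assumption: h_i is the mean frequency of the values inside the interval.
    """
    bins = []
    for l, r in intervals:
        inside = [f for v, f in zip(values, freqs) if l <= v <= r]
        if not inside:
            raise ValueError(f"interval [{l}, {r}] contains no value")
        bins.append((l, r, sum(inside) / len(inside)))
    return bins


def estimate(values: Sequence[float], bins: Sequence[Bin]) -> List[float]:
    """Return e(v_j) for every value: the h_i of the bin containing v_j."""
    est = []
    for v in values:
        for l, r, h in bins:
            if l <= v <= r:
                est.append(h)
                break
        else:
            raise ValueError(f"value {v} is not covered by any bin")
    return est


def squared_error(freqs: Sequence[float], est: Sequence[float]) -> float:
    """Assumption: d(f, e) is the sum of squared differences."""
    return sum((f - e) ** 2 for f, e in zip(freqs, est))


# Hypothetical data: n = 6 sorted values reduced to b = 2 bins.
values = [1, 2, 3, 4, 5, 6]
freqs = [2, 4, 3, 10, 12, 11]
bins = build_bins(values, freqs, [(1, 3), (4, 6)])

est = estimate(values, bins)      # [3.0, 3.0, 3.0, 11.0, 11.0, 11.0]
err = squared_error(freqs, est)   # 4.0
print(est, err)
```

Other choices of d (for example, the sum of absolute errors) would change only squared_error; the interval and estimate logic stays the same.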
Arnab Bhattacharya, CS685: Preprocessing 1 2012-13