Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Online Query Processing

Online Query Processing

An overview of the CONTROL project and Online QP. Versions of this talk given at MIT (1/03), Polytechnic (11/02), Harvard (10/02), OGI/PSU (1999), Internet Archive (1999). Distinguished lecture at University of Virginia, 1999.

Joe Hellerstein

September 01, 1999
Tweet

More Decks by Joe Hellerstein

Other Decks in Technology

Transcript

  1. Road Map Requisite tech trends and prognostications CONTROL of the

    situation: online query processing n HCI motivations n Example applications n Example Algorithms n Reordering and Ripple Joins Implications for “hot topics” n Big data, small pipes: P2P, Sensor nets n Continuous Queries over data streams
  2. Computers Keep Getting Faster Moore’s Law n # of transistors

    per chip doubles every 18 months (1965) à 2x price/performance every 18 months n About right, 24 periods running
  3. But Data Grows Faster Yet! Source: J. Porter, Disk/Trend, Inc.

    http://www.disktrend.com/pdf/portrpkg.pdf
  4. Disk Appetite, cont. Greg Papadopoulos, CTO Sun: n Disk sales

    doubling every 9 months Translate: n Time to process all your data doubles every 18 months n MOORE’S LAW INVERTED! Big challenge (opportunity?) for SW systems research n Feel free to discount the analysis above significantly n Even so, traditional “scalability” research won’t help n “Ideal” linear scaleup is NOT NEARLY ENOUGH! n HW will not take us there.
  5. Data Volume: Prognostications Today n SwipeStream n E.g. Wal-Mart >100

    TB Data Warehouse n ClickStream n One service generates >100GB of log per day n Web n Internet Archive WayBack Machine: >100 TB n Replicated OS/Apps, MP3s, etc. Tomorrow: n Sensor feeds galore n Big data, small pipes: Sensor nets, P2P n Continuous queries over data streams
  6. Road Map Requisite tech trends and prognostications CONTROL of the

    situation: online query processing n HCI motivations n Example applications n Example Algorithms n Reordering and Ripple Joins Implications for “hot topics” n P2P query systems n Sensor nets n Stream Query Processing
  7. Drawbacks of Current Systems Only exact answers are available n

    A losing proposition as data volume grows HCI solution: interactive tools don’t do big jobs n E.g., spreadsheet programs (64Krow limit) Systems solution: big jobs aren’t interactive n No user feedback or control in big DBMS queries n “back to the 60’s” n Long processing times n Fundamental mismatch with preferred modes of HCI Best solutions to date precompute n E.g. “OLAP” n Don’t handle ad hoc queries or data sets well
  8. CONTROL Continuous Output, Navigation and Transformation with Refinement On-Line “Of

    all men's miseries, the bitterest is this: to know so much and have control over nothing” -- Herodotus Requirements for CONTROL systems n Early answers n Refinement over time n Interaction and ad-hoc control “Crystal Ball” vs. Black Box n vs. “Lucite Watch”
  9. CONTROL Project, ‘96-’01 Main SW Artifacts n Online Aggregation in

    SQL n Postgres, Informix, IBM DB2 n Potter’s Wheel n Scalable Spreadsheet for Online Data Cleaning n Interactive anomaly detection & transformation n CARMA online association rule mining n 2 Online Data Viz prototypes n CLOUDS & Phoebus Seeded current Telegraph project n Adaptive dataflow for querying networked sources
  10. Goals for Online Processing New “greedy” performance regime n Maximize

    1st derivative of the “mirth index” n Mirth defined on-the-fly n Therefore need FEEDBACK and CONTROL Time J 100% Online Traditional
  11. Road Map Requisite tech trends and prognostications CONTROL of the

    situation: online query processing n HCI motivations n Example applications n Example Algorithms n Reordering and Ripple Joins Implications for “hot topics” n P2P query systems n Sensor nets n Stream Query Processing
  12. Online Aggregation SELECT AVG(temp) FROM t GROUP BY site 330K

    rows in table (synthetic data) the exact answer: Courtesy Peter Haas, IBM
  13. Online Browsing & Editing Potter’s Wheel [VLDB 2001] Scalable spreadsheet

    n A fraction of data is materialized in GUI widget at any time n Scrolling = preference for delivering quantiles to widget Interactive data cleaning n Direct-manipulation interface via visual algebra n Transformation by example Online structure and discrepancy detection n MDL meets ADTs n Minimum Description Length [Rissinen] for modeling data n Based on a grammar of user-defined Abstract Data Types n Simple API for programmers to register new ADTs
  14. Road Map Requisite tech trends and prognostications CONTROL of the

    situation: online query processing n HCI motivations n Example applications n Example Algorithms n Reordering and Ripple Joins Implications for “hot topics” n P2P query systems n Sensor nets n Stream Query Processing
  15. Sampling w/o Replacement We want i.i.d. samples w/o replacement n

    At any time, the input to the query is a sample n Input grows over time Can pre-sort tables randomly n And start scans from random positions n Best I/O behavior n Can re-randomize incrementally in the background n Note: can do this as a “secondary index” Techniques for random sampling in DBs well studied n Both from files and from indexes n Some tricks here (e.g. acceptance-rejection sampling)
  16. An Aside: Sampling in Commercial DBMSs “Simulated” Bernoulli sampling n

    SQL: SELECT * WHERE RAND() <= 0.01 n Similar capability in SAS Bernoulli Sampling with pre-specified rate n Informix, Oracle 8i, (DB2) n Ex: SELECT * FROM T1 SAMPLE ROW(10%), T2 n Ex: SELECT * FROM T1 SAMPLE BLOCK(10%), T2 Not for novices n Need to pre-specify precision n no feedback/control n recall the “multiresolution” patterns from example n No estimators provided in current systems
  17. Preferential Data Delivery Why needed? Examples. n Speedup/slowdown arrows n

    Spreadsheet scrollbars n Pipeline quasi-sort Mechanisms: n Juggle [Raman2/Hellerstein, VLDB ‘99] n General purpose: works with streaming data, etc. n Index stride [Hellerstein/Haas/Wang, SIGMOD ‘97] n Needs index, high I/O cost, but good for outliers n Mix adaptively! [Raman/Hellerstein, ICDE ‘03]
  18. Online Reordering Deliver “interesting” items first n “Interesting” determined on

    the fly Exploit rate gap between produce and process/consume process consume join transmit produce disk
  19. Online Reordering Deliver “interesting” items first n “Interesting” determined on

    the fly based on user gestures at the interface Exploit rate gap between produce and process/consume S T R reorder process consume join transmit produce disk
  20. Policies Want a “good” permutation of items t1 …tn to

    tP1 …tPn Quality of Feedback for a prefix tP1 tP2 …tPk QoF(UP(tP1 ), UP(tP2 ), … UP(tPk )), UP = user preference n determined by application Goodness of reordering: dQoF/dt Delivery Priority (DP): n At any given time, try to deliver tuple that improves QoF most time QoF J
  21. Example: Online Aggregation Metric: avg weighted confidence interval n Preference

    acts as weight on confidence interval n QOF = -Si UPi / (ni )1/2 n ni = number of tuples processed from group i n DPi = UPi / (ni )3/2 Metric: explicit proportional rates (NW QoS) n QOF = -Si (ni — nUPi )2 n where n = Si ni n DPi = nUPj — nj Explicit ranking (scrollbar) n QOF = -Si UPi n DPi = UPi
  22. Need: online join processing n Pipelined: produce output while consuming

    input n Statistically robust: estimates/CIs as you go SELECT AVG(R.a * S.b) FROM R, S WHERE R.c = S.c xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx Ripple Joins [Haas/Hellerstein, SIGMOD ’99]
  23. Need: online join processing n Pipelined: produce output while consuming

    input n Statistically robust: estimates/CIs as you go SELECT AVG(R.a * S.b) FROM R, S WHERE R.c = S.c Solution space n Simple idea: sample a join xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx Ripple Joins [Haas/Hellerstein, SIGMOD ’99] S R x x x x x x x x x x x x x x x x
  24. Need: online join processing n Pipelined: produce output while consuming

    input n Statistically robust: estimates/CIs as you go SELECT AVG(R.a * S.b) FROM R, S WHERE R.c = S.c Solution space n Simple idea: sample a join n Better: join samples xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx Ripple Joins [Haas/Hellerstein, SIGMOD ’99] S R xxxx xxxx xx xxxxxx xx xx xx xx xxxxxxxx S R x x x x x x x x x x x x x x x x
  25. Need: online join processing n Pipelined: produce output while consuming

    input n Statistically robust: estimates/CIs as you go SELECT AVG(R.a * S.b) FROM R, S WHERE R.c = S.c Solution space n Simple idea: sample a join n Better: join samples Ripple Joins n Family of algorithms for joining increasing samples n Optimized to shrink confidence intervals ASAP n Q: Which input to sample faster? “Aspect ratio” of ripples? n A: Adapt to estimator’s “variance contribution” from different inputs xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx Ripple Joins [Haas/Hellerstein, SIGMOD ’99] S R xxxx xxxx xx xxxxxx xx xx xx xx xxxxxxxx S R x x x x x x x x x x x x x x x x
  26. Traditional Nested Loops 3 3 3 3 3 3 3

    3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 S R xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx SELECT AVG(R.a * S.b) FROM R, S WHERE R.c = S.c
  27. Basic Ripple Join Simplest version n read new tuples s

    from S and r from R n join r and s n join r with old S tuples n join s with old R tuples xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  28. Basic Ripple Join R xx xx x xxx x x

    x x xxxx S xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  29. S R xxxx xxxx xxxx xxxx xxxx xxxx xx xx

    xx xx xx xx Better I/O: Block Ripple Join xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx Batch IOs: a = 2
  30. S R xxxx xxxx xx xxxxxx xx xx xx xx

    xxxxxxxx Adapt: Rectangular Ripple Join xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx bR = 2, bS = 1
  31. Ripple Joins, cont’d Really a family of join algorithms: n

    Block: minimizes I/O in alternating nested loops n Hash: symmetric hash tables to avoid scans n Index: basic index join Adaptive aspect ratio n User sets “frame rate” of animation at GUI n System goal: minibatch optimizations n minimize CI width subject to “frame-rate” time constraint n CI’s computable at corners, so a new frame each corner n Challenge: choose the next minibatch corner n Sample from higher variance-contributing relation faster n System solves linear programming problem to do this (approximately) xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  32. Nice and Tidy! But… Significant details missing n Systems issues:

    n Make it work in a re-entrant iterator model n RippleJoin.get_next() ? n Multi-join queries n “Corner” cases -- e.g. EOT on one input of k n Statistical matters n Estimators and CIs for joins of samples n Not textbook statistics! n The adaptivity issue: choosing the next corner n Based on cost (I/O model for algorithm) and benefit (CI formula) n Approximate a non-linear integer programming problem
  33. Implementation Issues Basically a “symmetric” nested loops n “inner” and

    “outer” roles switch back and forth n Note asymmetry in inner loops: “mitres” the box for (ever) { for (i = 1 to max -1) // S fixed, loop on R if (predicate(R[i], S[max])) output(R[i], S[max]); for (i = 1 to max) // R fixed, loop on S if (predicate(R[max], S[i])) output(R[max], S[i]); } x x x x x x x x x x x x x x x x x x x x x x x x x
  34. Now as an iterator Most DBMSs implement query operators as

    iterators n get_next() must re-enter and generate next result tuple n state for square, 2-table ripple join: n cursors on both inputs (as in nested loops join) n target -- where’s the next “corner” n currel -- which relation is currently looping x x x x x x x x x x x x x x x x x x x x x x
  35. Fussy complications in general Non-square aspect ratios n Not just

    “wrapping” 1 layer on each side n must pad side i with bi layers correctly Multiple joins = pipeline of binary joins n But can’t be done naively n “Wrapping” hypercubes not rectangles n Given subtree of size i, factor in b’s and a n Subtree’s hyperplane is: b1 b2 …bi ai tuples n Boundary conditions n Overload control messages: n EOT may mean end of sampling step, not end of table n Need to signal aggregator of arrival at corner (EOT on all tables) n Also, EOTable on a relation may change “mitre-ing” rules! S T R
  36. Empirical Results for Square Ripples (208 sec. Query) xxxx xxxx

    xxxx xxxx xxxx xxxx xx xx xx xx xx xx Select Online AVG(E.grade) From enroll E, student S Where E.sid = S.sid And S.honors_code IS NULL;
  37. Statistical Matters Need robust formulas for n running estimates n

    running CI’s Adaptive update of aspect ratio Efficient maintenance of running statistics xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  38. Example SELECT SUM(expression) FROM R, S WHERE predicate Natural estimator

    after n sampling steps: n Unbiased, consistent n A scaled average -- hence amenable to CLT S (r,s) Î Rn x Sn expressionp(r,s) |Rn| |Sn| |R| |S| Running Estimators xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  39. Large-Sample Confidence Intervals Review: single table R n # sampling

    steps: 1 << n << |R| n Based on Central Limit Theorem for averages of i.i.d. observations Complications for multiple tables n Estimator for AVG is a ratio of averages n expressionp (r, s) and expressionp (r, s’) are correlated! Solution: n CLT for (ratios of) cross-product averages n Based on theory of “convergence in distribution” n Courtesy Peter Haas, IBM n Also “conservative” CIs that can be used for very small n n Based on Hoeffding inequality n And deterministic CIs for the endgame xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  40. General form: For K tables and a 100%p CI: Where:

    n zp is the area under the standard normal curve 0 ± p n n is the number of tuples so far n n d(k) is agg-fn dependent, computed on tuples of input n dn (k) a consistent estimator by applying d(k) to samples so far n a the block size, a constant n bi the number of blocks to fetch from relation i Tradeoff between update rate/CI shrinkage n Bigger b’s = smaller CIs, slower animation CI width = 2 r zp n1/2 r2 = S k = 1 K d(k) abk Large-Sample Confidence Intervals xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  41. Cost of block ripple join after n steps: n c

    = b1 b2 …bk aK-1nK + o(nK) Cost of hash ripple after n steps: n c = (b1 + b2 + … + bk )n Non-linear integer programming problem: minimize r2(b1 , b2 , …, bk , a) such that 1. c (b1 , b2 , …, bk , a) £ constant [animation speed] 2. 1 £ bk £ |Rk |/ a for 1 £ k £ K 3. b1 , b2 , …, bk integer = S k = 1 K d(k) abk Aspect Ratio Selection xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  42. Aspect Ratio Selection Adaptive procedure: 1. Start with b1 =

    b2 = … = bK = 1 2. Execute m sampling steps (m >= 1) 3. Update each d(k) estimate 4. Solve optimization problem to get new b’s 5. Sweep out new rectangle 6. Go to 2 Approximate optimization algorithm for b’s n First ignore all constraints, minimize r2 n Then scale up all so smallest bi ≥ 1 (constraint 2) n Scale down by ever-larger bi ’s until animation goal met (constraint 1) n Round down to integers (constraint 3) xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx 1. c (b1 , b2 , …, bk , a) £ c 2. 1 £ bk £ |Rk |/ a 3. b1 , b2 , …, bk integer
  43. Empirical Results: aspect ratios xxxx xxxx xxxx xxxx xxxx xxxx

    xx xx xx xx xx xx Select Online AVG(D.grade/A.grade) From enroll D, enroll A Where D.college = Education And A.college = Agriculture And A.year = d.year --> 1x6
  44. Ripple Joins, cont’d Prototypes in Postgres, Informix, IBM DB2 Follow-on

    work n Subqueries in Singapore [Tan, et al, VLDB 99] n Scaling at Wisconsin/IBM [Luo, et al, SIGMOD 02] n Parallel and out-of-core variants n Query optimization issues n Motivated eddies [SIGMOD 99] and the rest of the Telegraph project [CIDR 03] A number of API and other systems issues n Informix implementation [DMKD 2000] xxxx xxxx xxxx xxxx xxxx xxxx xx xx xx xx xx xx
  45. Road Map Requisite tech trends and prognostications CONTROL of the

    situation: online query processing n HCI motivations n Example applications n Example Algorithms n Reordering and Ripple Joins Implications for “hot topics” n P2P query systems n Sensor nets n Stream Query Processing
  46. Two Hot DB Topics Queries over networks n Querying sensor

    nets (e.g. TinyDB) n Querying in P2P networks (e.g. PIER) Continuous queries over data streams n E.g. TelegraphCQ, Aurora, STREAM, NiagaraCQ Natural settings for these ideas!
  47. Queries over Networks: Big Data, Small Pipes Given: Large Data

    Sets n E.g. disks, traces in P2P setting n E.g. the physical world in sensornets Constraint: Small pipes n P2P != cluster [IPTPS03] n Sensor nets have extremely limited comm per battery Will have long-running dataflow programs! So… n Need to provide streaming, approximate answers n Need to prioritize delivery a la juggle n Need to decide adaptively on how to sample physical world n “Acquisitional Query Processing” in TinyDB related to both Ripple Join and Juggle n Need to consider human factors while the query runs
  48. Continuous Queries over Streams Hot topic of the moment n

    Berkeley’s TelegraphCQ, OGI/Wisconsin’s NiagaraCQ, MIT/Brown/Brandeis’ Aurora, Stanford’s STREAM Again, natural setting for these ideas n Need pipelining operators for STREAMS n Interest in approximate answers during bursts n And prioritized delivery n Need to consider human factors! n DB community focused on the next query language n Also need to focus on interactivity! n Initiation of a Continuous Query is only half the battle
  49. CONTROL: Lessons Learned Dream about UIs, work on systems n

    User needs drive systems design n For long-running, data-intensive apps especially Systems and statistics intertwine n Adaptive systems All 3 go together naturally n User desires and behavior: 2 more things to model, predict n “Performance” metrics need to reflect key user needs “What unlike things must meet and mate…” -- Art, Herman Melville
  50. Related Work on Online QP Morgenstein’s PhD, Berkeley ’80 Online

    Association Rules n Ng, et al’s CAP, SIGMOD ’98 n Hidber’s CARMA, SIGMOD ‘99 Implications for deductive DB semantics n Monotone aggregation in LDL++, Zaniolo and Wang Online agg with subqueries n Tan, et al. VLDB ’99 Dynamic Pipeline Scheduling n Urhan/Franklin VLDB ’01 Pipelining Hash Joins n Su/Raschid, Wilschut/Apers, Tukwila, Xjoin n Relation to semi-naive evaluation Anytime Algorithms n Zilberstein, Russell, et al.
  51. Sampling – Design Issues Granularity of sample n Instance-level (row-level):

    high I/O cost n Block-level (page-level): high variability from clustering Type of sample n Often simple random sample (SRS) n Especially for on-the-fly n With/without replacement usually not critical Data structure from which to sample n Files or relational tables n Indexes (B+ trees, etc)
  52. Row-level Sampling Techniques Maintain file in random order n Sampling

    = scan n Is file initially in random order? n Statistical tests needed: e.g., Runs test, Smirnov test n In DB systems: cluster via RAND function n Must “freshen” ordering (online reorg) On-the-fly sampling n Via index on “random” column n Else get random page, then row within page n Ex: extent-map sampling n Problem: variable number of records on page
  53. Acceptance/Rejection Sampling Accept row on page i with probability =

    ni /nMAX Commonly used in other settings n E.g. sampling from joins n E.g. sampling from indexes r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r Original pages Modified pages
  54. Estimation for Aggregates Point estimates n Easy: SUM, COUNT, AVERAGE

    n Hard: MAX, MIN, quantiles, distinct values Confidence intervals – a measure of precision Two cases: single-table and joins
  55. The Good and Bad News Good news: 1/n1/2 magic (n

    chosen on-the-fly) Bad news: needle-in-a-haystack problem
  56. Running confidence intervals (1) Confidence parameter pÎ(0,1) is prespecified Display

    a precision parameter єn such that running aggregate Yn is within ± єn of the final answer µ with probability approximately equal to p. [Yn- єn ,Yn+ єn ] contains µ with probability approximately equal to p
  57. Running confidence intervals (2) Three types to construct from n

    retrieved records: n Conservative confidence intervals based on Hoeffding’s inequality or recent extention of this inequality, for all n>=1 n Large-sample confidence intervals based on central limit theorems (CLT’s), for n both small and large enough n Deterministic confidence intervals contain µ with probability 1, only for very large n
  58. Running confidence intervals (3) SELECT AVG(exp) FROM R; v(i) (1

    £ i £ m): the value of exp when applied to tuple i Li: the random index of the ith tuple retrieved from R a and b are a priori bounds a £ v(i) £ b for 1 £ i £ m Conservative confidence interval equations:
  59. Running confidence intervals (4) Large-sample confidence interval equations By central

    limit theorems (CLT’s) , Yn approaches a normal distribution with a mean (µ) and a variance s2/n as n, the sample size, increases. s2 can be replaced by the estimator Tn,2 (v)
  60. Confidence intervals for ripple joins Use central limit theorems (CLT’s)

    to compute “large-sample” confidence intervals Fix the problems in classic CLT’s with newly defined s2 for different aggregate queries
  61. Ripple optimization:choosing aspect ratios (1) Blocking factor a is prespecified,

    we want to optimize bk ’s – the aspect- ratio parameters minimize such that b1 b2 b3 ...bK aK-1£c (decided by animation speed) 1 £ bk £ mk / a for 1 £ k £ K b1 ,b2 ,b3 ,...bK interger
  62. Precomputation: Explicit OLAP Data Cubes (drill-down hierarchies) n MOLAP, ROLAP,

    HOLAP Semantic hierarchies n APPROXIMATE (Vrbsky, et al.) n Query Relaxation, e.g. CoBase n Multiresolution Data Models (Silberschatz/Reed/Fussell) More general materialized views n See Gupta/Mumick’s text
  63. Precomputation: Stat. Summaries Histograms n Originally for aggregation queries, many

    flavors n Extended to enumeration queries recently n Multi-dimensional histograms Parametric estimation n Wavelets and Fractals n Discrete cosine transform n Regression n Curve fitting and splines n Singular-Value Decomposition (aka LSI, PCA) Indexes: hierarchical histograms n Ranking and pseudo-ranking n Aoki’s use of GiSTs as estimators for ADTs Data Mining n Clustering, classification, other multidimensional models
  64. Precomputed Samples Materialized sample views n Olken’s original work n

    Chaudhuri et al.: join samples n Statistical inferences complicated over “recycled” samples? Barbará’s quasi-cubes AQUA “join synopses” on universal relation Maintenance issues n AQUA’s backing samples Can use fancier/more efficient sampling techniques n Stratified sampling or AQUA’s “congressional” samples Haas and Swami AFV statistics n Combine precomputed “outliers” with on-the-fly samples