Slide 1

Slide 1 text

Oracle Meets Fractals and Learns the Power of Power Laws Dr. Neil Gunther Performance Dynamics NoCOUG Winter Conference 2012 Thursday, February 23 @ 1 pm SM c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 1 / 44

Slide 2

Slide 2 text

Fractals Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 1 / 44

Slide 3

Slide 3 text

Fractals What is a Fractal? Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 1 / 44

Slide 4

Slide 4 text

Fractals What is a Fractal? Fractals in Space c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 2 / 44

Slide 5

Slide 5 text

Fractals What is a Fractal? Fractals are Described by Power Laws Zipf’s law 5e+02 2e+03 5e+03 2e+04 5e+04 2e+05 Zipf's Law for UK English Words Ranked Words (W) Frequency (F) the it at would much us love lay eye dare Created by NJG Sat Oct 24 11:15:45 2009 F = W!D D = 1.1311 c = 981293.8 SQL accesses 1 2 5 10 20 50 100 100 200 300 400 500 Log-Log SQL Times [1:100] Index etimesA Internet packets DRAFT log10(d) log10(r/s) 0 1 2 3 4 5 0 1 2 3 4 Website visitors !"#$%&'()*')+,'-./'0+-/1+/-' ' $ !"$ "$ !"$ #$ !"$ $$ !"$ "$ !"$ #$ !"$ $$ %&'()*+,-+&%./&)+0.1.2,*1$ *3%4+,-+1.2)$ 567+0.1.2,*1+2,+1.2)1$ -.2+8.29+ !$ +:+!$ c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 3 / 44

Slide 6

Slide 6 text

Fractals How It Works Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 3 / 44

Slide 7

Slide 7 text

Fractals How It Works Calibrating a Circumference Approximating circumference by regular polygons with successively shorter sides. Polygon represents measurement device or ruler Double-log plot of the estimated circumference (y-axis) vs. the length of the polygon side (x-axis). As the sides get shorter the perimeter of the polygon approaches the actual circumference. 1.00 0.50 0.20 0.30 0.15 1.50 0.70 Ln S 6.1 6.15 6.2 6.25 Ln P Estimated perimeter vs. length of polygon sides Euclidean geometry of the circle Greeks knew (irrational) ratio of diameter D to circumference C: π = C/D Successive measurements converge to fixed value: C Speed of convergence is clearly seen on logarithmic axes. c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 4 / 44

Slide 8

Slide 8 text

Fractals How It Works Crinkly Coastlines What about highly irregular shapes like coastline of Britain? Greeks would never consider such imperfect non-Euclidean geometry. Successively smaller ruler size (S) produces longer coastline estimate (L)! Why? Smaller ruler gets into more coastal nooks and crannies. c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 5 / 44

Slide 9

Slide 9 text

Fractals How It Works Borders and Wars Plot border length (L) against ruler size (S) on log-log axes1. Why do border lengths fall on straight lines in log-log plot? Any crazy country shape is then characterized by a single number: its slope! Reason remained obscure until Mandelbrot resurrected it as geometry of fractals2 1 Lewis F. Richardson (1961) “The problem of contiguity: An appendix to Statistic of Deadly Quarrels.” 2 B. Mandelbrot, The Fractal Geometry of Nature, W. H. Freeman, New York (1983) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 6 / 44

Slide 10

Slide 10 text

Fractals How It Works The Power of Power Laws Straight lines on log-log plot have form: Y = mX + c But Y ≡ ln(L) and X ≡ ln(S) with negative slope: ln(L) = −α ln(S) + ln(k) = ln(k S−α) Taking antilogs of both sides reveals general power law form: L = k S−α L = k Sα (1) Reverse this logic If your data looks “linear” on a log-log plot Assume it signals presence of a power law like (1) Find the slope to characterize it Exponent α is the “power” in power law c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 7 / 44

Slide 11

Slide 11 text

Fractals How It Works The Shape of Power Laws Α 1.0 Α 1.5 Α 2.0 Α 3.0 20 40 60 80 100 0.01 0.02 0.03 0.04 General shape of power law eqn.(1) is a hyperbola Blue curve is an exponential function Other curves are power laws with increasing α exponents (slopes) Big powers Large α power laws are indistinguishable from an exponential c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 8 / 44

Slide 12

Slide 12 text

Fractals How It Works The Tale is in the Tail k CDF 0 10 20 30 40 50 x 0.1 0.2 0.3 0.4 0.5 PDF Power laws differ from standard statistical distributions Power laws carry most of the information in their tail Fatter tail corresponds to stronger correlations than “normal” Mass of tail measured by cumulative distribution function (CDF) Log-log fitting On a log-log plot we are trying to fit right-hand side data, not left side c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 9 / 44

Slide 13

Slide 13 text

Fractals Internet Traffic Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 9 / 44

Slide 14

Slide 14 text

Fractals Internet Traffic Internet Congestion Internet performance collapsed c.1986 Bellcore c.1990 impact of ISDN broadband Packet tracing measurements at Bellcore Surprise: large packet trains Surprise: Service times file-size dependent Surprise: Packet arrivals not always Poisson Surprise: Queueing models break down How to do CaP for future Internet? Why is it happening? Part of the answer is power laws Netflix uses 33% of USA Internet BW c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 10 / 44

Slide 15

Slide 15 text

Fractals Internet Traffic Strangeness in the Interpipes DRAFT 0 100 200 300 400 500 600 700 800 900 1000 0 20000 40000 60000 (a) Time Unit = 100 Seconds Packets/Time Unit 0 100 200 300 400 500 600 700 800 900 1000 0 2000 4000 6000 (b) Time Unit = 10 Seconds Packets/Time Unit 0 100 200 300 400 500 600 700 800 900 1000 0 200 400 600 800 (c) Time Unit = 1 Second Packets/Time Unit 0 100 200 300 400 500 600 700 800 900 1000 0 20 40 60 80 100 (d) Time Unit = 0.1 Second Packets/Time Unit Packets/Time Unit 5 10 15 Read bottom to top, left to right Variance persists over 5 decades of time c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 11 / 44

Slide 16

Slide 16 text

Fractals Internet Traffic Traces on Log-Log Axes DRAFT log10(d) log10(r/s) 0 1 2 3 4 5 0 1 2 3 4 Figure 4.2.1 (b). Pox plot of R/S for sequence AUG89.MB. The plot tightly clusters around a straight line whose asymptotic slope clearly lies between the slopes 0.5 (lower dotted line) and 1.0 (upper dotted line) and is readily estimated (using the "brushed" points) to be about 0.79. 39 Y = (max − min)/std dev (“rescaled range”) X = sample size (in trace file c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 12 / 44

Slide 17

Slide 17 text

Fractals Internet Traffic Fractals in Time 10 5 5 10 0.05 0.10 0.15 0.20 0.25 Bellcore packet data shows fractal behavior in time (5 decades) Diffusion model of Brownian motion Solution is a normal distribution which evolves in time (t) ∂t f(x, t) = σ2∂2 x f(x, t) f(x, t) = 1 √ 4πσ2t exp „ −(x − µ)2 4σ2t « L2 = 4σ2t Diffusion length L = √ 4σ2t L ∼ t 1 2 Generalization L ∼ t 1 2 → tH (Brownian → Levy) What happens if H = 1 2 becomes 1 2 < H < 1? c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 13 / 44

Slide 18

Slide 18 text

Fractals Internet Traffic Router Occupancy 0.0 0.2 0.4 0.6 0.8 1.0 0 2 4 6 8 10 Router utilization Buffer occupancy M/M/1 (H = 0.50) H = 0.75 H = 0.90 Q: queue length or buffer occupancy ρ = λS: router utilization H: power law exponent (Hurst parameter) Q = ρ 1 2(1−H) (1 − ρ) 1 1−H H = 0.5 is identical to M/M/1 queue H = 0.9 Internet empirical Hurst exponent Buffer overflow can occur at lower loads Router model x<-c(1:100) rho<-x/100 qlen<-function(r,H){rˆ(1/(2*(1-H))) / ((1-r)ˆ(H/(1-H)))} plot(rho,qlen(rho,0.5),type="l",xlab="Router utilization",ylab="Buffer occupancy",ylim=c(0,10)) lines(rho,qlen(rho,0.75),col="blue") lines(rho,qlen(rho,0.90),col="red") c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 14 / 44

Slide 19

Slide 19 text

Applications Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 14 / 44

Slide 20

Slide 20 text

Applications Word Fractals Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 14 / 44

Slide 21

Slide 21 text

Applications Word Fractals Data Source — Corpus (Body of Words) Data (already ranked) is 1000 most common wordforms in UK English based on 29 works of literature by 18 authors (4.6 × 106 words) Wordform: english word Abs: absolute frequency (total number of occurrences) r: range (number of texts in which the word occurs) mod: modified frequency as defined by Rosengren (1972) Read data file > dir<-setwd("˜/../GDAT Scripts/Power Laws/") > td<-read.table("zipf1000.txt",header=TRUE) > head(td) Rank Wordform Abs r mod 1 1 the 225300 29 223066.9 2 2 and 157486 29 156214.4 3 3 to 134478 29 134044.8 4 4 of 126523 29 125510.2 5 5 a 100200 29 99871.2 6 6 I 91584 29 86645.5 c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 15 / 44

Slide 22

Slide 22 text

Applications Word Fractals Linear Visualization 0 50000 100000 150000 200000 Ranked 1000 UK English Words Ranked words (W) Frequency of occurrence (F) the their us love voice true state eye stand worth service neck land art Linear plot of data > plot(td$Rank, td$Abs, type="p", main="Ranked 1000 UK English Words", xlab="Ranked words (W)", ylab="Frequency of occurrence (F)", xaxt="n",log="xy",cex=0.5) > ticks.at<-seq(min(td$Rank), max(td$Rank),10) > ticks.lab<-as.character(td$Wordform[ticks.at]) > Axis(td$Rank, at=ticks.at, las=1, side=1,labels=ticks.lab, cex.axis=0.75) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 16 / 44

Slide 23

Slide 23 text

Applications Word Fractals Double-Log Visualization 5e+02 2e+03 5e+03 2e+04 5e+04 2e+05 Ranked 1000 UK English Words Ranked words (W) Frequency of occurrence (F) the it at would much us love lay eye dare Data on log-log axes > plot(td$Rank, td$Abs, type="p", main="Ranked 1000 UK English Words", xlab="Ranked words (W)", ylab="Frequency of occurrence (F)", xaxt="n",log="xy",cex=0.5) > ticks.at<-seq(min(td$Rank), max(td$Rank),10) > ticks.lab<-as.character(td$Wordform[ticks.at]) > Axis(td$Rank, at=ticks.at, las=1, side=1,labels=ticks.lab, cex.axis=0.75) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 17 / 44

Slide 24

Slide 24 text

Applications Word Fractals Regression Fit 5e+02 2e+03 5e+03 2e+04 5e+04 2e+05 Ranked 1000 UK English Words Ranked words (W) Frequency of occurrence (F) the it at would much us love lay eye dare Regression fit to logarithmic data # regression model of Y=log(y) and X=log(x) > z.fit <- lm(log(td$Abs) ˜ log(td$Rank)) # Must transform back to log scaled coords in plot > ly <- exp( (coef(z.fit)[2])*log(td$Rank) + coef(z.fit)[1] ) > lines(td$Rank, ly, col="blue",lty="solid",lwd=2) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 18 / 44

Slide 25

Slide 25 text

Applications Word Fractals Summary of Regression Statistics Regression summary > summary(z.fit) Call: lm(formula = log(td$Abs) ˜ log(td$Rank)) Residuals: Min 1Q Median 3Q Max -1.47144 -0.06902 -0.01477 0.05757 0.81894 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 13.796627 0.024210 569.9 <2e-16 *** log(td$Rank) -1.131084 0.004039 -280.0 <2e-16 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 0.1258 on 998 degrees of freedom Multiple R-squared: 0.9874, Adjusted R-squared: 0.9874 F-statistic: 7.841e+04 on 1 and 998 DF, p-value: < 2.2e-16 c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 19 / 44

Slide 26

Slide 26 text

Applications Fractal Query Times Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 19 / 44

Slide 27

Slide 27 text

Applications Fractal Query Times Interpreting Time Histogram SQL Queries Elapse time Frequency 0 100 200 300 400 0 50 100 150 Histogram of measured SQL query times x-axis is elapsed time in seconds y-axis is number of queries with that time What distribution profile is it? Exponential, log-normal,... Can’t tell by just staring at it c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 20 / 44

Slide 28

Slide 28 text

Applications Fractal Query Times Data Source Original question: Craig Shallahamer’s blog Attempted solution: Dave Abercrombie’s blog My solution: My blog Read data file > dir<-setwd("˜/Desktop/GDAT Dev 2011/GDAT Scripts/Power Laws/") > orad<-read.table("orasql-data.txt",header=FALSE) # Add column names > colnames(orad) <- c( "SQLid", "sample", "execns", "dkReads", "buffGets", "CPUtime", "Elapstime" ) > head(orad) SQLid sample execns dkReads buffGets CPUtime Elapstime 1 8qtkxy0g5d1p3,2282376281 1 1 0 3 0.100 0.100 2 8qtkxy0g5d1p3,2282376281 2 1 0 3 0.106 0.106 3 8qtkxy0g5d1p3,2282376281 3 1 0 3 0.101 0.101 4 8qtkxy0g5d1p3,2282376281 4 1 0 3 0.098 0.098 5 8qtkxy0g5d1p3,2282376281 5 1 6 118 0.000 33.575 6 8qtkxy0g5d1p3,2282376281 6 1 8 137 10.000 31.004 c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 21 / 44

Slide 29

Slide 29 text

Applications Fractal Query Times Visualize Raw Data 0 100 200 300 400 500 0 100 200 300 400 Index orad$Elapstime Linear plot of unranked data > plot(orad$Elapstime, type="h") Like Zipf’s law, data must be ranked by frequency of occurrence c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 22 / 44

Slide 30

Slide 30 text

Applications Fractal Query Times Visualize Ranked Data 0 100 200 300 400 500 0 100 200 300 400 Ranked SQL Times Index otr Linear plot of ranked data > otr <- sort(orad$Elapstime, decreasing=TRUE) > plot(otr,type="h",main="Ranked SQL Times") c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 23 / 44

Slide 31

Slide 31 text

Applications Fractal Query Times Double-Log Visualization 1 2 5 10 20 50 100 200 500 0.1 0.5 1.0 5.0 50.0 500.0 Log-Log SQL Times Index otr Log-log plot of ranked data > plot(otr,log="xy",main="Log-Log SQL Times") Clearly this profile is not power law overall But the first 100 queries do appear to be power law c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 24 / 44

Slide 32

Slide 32 text

Applications Fractal Query Times Data Regions This suggests breaking data across 3 regions as follows: Windowed plots # Define data windows of ranked data etA<-otr[1:100] etB<-otr[100:270] # gap... etC<-otr[420:500] plot(etA,type="p",log="xy",main="Log-Log of SQL-A Times") plot(etB,type="p",log="y", main="Log-Lin of SQL-B Times") plot(etC,type="p",log="y", main="Log-Lin of SQL-C Times") 1 2 5 10 20 50 100 100 200 300 400 500 Log-Log of SQL-A Times Index etA 0 50 100 150 30 40 50 60 70 80 Log-Lin of SQL-B Times Index etB 0 20 40 60 80 0.090 0.095 0.100 0.105 0.110 Log-Lin of SQL-C Times Index etC c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 25 / 44

Slide 33

Slide 33 text

Applications Fractal Query Times Data Region A Fit 1 2 5 10 20 50 100 100 200 300 400 500 Log-Log SQL A-Times Index etA Regression analysis for Window A > xA<-seq(1:length(etA)) > zA.fit<-lm(log(etA) ˜ log(xA)) > EyA<-exp(coef(zA.fit)[2]*log(xA) + coef(zA.fit)[1]) > plot(etA,log="xy",main="Log-Log SQL A-Times") > lines(xA,EyA,col="blue",lwd=2) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 26 / 44

Slide 34

Slide 34 text

Applications Fractal Query Times Data Region B Fit 0 50 100 150 30 40 50 60 70 80 Log-Lin SQL B-Times Index etB Regression analysis for Window B > xB<-seq(1:length(etB)) > zB.fit<-lm(log(etB) ˜ xB) > EyB<-exp(coef(zB.fit)[2]*xB + coef(zB.fit)[1]) > plot(etB,log="y",main="Log-Lin SQL B-Times") > lines(xB,EyB,col="blue",lwd=2) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 27 / 44

Slide 35

Slide 35 text

Applications Fractal Query Times Data Region C Fit 0 20 40 60 80 0.090 0.095 0.100 0.105 0.110 Log-Lin SQL C-Times Index etC Regression analysis for Window C > xC<-seq(1:length(etC)) > zC.fit<-lm(log(etC) ˜ xC) > EyC<-exp(coef(zC.fit)[2]*xC + coef(zC.fit)[1]) > plot(etC,log="y",main="Log-Lin SQL C-Times") > lines(xC,EyC,col="blue",lwd=2) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 28 / 44

Slide 36

Slide 36 text

Applications Fractal Query Times Regression Models 0 20 40 60 80 100 0 200 400 600 800 x y yA ∼ x−0.4632 power law yB ∼ e−0.0074x exponential decay yC ∼ e−0.0028x exponential decay Regression coefficients > coef(zA.fit) (Intercept) log(xA) 6.6055308 -0.4632485 > coef(zB.fit) (Intercept) xB 4.416070310 -0.007438368 > coef(zC.fit) (Intercept) xC -2.198802043 -0.002782828 c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 29 / 44

Slide 37

Slide 37 text

Applications Fractal Query Times Slope Analysis 1 2 5 10 20 50 100 100 200 300 400 500 Log-Log SQL A-Times Index etA From coef(zA.fit) know log(xA) = -0.4632485 Empirical slope γ = 0.46 to two significant decimal digits About half Zipfian slope γ = 1.0 ± 0.5 Correlations are stronger than for Zipf Hypothesis Shorter query times (window A) may be associated with dictionary lookups or other structured data. That structure provides correlations. Longer queries in windows B and C are not structured (ad hoc?) and are therefore more randomized. The lack of strong correlations shows up as different exponential decay rates. c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 30 / 44

Slide 38

Slide 38 text

Applications Fractal Time Series Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 30 / 44

Slide 39

Slide 39 text

Applications Fractal Time Series The Perfect Storm A Power Law Storm c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 31 / 44

Slide 40

Slide 40 text

Applications Fractal Time Series Before the Storm All businesses are required to register with the Australian Tax Office (ATO) for an Australian Business Number (ABN) to claim an income tax refund. The ABN was introduced in Y2K. Data from website hosting initial ABN registrations. Period covers March 27 to September 19, 2000 Post-advertising traffic 1 March to 30 May , 2000 Deadline spike on 31 May, 2000 Smaller traffic peaks from 1 June to 30 June, 2000 Post deadline period from 1 July to 19 Sept, 2000 Full details can be found in my CMG-A paper cmga-p10167.pdf included in your GDAT class materials. c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 32 / 44

Slide 41

Slide 41 text

Applications Fractal Time Series Full Data Profile 11 3 2000 21 4 2000 21 5 2000 15 6 2000 10 7 2000 4 8 2000 29 8 2000 0 200000 400000 600000 800000 1. 106 ORA Connections Question: Could the “11th hour” spike have been predicted? Answer: Yes, but quite involved. How: Using a power law. What else!? c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 33 / 44

Slide 42

Slide 42 text

Applications Fractal Time Series Log-Linear Plot 11 3 2000 21 4 2000 21 5 2000 1 104 2 104 5 104 1 105 2 105 5 105 1 106 2 106 ORA Connections y-axis is the number of Oracle RDBMS connections Here, the y-axis is log scaled Peak growth preceding spike looks linear on semi-log plot x-axis index (not shown) is “days from the start of data window” time series index range t = 0 to t = 38 days c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 34 / 44

Slide 43

Slide 43 text

Applications Fractal Time Series Semi-Log Regression on Peaks 11 3 2000 21 4 2000 1 104 2 104 5 104 1 105 2 105 5 105 1 106 ORA Connections Linear peak-growth on semi-log axes Curve must be an exponential function Use Exp as regression model ˆ y(t) = A exp(Bt) Model parameters Origin: A = 114128 Curvature: B= 0.0175 Doubling period: T2 = ln(2) B ∼ 6 months c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 35 / 44

Slide 44

Slide 44 text

Applications Fractal Time Series Trend Overview 11 3 2000 21 4 2000 21 5 2000 15 6 2000 10 7 2000 0 200000 400000 600000 800000 1. 106 ORA Connections Revert to linear axes to review the trend Exponential forecast up to the crosshairs looks valid But significantly underestimates onset of the “11th hour” peak As well as rapid drop off on RHS of the peak c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 36 / 44

Slide 45

Slide 45 text

Applications Fractal Time Series Power Law Fit 20 40 60 80 100 120 400000 600000 800000 1. 106 Power law has a critical point tc Equation: ˆ y(t) = k|t − tc|−γ See far LHS curve with tc = 40 (blue) Estimate ˆ y(t) → ∞ at t = tc Translate ˆ y(t) rightward until lower part of curve matches Exp function (red) Critical point also moves to tc = 61 (31 May, 2000) Critical point New element is the appearance of a critical point at tc Power law goes infinite and spikes at tc = 61 with γ = 0.6421 c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 37 / 44

Slide 46

Slide 46 text

Applications Fractal Time Series Comparison of Models Exp growth Power law 11 3 2000 21 4 2000 21 5 2000 15 6 2000 10 7 2000 0 200000 400000 600000 800000 1. 106 ORA Connections Exponential trend is consistent with data through April 2000 Completely underestimates onset of the “11th hour” spike Completely overestimates decay of traffic load beyond spike Data is already exceeding Exp model during April-May period Power law model predicts all these effects quite well Critical point is inclusion of critical point tc Use Exp model to baseline a power law model c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 38 / 44

Slide 47

Slide 47 text

Applications Fractal Time Series Look-ahead Tools Could we have seen the spike coming without knowing tc ? 2 weeks from data start 0.30 0.35 0.40 0.45 9.24 9.26 9.28 9.30 log(scale) log(SDF) SDF H = 0.714 1.5 months from data start 0.4 0.6 0.8 1.0 9.0 9.2 9.4 9.6 9.8 10.0 log(scale) log(SDF) SDF H = 0.687 Estimate H = 1 2 (1 − β) from the slope β of ln[S(f)] vs ln[f] in frequency domain: H ∈ [1 2 , 1) persistent autocorrelations (increase/decrease typically followed by increase/decrease) H = 1 2 statistically independent random fluctuations H ∈ (0, 1 2 ] antipersistent autocorrelations (increase/decrease typically followed by decrease/increase) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 39 / 44

Slide 48

Slide 48 text

Conclusions Outline 1 Fractals What is a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 39 / 44

Slide 49

Slide 49 text

Conclusions Review Power laws are ubiquitous (but hidden) Power laws are not like standard statistical distributions Power laws have fatter tails than standard dsns Fat tail carries the bulk of the information (tale is in the tail) Power laws are easy to demonstrate with log-log plot Looked at 3 examples: Zipf, ORA SQL, and ORA ABN traffic spike Trick is to explain persistent strong correlations May need more data but that’s exactly how it should be c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 40 / 44

Slide 50

Slide 50 text

Conclusions Wanna Learn More? Chapter 10 Internet Planning Bellcore traces Fractals and Self-Similarity Short-range Dependence Long-range Dependence (LRD) Ethernet Packetization LRD and Flicker Noise Guerrilla training classes c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 41 / 44

Slide 51

Slide 51 text

Conclusions Why You Should Care c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44

Slide 52

Slide 52 text

Conclusions Why You Should Care Power laws are ubiquitous c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44

Slide 53

Slide 53 text

Conclusions Why You Should Care Power laws are ubiquitous Hard to see them in raw performance data c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44

Slide 54

Slide 54 text

Conclusions Why You Should Care Power laws are ubiquitous Hard to see them in raw performance data Must transform your data to see them c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44

Slide 55

Slide 55 text

Conclusions Why You Should Care Power laws are ubiquitous Hard to see them in raw performance data Must transform your data to see them Ranked data appears linear on double-log axes c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44

Slide 56

Slide 56 text

Conclusions Why You Should Care Power laws are ubiquitous Hard to see them in raw performance data Must transform your data to see them Ranked data appears linear on double-log axes More persistent response degradation than usual c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44

Slide 57

Slide 57 text

Conclusions Why You Should Care Power laws are ubiquitous Hard to see them in raw performance data Must transform your data to see them Ranked data appears linear on double-log axes More persistent response degradation than usual Can seriously impact overall database performance c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44

Slide 58

Slide 58 text

Conclusions References B. Mandelbrot, The Fractal Geometry of Nature, W. H. Freeman, 1983 L. Liebovitch, Fractals and Chaos Simplified for the Life Sciences, Oxford Uni. Press, 1998 K. Park and W. Willinger, Self-Similar Network Traffic and Performance Evaluation, John Wiley, 2000 N. J. Gunther, Guerrilla Capacity Planning, Springer, 2007 www.perfdynamics.com/iBook/gcap.html The R Project Tools for Statistical Computing www.r-project.org Performance Dynamics Educational Services Local and on-site training www.perfdynamics.com/Classes/schedule.html c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 43 / 44

Slide 59

Slide 59 text

Conclusions Thank you for attending! Performance Dynamics Company Castro Valley, California www.perfdynamics.com perfdynamics.blogspot.com twitter.com/DrQz facebook.com/Performance-Dynamics-Company [email protected] +1-510-537-5758 c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 44 / 44