variance, stdev, etc.
    - scan items only once
    - preserve as much precision as possible for Float values
  ‣ sum (only for Ruby < 2.4)
    - almost the same algorithm as Ruby trunk
• Very fast implementation
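$$
\begin{aligned}
E[(x - \mu)^2] &= \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu)^2 \\
&= \frac{1}{n} \sum_{k=1}^{n} \left( x_k^2 - 2 \mu x_k + \mu^2 \right) \\
&= \frac{1}{n} \sum_{k=1}^{n} x_k^2 - 2 \mu \cdot \frac{1}{n} \sum_{k=1}^{n} x_k + \mu^2 \cdot \frac{1}{n} \sum_{k=1}^{n} 1 \\
&= \frac{1}{n} \sum_{k=1}^{n} x_k^2 - 2 \mu^2 + \mu^2 \cdot \frac{1}{n} \times n \\
&= \frac{1}{n} \sum_{k=1}^{n} x_k^2 - \mu^2 \\
&= E[x^2] - E[x]^2
\end{aligned}
$$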
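$$
\sigma^2 = E[(x - \mu)^2] = \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu)^2 = \frac{1}{n} \sum_{k=1}^{n} \left( x_k - \frac{1}{n} \sum_{j=1}^{n} x_j \right)^2
$$
need to scan twice

$$
\sigma^2 = E[x^2] - E[x]^2 = \frac{1}{n} \sum_{k=1}^{n} x_k^2 - \left( \frac{1}{n} \sum_{k=1}^{n} x_k \right)^2
$$
enough to scan once (online algorithm)

The 2nd formula is better than the 1st for large populations. Really?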
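The two formulas above can be sketched in Ruby as follows. This is only an illustration, not the gem's implementation; the method names `variance_two_pass` and `variance_one_pass` are hypothetical, and both return the population variance of a non-empty collection.

```ruby
# Population variance from the definition E[(x - mu)^2].
# The mean must be known before the deviations, so the data is scanned twice.
def variance_two_pass(values)
  n = values.size
  mean = values.inject(0.0) { |acc, x| acc + x } / n           # first scan
  values.inject(0.0) { |acc, x| acc + (x - mean)**2 } / n      # second scan
end

# Population variance from E[x^2] - E[x]^2.
# One scan accumulates both the sum and the sum of squares.
def variance_one_pass(values)
  n = 0
  sum = 0.0
  sum_sq = 0.0
  values.each do |x|
    n += 1
    sum += x
    sum_sq += x * x
  end
  sum_sq / n - (sum / n)**2
end

data = [2, 4, 4, 4, 5, 5, 7, 9]
p variance_two_pass(data)  # => 4.0
p variance_one_pass(data)  # => 4.0
```

On small, well-scaled data like this the two formulas agree exactly; the difference only shows up once rounding errors come into play.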
[4, -1.1102230246251565e-16]
[5, -1.1102230246251565e-16]
[6, -2.220446049250313e-16]
[7, -1.1102230246251565e-16]
[8, -1.1102230246251565e-16]
[9, -1.1102230246251565e-16]
[10, -2.220446049250313e-16]
: :
• In rare cases, the 2nd formula yields negative values
• This is due to floating-point rounding errors
• We cannot calculate the standard deviation if the variance is negative
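The rounding problem can be made visible with a small experiment. The sketch below is not the experiment that produced the listing above; it uses hypothetical data (a small spread around a large offset of 1e9) and hypothetical method names to show how the one-pass formula loses the result to cancellation while the two-pass formula does not.

```ruby
# Two-pass population variance (definition).
def variance_two_pass(values)
  n = values.size
  mean = values.inject(0.0) { |acc, x| acc + x } / n
  values.inject(0.0) { |acc, x| acc + (x - mean)**2 } / n
end

# One-pass population variance, E[x^2] - E[x]^2.
def variance_one_pass(values)
  n = values.size
  sum = 0.0
  sum_sq = 0.0
  values.each do |x|
    sum += x
    sum_sq += x * x
  end
  sum_sq / n - (sum / n)**2
end

# Hypothetical data: the exact population variance is 22.5.
data = [4.0, 7.0, 13.0, 16.0].map { |x| x + 1e9 }

two_pass = variance_two_pass(data)
one_pass = variance_one_pass(data)

puts "two-pass: #{two_pass}"   # prints "two-pass: 22.5"
puts "one-pass: #{one_pass}"   # far from 22.5: squares near 1e18 cannot resolve the spread
puts "cannot take the square root of a negative variance" if one_pass < 0
```

Both `sum_sq / n` and `(sum / n)**2` are close to 1e18 here, where adjacent Float values are 128 apart, so their difference cannot carry the true value 22.5.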
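formula when n is small
• Use 1-pass formula for shifted values
• Use recurrence relation formula

$$
\sigma^2 = E[(x - \mu)^2] = \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu)^2
$$

$$
\sigma^2 = E[(x - \hat{x})^2] - E[x - \hat{x}]^2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{x})^2 - \left( \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{x}) \right)^2
$$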
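The shifted 1-pass formula can be sketched as below. The method name `variance_shifted` and the default choice of the first element as the shift value are assumptions made for illustration; any constant near the mean works, because the variance does not change under a constant shift.

```ruby
# One-pass population variance of the shifted values (x - shift).
# A shift close to the mean keeps the accumulated squares small,
# which reduces cancellation in the final subtraction.
# Assumes a non-empty collection; the default shift is the first element.
def variance_shifted(values, shift = values.first)
  n = 0
  sum = 0.0
  sum_sq = 0.0
  values.each do |x|
    d = x - shift
    n += 1
    sum += d
    sum_sq += d * d
  end
  sum_sq / n - (sum / n)**2
end

# Hypothetical data: small spread around a large offset.
data = [4.0, 7.0, 13.0, 16.0].map { |x| x + 1e9 }
p variance_shifted(data)  # => 22.5
```

The quality of the result still depends on how well the shift approximates the mean, which is unknown before the scan.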
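of the first n items:

$$
\begin{aligned}
S_1^2 &= 0, \\
S_n^2 &= \sum_{k=1}^{n} (x_k - \bar{x}_n)^2 \\
&= \sum_{k=1}^{n} (x_k - \bar{x}_{n-1} + \bar{x}_{n-1} - \bar{x}_n)^2 \\
&= \sum_{k=1}^{n} (x_k - \bar{x}_{n-1})^2 + 2 \sum_{k=1}^{n} (x_k - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + \sum_{k=1}^{n} (\bar{x}_{n-1} - \bar{x}_n)^2 \\
&\;\;\vdots \quad \text{(snip)} \\
&= S_{n-1}^2 + (x_n - \bar{x}_{n-1})^2 - \frac{1}{n} (x_n - \bar{x}_{n-1})^2 \\
&= S_{n-1}^2 + (x_n - \bar{x}_{n-1})^2 - (x_n - \bar{x}_{n-1})(\bar{x}_n - \bar{x}_{n-1}) \\
&= S_{n-1}^2 + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)
\end{aligned}
$$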
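Since the derivation skips some steps, the final recurrence can be checked numerically. The sketch below compares it with the direct definition for every prefix of a sample list, using Ruby's `Rational` so that no rounding is involved; the method names `recurrence_sums` and `direct_sums` are hypothetical.

```ruby
# S_n^2 for n = 1..N, computed with the recurrence
#   S_n^2 = S_{n-1}^2 + (x_n - xbar_{n-1}) * (x_n - xbar_n)
def recurrence_sums(xs)
  mean = nil
  s = Rational(0)                # S_1^2 = 0
  xs.each_with_index.map do |x, i|
    n = i + 1
    if n == 1
      mean = x                   # xbar_1 = x_1
    else
      prev_mean = mean
      mean = prev_mean + (x - prev_mean) / n
      s += (x - prev_mean) * (x - mean)
    end
    s
  end
end

# S_n^2 for n = 1..N, computed directly from the definition.
def direct_sums(xs)
  (1..xs.size).map do |n|
    head = xs.first(n)
    mean = head.inject(:+) / n
    head.inject(Rational(0)) { |acc, v| acc + (v - mean)**2 }
  end
end

xs = [3, 1, 4, 1, 5, 9, 2, 6].map { |x| Rational(x) }
p recurrence_sums(xs) == direct_sums(xs)  # => true
```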
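Mean:
$$
\bar{x}_1 = x_1, \qquad \bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}
$$

Sum of squares:
$$
S_1^2 = 0, \qquad S_n^2 = S_{n-1}^2 + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)
$$

Sample variance:
$$
s_n^2 = \frac{S_n^2}{n - 1}
$$

Population variance:
$$
\sigma_n^2 = \frac{S_n^2}{n}
$$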
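These recurrences translate almost directly into a single loop. The following is a plain-Ruby sketch under stated assumptions, not the gem's actual (native) implementation; the method name `running_stats` and the returned hash keys are hypothetical.

```ruby
# Single-pass mean and variance using the recurrence relations.
# Returns NaN for a variance that is undefined for the given n.
def running_stats(values)
  n = 0
  mean = 0.0
  s = 0.0                        # running sum of squared deviations, S_n^2
  values.each do |x|
    n += 1
    delta = x - mean             # x_n - xbar_{n-1}
    mean += delta / n            # xbar_n
    s += delta * (x - mean)      # S_n^2 = S_{n-1}^2 + (x_n - xbar_{n-1})(x_n - xbar_n)
  end
  {
    n: n,
    mean: mean,
    sample_variance: n > 1 ? s / (n - 1) : Float::NAN,
    population_variance: n > 0 ? s / n : Float::NAN
  }
end

# Hypothetical data: small spread around a large offset.
data = [4.0, 7.0, 13.0, 16.0].map { |x| x + 1e9 }
stats = running_stats(data)
p stats[:population_variance]  # => 22.5
p stats[:sample_variance]      # => 30.0
```

Starting from `mean = 0.0` gives the same result as the initial conditions above: after the first item the mean equals that item and the sum of squares is still 0. Each update adds a product of two small deviations, so `s` never goes through the large intermediate values that cause cancellation in the naive 1-pass formula.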
of Enumerable
• Use recurrence relation formula for 1-pass and precision-preserving calculation
• Fast calculation without method calls
• gem install enumerable-statistics