Statistical Thinking for Data Science

Chris Fonnesbeck
February 08, 2015

PyTennessee 2015 Keynote Address

Transcript

  1. ?

  2. "... 132 such victims were admitted to the Animal Medical Center on 62nd Street in Manhattan ..."
  3. "Next week, the first answers from these ten million will begin the incoming tide of marked ballots, to be triple-checked, verified, five-times cross-classified and totalled."
  4. 66%

  5. import numpy as np

     p = 0.5  # probability that a negative value goes missing
     sample_sizes = [10, 100, 1000, 10000, 100000]
     replicates = 1000
     biases = []
     for n in sample_sizes:
         bias = np.empty(replicates)
         for i in range(replicates):
             true_sample = np.random.normal(size=n)
             negative_values = true_sample < 0
             missing = np.random.binomial(1, p, n).astype(bool)
             # Drop the negative values that were flagged as missing
             observed_sample = true_sample[~(negative_values & missing)]
             # The true mean is 0, so the observed mean is the bias
             bias[i] = observed_sample.mean()
         biases.append(bias)
  6. "The value for which P = 0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not." R.A. Fisher
  7. "If an experiment were repeated infinitely, p represents the proportion of values more extreme than the observed value, given that the null hypothesis is true."
  8. H0 : The density of large trees in logged and unlogged forest stands were equal
  9. H0 : The density of large trees in logged and unlogged forest stands were equal
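  10. import numpy as np
      import pandas as pd
      import seaborn as sb

      n = 20  # observations per replicate
      r = 36  # number of replicates
      # x and y are drawn independently, so any apparent trend is noise
      df = pd.concat([pd.DataFrame({'y': np.random.normal(size=n),
                                    'x': np.random.random(n),
                                    'replicate': [i] * n})
                      for i in range(r)])
      sb.lmplot(x='x', y='y', data=df, col='replicate', col_wrap=6)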
  11. "Despite a large statistical literature for multiple testing corrections, usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding."
  12. “While everyone is looking at the polls and the storm, Romney’s slipping into the presidency.”
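
The simulation on slide 5 can be summarized to see why a larger sample does not remove the bias. The sketch below is a standard-library re-implementation, not the author's NumPy code: the helper names `observed_mean` and `simulate`, the fixed seed, and the smaller replicate count and sample sizes are all assumptions chosen to keep the run short. Because the true mean is 0, the average of the observed means is itself the bias.

```python
import random
import statistics


def observed_mean(n, p_missing, rng):
    """Mean of n standard-normal draws after dropping each negative
    value with probability p_missing (hypothetical helper)."""
    kept = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Negative values go missing with probability p_missing;
        # non-negative values are always observed.
        if x < 0 and rng.random() < p_missing:
            continue
        kept.append(x)
    # Guard against the (very unlikely) case that every value was dropped.
    return statistics.fmean(kept) if kept else None


def simulate(sample_sizes=(10, 100, 1000, 10000), replicates=100,
             p_missing=0.5, seed=42):
    """Return {n: (mean bias, spread of the bias)} for each sample size."""
    rng = random.Random(seed)
    summary = {}
    for n in sample_sizes:
        means = [observed_mean(n, p_missing, rng) for _ in range(replicates)]
        means = [m for m in means if m is not None]
        summary[n] = (statistics.fmean(means), statistics.stdev(means))
    return summary


summary = simulate()
for n, (bias, spread) in summary.items():
    print(f"n={n:>6}  mean bias={bias:.3f}  spread={spread:.3f}")
```

Under these assumptions the large-sample value of the observed mean is p * φ(0) / (1 - p/2), where φ(0) ≈ 0.399 is the standard normal density at zero. That is roughly 0.27 for p = 0.5, so the mean bias should stay near that value as n grows while only the spread across replicates shrinks.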