The Artful Business of Data Mining: Computational Statistics with Open Source Tools

This talk covers the concepts of data mining and data analysis using open source tools, mainly Python and R, along with the libraries and tools I have used and currently use at Engine Yard.

David Coallier

March 20, 2013

Transcript

  1. $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

    Wednesday 20 March 13
  2. import numpy as np

     x = np.array([
         [1, 0],
         [0, 1]
     ])
     # eig returns (eigenvalues, eigenvectors), in that order
     val, vec = np.linalg.eig(x)
     np.linalg.eigvals(x)
  3. from sklearn import tree

     X = [[0, 0], [1, 1]]
     Y = [0, 1]
     clf = tree.DecisionTreeClassifier()
     clf = clf.fit(X, Y)
     clf.predict([[2., 2.]])
     >>> array([1])
  4. from pandas import DataFrame

     x = DataFrame([
         {"age": 26},
         {"age": 19},
         {"age": 21},
         {"age": 18}
     ])
     # Rows with age above 20: how many there are, and their mean age
     print(x[x['age'] > 20].count())
     print(x[x['age'] > 20].mean())
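The eigenvalue slide can be checked numerically. The sketch below is not from the talk: the helper name `check_eigenpairs` and the second matrix `diagonal` are assumptions added for illustration. It relies on `np.linalg.eig` returning the eigenvalues first and the eigenvectors second, with column `i` of the eigenvector array paired with eigenvalue `i`, and tests that each pair satisfies `A v = λ v`.

```python
import numpy as np


def check_eigenpairs(a):
    """Return True if every pair from np.linalg.eig satisfies a @ v == value * v.

    Hypothetical helper, not part of the original slides.
    """
    values, vectors = np.linalg.eig(a)
    # Column i of `vectors` is the eigenvector paired with values[i].
    return all(
        np.allclose(a @ vectors[:, i], values[i] * vectors[:, i])
        for i in range(len(values))
    )


# The 2x2 identity matrix from the slides: both eigenvalues are 1.
identity = np.array([[1, 0], [0, 1]])
values, vectors = np.linalg.eig(identity)
print(values)
print(check_eigenpairs(identity))

# An illustrative diagonal matrix (an assumption, not in the talk):
# its eigenvalues are simply the diagonal entries.
diagonal = np.array([[2, 0], [0, 3]])
print(check_eigenpairs(diagonal))
```

For the identity matrix every non-zero vector is an eigenvector, so the check passes whichever basis NumPy returns.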