David Coallier
March 20, 2013

The Artful Business of Data Mining: Computational Statistics with Open Source Tools

This talk covers the concepts of data mining and data analysis using open source tools, mainly Python and R. It also covers some interesting libraries and the tools I have used, and currently use, at Engine Yard.

Transcript

1. The Artful Business of Data Mining: Computational Statistics with Open Source Tools

Wednesday 20 March 13

27. $$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
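A short worked derivation shows why this matrix has the single eigenvalue 1 with multiplicity two. That is the result the NumPy calls on slides 28 and 29 return:

```latex
\det\left(\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \lambda I\right)
= \det\begin{pmatrix} 1-\lambda & 0 \\ 0 & 1-\lambda \end{pmatrix}
= (1-\lambda)^2 = 0
\quad\Longrightarrow\quad \lambda_1 = \lambda_2 = 1
```

Because $Iv = v$ for every vector $v$, any basis can serve as the eigenvectors. The standard basis $e_1 = (1, 0)^T$, $e_2 = (0, 1)^T$ is the natural choice.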

28.

    import numpy as np
    x = np.array([[1, 0], [0, 1]])
    val, vec = np.linalg.eig(x)  # eig returns (eigenvalues, eigenvectors)
    np.linalg.eigvals(x)         # eigenvalues only
29.

    >>> np.linalg.eig(x)
    (array([1., 1.]), array([[1., 0.],
           [0., 1.]]))
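Here is a minimal sketch for checking the decomposition numerically. It assumes only NumPy. Each column of the eigenvector matrix should satisfy $x v = \lambda v$ for its matching eigenvalue.

```python
import numpy as np

# Same 2x2 identity matrix as on slide 28
x = np.array([[1, 0], [0, 1]])

# eig returns the eigenvalues first, then the eigenvectors as matrix columns
val, vec = np.linalg.eig(x)

# Check x @ v == lambda * v for every eigenpair (column i pairs with val[i])
for i in range(len(val)):
    v = vec[:, i]
    assert np.allclose(x @ v, val[i] * v)

print(val)
```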

34.

    from sklearn import tree
    X = [[0, 0], [1, 1]]
    Y = [0, 1]
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, Y)
    clf.predict([[2., 2.]])  # array([1])
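The following sketch takes the same toy data one step further. It assumes a scikit-learn version that provides `sklearn.tree.export_text` (0.21 or later). `predict_proba` returns the per-class probabilities for the new point, and `export_text` prints the split the tree learned. The feature names `f0` and `f1` are hypothetical labels chosen for this example.

```python
from sklearn import tree

# Same toy training set as on slide 34: two points, two classes
X = [[0, 0], [1, 1]]
Y = [0, 1]

# random_state fixes which of the two equally good features the tree splits on
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# Predicted class and per-class probabilities for an unseen point
pred = clf.predict([[2., 2.]])
proba = clf.predict_proba([[2., 2.]])

# Text dump of the learned decision rule (feature names are illustrative)
rules = tree.export_text(clf, feature_names=["f0", "f1"])

print(pred, proba)
print(rules)
```

With only two training points, the tree needs a single split halfway between them. Any point beyond that threshold, such as `[2., 2.]`, lands in the class-1 leaf.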

41.

    import pandas as pd
    x = pd.DataFrame([{"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}])
    print(x[x['age'] > 20].count())  # number of rows with age > 20
    print(x[x['age'] > 20].mean())   # mean age of those rows
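Here is an equivalent selection as a sketch. Combining a boolean mask with `.loc` selects only the `age` column. As a result, `count()` and `mean()` return plain scalars rather than one value per column. This version also imports pandas under its usual `pd` alias instead of using `from pandas import *`.

```python
import pandas as pd

x = pd.DataFrame([{"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}])

# Boolean mask: True where age is strictly greater than 20
mask = x["age"] > 20

# .loc with a mask and a column label returns a Series of the matching ages
older = x.loc[mask, "age"]

n_older = int(older.count())  # rows with age 26 and 21
mean_older = older.mean()     # (26 + 21) / 2

print(n_older, mean_older)
```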

45. yy/mm/dd, mm/dd/yy, YYYY-mm-dd HH:MM:ss TZ, yy-mm-dd, 1363784094.513425 (Unix timestamp), yy/mm, different timezones
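One way to tame a mix of formats like this is to normalize every value to a timezone-aware UTC datetime at ingestion. The following is a minimal sketch that uses only the Python standard library. The source names, the `SOURCE_FORMATS` mapping and the `to_utc` helper are hypothetical. The sketch also assumes that the `TZ` field arrives as a numeric offset such as `-0400` (strptime's `%z`) and that values without a zone are already in UTC. Note that `yy/mm/dd` and `mm/dd/yy` cannot be told apart from the string alone, so the format has to come from knowing where the data came from.

```python
from datetime import datetime, timezone

# Hypothetical mapping from data source to the date format it is known to use
SOURCE_FORMATS = {
    "billing": "%y/%m/%d",              # yy/mm/dd
    "support": "%m/%d/%y",              # mm/dd/yy
    "logs": "%Y-%m-%d %H:%M:%S %z",     # YYYY-mm-dd HH:MM:ss with numeric offset
    "signup": "%y-%m-%d",               # yy-mm-dd
    "monthly": "%y/%m",                 # yy/mm (day defaults to 1)
}


def to_utc(value, source):
    """Parse a raw date value from a known source into an aware UTC datetime."""
    if source == "events":
        # Hypothetical source that sends Unix timestamps (seconds, may be fractional)
        return datetime.fromtimestamp(float(value), tz=timezone.utc)
    parsed = datetime.strptime(value, SOURCE_FORMATS[source])
    if parsed.tzinfo is None:
        # Assumption: values without a zone are already in UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Convert offset-aware values (e.g. -0400) to UTC
    return parsed.astimezone(timezone.utc)


print(to_utc("13/03/20", "billing"))  # 2013-03-20 00:00:00+00:00
print(to_utc("03/20/13", "support"))
print(to_utc("2013-03-20 08:54:54 -0400", "logs"))
print(to_utc("1363784094.513425", "events"))
```

The format is keyed on the source because trying several formats in turn until one parses would silently misread values like `03/05/13`. That string is valid as both yy/mm/dd and mm/dd/yy.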
