David Coallier
March 20, 2013
The Artful Business of Data Mining: Computational Statistics with Open Source Tools

This talk covers the concepts of data mining and data analysis using open source tools, mainly Python and R, along with some interesting libraries and the tools I have used and currently use at Engine Yard.

1. The Artful Business of Data Mining: Computational Statistics with Open Source Tools

27. $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

28. import numpy as np
    x = np.array([[1, 0],
                  [0, 1]])
    # np.linalg.eig returns (eigenvalues, eigenvectors), in that order
    val, vec = np.linalg.eig(x)
    # eigenvalues only
    np.linalg.eigvals(x)
29. >>> np.linalg.eig(x)
    (array([ 1.,  1.]),
     array([[ 1.,  0.],
            [ 0.,  1.]]))

Wednesday 20 March 13
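The identity matrix makes the return order of `np.linalg.eig` hard to see, since every eigenvalue is 1. As a minimal sketch, the same call on a different matrix shows which result is which. The matrix `a` and the variable names are hypothetical and not from the slides.

```python
import numpy as np

# Hypothetical example matrix (not from the slides); symmetric, so its
# eigenvalues are real.
a = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues first, then a matrix whose COLUMNS are the
# corresponding eigenvectors.
values, vectors = np.linalg.eig(a)

# Check the defining relation A v = lambda v for each (value, column) pair.
for i in range(len(values)):
    v = vectors[:, i]
    assert np.allclose(a @ v, values[i] * v)

# The order of the eigenvalues is not guaranteed, so sort before comparing.
sorted_values = np.sort(values)
```

Unpacking in the order `values, vectors` matters: swapping the two names still runs, but every later use of them is silently wrong.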

34. from sklearn import tree
    X = [[0, 0], [1, 1]]
    Y = [0, 1]
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, Y)
    clf.predict([[2., 2.]])
    >>> array([1])

41. import pandas as pd
    x = pd.DataFrame([
        {"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}
    ])
    # Keep only the rows where age > 20 (26 and 21)
    print(x[x['age'] > 20].count())  # age    2
    print(x[x['age'] > 20].mean())   # age    23.5

45. yy/mm/dd, mm/dd/yy, YYYY-mm-dd HH:MM:ss TZ, yy-mm-dd, 1363784094.513425, yy/mm, different timezone
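One way to deal with a mix of date formats like this is to normalize everything to UTC before any analysis. The following is a rough sketch using only the standard library, not the approach from the talk. The helper `to_utc`, the `FORMATS` list and its ordering are hypothetical, and it assumes that `TZ` is a numeric offset such as `+0100` and that values without a timezone are already in UTC.

```python
from datetime import datetime, timezone

# Candidate formats, tried in order. The order is an assumption: a value
# such as "03/04/13" is ambiguous and will match the first format that fits.
FORMATS = [
    "%Y-%m-%d %H:%M:%S %z",  # YYYY-mm-dd HH:MM:ss TZ (numeric offset assumed)
    "%y/%m/%d",              # yy/mm/dd
    "%m/%d/%y",              # mm/dd/yy
    "%y-%m-%d",              # yy-mm-dd
    "%y/%m",                 # yy/mm (day defaults to 1)
]


def to_utc(raw):
    """Return a timezone-aware UTC datetime for raw, or None if unparseable."""
    text = raw.strip()

    # Unix timestamps such as 1363784094.513425 parse as plain floats.
    try:
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        pass

    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: values without a timezone are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)

    return None
```

For example, `to_utc("2013-03-20 13:54:54 +0100")` and `to_utc("1363784094.513425")` land on the same second in UTC. The sketch cannot resolve genuinely ambiguous values; for those, the format has to be known per data source.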


