
pandas + scikit-learn = pandas_ml @ PyData.Tokyo Meetup #6 (Lightning Talk)

Sinhrks
October 23, 2015


Transcript

  1. pandas + scikit-learn = pandas_ml

  2. Introduction • Work: data analysis • OSS activity: PyData Development Team
     (pandas committer), Blaze Development Team (Dask committer) • GitHub:
     https://github.com/sinhrks
  3. Machine learning in Python • pandas: preprocessing • Scikit-learn: machine learning

  4. I tried it, but isn't it a bit tedious? (image credit: Creative Commons CC, from Pixabay)

  5. Which parts are tedious?

     import numpy as np
     import pandas as pd
     from sklearn import datasets

     digits = datasets.load_digits()   # load the digits data
     df = pd.DataFrame(digits.data)    # only the explanatory variables go into the DataFrame
     df
  6. Which parts are tedious?

     import sklearn.cross_validation as crv
     train_df, test_df, train_l, test_l = crv.train_test_split(df, digits.target)
     train_df

     import sklearn.preprocessing as pp
     pp.normalize(train_df)
     array([[ 0.        ,  0.        ,  0.11183193, ...,  0.04792797,  0.        ,  0.        ],
            ...,
            [ 0.        ,  0.        ,  0.13155475, ...,  0.        ,  0.        ,  0.        ]])

     Inconvenient: the return value is an ndarray, which makes it hard to keep processing with pandas.
     Inconvenient: each submodule has to be imported separately.
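The boilerplate pandas_ml removes can be seen by restoring the labels by hand: the ndarray returned by `normalize` has to be wrapped back into a DataFrame yourself. A minimal sketch (using `sklearn.model_selection`, which replaced the `sklearn.cross_validation` module shown on the slide in later scikit-learn releases):

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split  # modern home of train_test_split
from sklearn.preprocessing import normalize

digits = datasets.load_digits()
df = pd.DataFrame(digits.data)
train_df, test_df, train_l, test_l = train_test_split(df, digits.target, random_state=0)

# normalize() returns a plain ndarray, dropping the index and column labels...
arr = normalize(train_df)

# ...so the DataFrame has to be rebuilt by hand to keep working in pandas
normalized = pd.DataFrame(arr, index=train_df.index, columns=train_df.columns)
```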
  7. Which parts are tedious?

     import sklearn.svm as svm
     svc = svm.SVC(kernel='linear', C=1.0)
     svc.fit(train_df, train_l)
     SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
         gamma=0.0, kernel='linear', max_iter=-1, probability=False,
         random_state=None, shrinking=True, tol=0.001, verbose=False)

     predicted = svc.predict(test_df)
     predicted
     array([1, 5, 0, ..., 7, 8, 3])

     import sklearn.metrics as metrics
     metrics.confusion_matrix(test_l, predicted)
     array([[53,  0,  0, ...,  0,  0,  0],
            [ 0, 42,  0, ...,  0,  0,  0],
            [ 0,  0, 41, ...,  0,  0,  0],
            ...,
            [ 0,  0,  0, ..., 47,  0,  1],
            [ 0,  0,  0, ...,  0, 36,  0],
            [ 0,  0,  0, ...,  0,  1, 45]])

     Inconvenient: the same explanatory/target-variable pairing has to be spelled out again and again.
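Put together, the plain pandas + scikit-learn workflow the slides call tedious looks like this end to end (a sketch with modern imports, since `sklearn.cross_validation` was later renamed `sklearn.model_selection`; note how features and target must be threaded through every call by hand):

```python
import pandas as pd
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
df = pd.DataFrame(digits.data)

# features and target passed together, and split in parallel
train_df, test_df, train_l, test_l = train_test_split(
    df, digits.target, random_state=0)

svc = svm.SVC(kernel='linear', C=1.0)
svc.fit(train_df, train_l)            # (features, target) spelled out again
predicted = svc.predict(test_df)

cm = metrics.confusion_matrix(test_l, predicted)  # (target, prediction) again
```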
  8. pandas_ml

  9. pandas_ml • ModelFrame: inherits from pandas.DataFrame • keeps the
     explanatory/target-variable column information as metadata • adds methods
     that tie in with Scikit-learn
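The core idea of the bullet points above, a DataFrame that remembers which column is the target and bundles estimator calls, can be sketched with pandas' own accessor-registration API. This is a hypothetical illustration of the pattern, not pandas_ml's actual implementation; the `ml` accessor name and the `.target` column convention are made up here:

```python
import pandas as pd
from sklearn import datasets
from sklearn.svm import SVC

# Hypothetical sketch: a registered accessor that treats the '.target'
# column as the target variable, similar in spirit to ModelFrame.
@pd.api.extensions.register_dataframe_accessor("ml")
class MLAccessor:
    TARGET = ".target"

    def __init__(self, df):
        self._df = df

    @property
    def data(self):            # explanatory variables only
        return self._df.drop(columns=[self.TARGET])

    @property
    def target(self):          # target variable
        return self._df[self.TARGET]

    def fit(self, estimator):  # no need to spell out (features, target)
        return estimator.fit(self.data, self.target)

digits = datasets.load_digits()
df = pd.DataFrame(digits.data)
df[".target"] = digits.target          # target stored alongside the features

svc = df.ml.fit(SVC(kernel='linear', C=1.0))
```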
  10. pandas_ml

      import pandas_ml as pdml
      df = pdml.ModelFrame(digits)   # build a ModelFrame from the digits data
      df                             # the ModelFrame contains the target variable as a column
  11. pandas_ml

      train_df, test_df = df.crv.train_test_split()
      train_df
      train_df.preprocessing.normalize()

      Improved: the return value is a ModelFrame, and preprocessing methods are
      applied only to the explanatory-variable part.
      Improved: functions are called through accessor properties, so no imports are needed.
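What "apply only to the explanatory-variable part" means can be reproduced by hand in plain pandas + scikit-learn: transform the feature columns, rebuild the frame, and pass the target column through untouched. A sketch (the `"target"` column name here is illustrative, not pandas_ml's):

```python
import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import normalize

digits = datasets.load_digits()
df = pd.DataFrame(digits.data)
df["target"] = digits.target           # target stored alongside the features

features = df.drop(columns=["target"])

# normalize only the explanatory variables, then rebuild the labeled frame
out = pd.DataFrame(normalize(features), index=df.index, columns=features.columns)
out["target"] = df["target"]           # target passes through unchanged
```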
  12. pandas_ml

      svc = train_df.svm.SVC(kernel='linear', C=1.0)
      train_df.fit(svc)
      SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
          gamma=0.0, kernel='linear', max_iter=-1, probability=False,
          random_state=None, shrinking=True, tol=0.001, verbose=False)

      test_df.predict(svc)
      test_df.metrics.confusion_matrix()

      Improved: specifying the explanatory/target variables is omitted.
  13. Comparing the code

      pandas + scikit-learn:

      df = pd.DataFrame(digits.data)
      import sklearn.cross_validation as crv
      train_df, test_df, train_l, test_l = crv.train_test_split(df, digits.target)
      import sklearn.svm as svm
      svc = svm.SVC(kernel='linear', C=1.0)
      svc.fit(train_df, train_l)
      predicted = svc.predict(test_df)
      import sklearn.metrics as metrics
      metrics.confusion_matrix(test_l, predicted)

      pandas_ml:

      import pandas_ml as pdml
      df = pdml.ModelFrame(digits)
      train_df, test_df = df.crv.train_test_split()
      svc = train_df.svm.SVC(kernel='linear', C=1.0)
      train_df.fit(svc)
      test_df.predict(svc)
      test_df.metrics.confusion_matrix()
  14. So, what about XGBoost? (image credit: Creative Commons CC, from Pixabay)

  15. XGBoost

      xgc = train_df.xgboost.XGBClassifier()
      train_df.fit(xgc)
      XGBClassifier(base_score=0.5, colsample_bytree=1, gamma=0, learning_rate=0.1,
          max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
          n_estimators=100, nthread=-1, objective='multi:softprob', seed=0,
          silent=True, subsample=1)
      test_df.predict(xgc)

      Because it follows Scikit-learn's API, things like GridSearch work too.
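The point about GridSearch is that any estimator following the scikit-learn fit/predict API, XGBClassifier included, plugs straight into generic tools like `GridSearchCV`. Since xgboost may not be installed everywhere, here is the same pattern sketched with a built-in scikit-learn estimator (swap `xgboost.XGBClassifier()` in for `SVC` to get the slide's setup):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# GridSearchCV only assumes the estimator API, so xgboost.XGBClassifier
# (or any other compliant estimator) would work here unchanged.
grid = GridSearchCV(SVC(kernel='linear'), {'C': [0.1, 1.0]}, cv=3)
grid.fit(X_train, y_train)
score = grid.score(X_test, y_test)
```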
  16. XGBoost • I have been contributing to it a bit lately

  17. pandas + scikit-learn + xgboost = pandas_ml