Upgrade to Pro — share decks privately, control downloads, hide ads and more …

pandas + scikit-learn = pandas_ml @ PyData.Toky...

Sinhrks
October 23, 2015

pandas + scikit-learn = pandas_ml @ PyData.Tokyo Meetup #6 (Lightning Talk)

Sinhrks

October 23, 2015
Tweet

More Decks by Sinhrks

Other Decks in Programming

Transcript

  1. Introduction • ۀ຿: σʔλ෼ੳ • OSS׆ಈ: • PyData Development Team

    (pandasίϛολ) • Blaze Development Team (Daskίϛολ) • GitHub: https://github.com/sinhrks
  2. Ͳ͕͜໘౗͔ import numpy as np import pandas as pd from

    sklearn import datasets digits = datasets.load_digits() df = pd.DataFrame(digits.data) df EJHJUTσʔλΛϩʔυ આ໌ม਺ͷΈΛ%BUB'SBNFʹ
  3. import sklearn.cross_validation as crv train_df, test_df, train_l, test_l = crv.train_test_split(df,

    digits.target) train_df Ͳ͕͜໘౗͔ import sklearn.preprocessing as pp pp.normalize(train_df) array([[ 0. , 0. , 0.11183193, ..., 0.04792797, 0. , 0. ], ..., [ 0. , 0. , 0.13155475, ..., 0. , 0. , 0. ]]) ෆศͳϙΠϯτ ฦΓ஋͸OEBSSBZ QBOEBTͰॲཧ͕ଓ͚ʹ͍͘ ෆศͳϙΠϯτ ֤αϒϞδϡʔϧ͸ͦΕͧΕ JNQPSU͕ඞཁ
  4. Ͳ͕͜໘౗͔ import sklearn.svm as sum svc = svm.SVC(kernel='linear', C=1.0) svc.fit(train_df,

    train_l) SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0, kernel='linear', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False) predicted = svc.predict(test_df) predicted array([1, 5, 0, ..., 7, 8, 3]) import sklearn.metrics as metrics metrics.confusion_matrix(test_l, predicted) array([[53, 0, 0, ..., 0, 0, 0], [ 0, 42, 0, ..., 0, 0, 0], [ 0, 0, 41, ..., 0, 0, 0], ..., [ 0, 0, 0, ..., 47, 0, 1], [ 0, 0, 0, ..., 0, 36, 0], [ 0, 0, 0, ..., 0, 1, 45]]) ෆศͳϙΠϯτ આ໌໨తม਺ͰࣅͨΑ͏ͳ ࢦఆΛ܁Γฦ͢ඞཁ͕
  5. pandas_ml import pandas_ml as pdml df = pdml.ModelFrame(digits) df EJHJUTσʔλ͔Β

    .PEFM'SBNF࡞੒ .PEFM'SBNF͸ ໨తม਺ΛΧϥϜͱؚͯ͠Ή
  6. pandas_ml svc = train_df.svm.SVC(kernel=‘linear’, C=1.0) train_df.fit(svc) SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,

    degree=3, gamma=0.0, kernel='linear', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False) test_df.predict(svc) test_df.metrics.confusion_matrix() վળઆ໌໨తม਺ͷࢦఆ͸লུ
  7. هड़ͷൺֱ df = pd.DataFrame(digits.data) import sklearn.cross_validation as crv train_df, test_df,

    train_l, test_l = crv.train_test_split(df, digits.target) import sklearn.svm as sum svc = svm.SVC(kernel='linear', C=1.0) svc.fit(train_df, train_l) predicted = svc.predict(test_df) import sklearn.metrics as metrics metrics.confusion_matrix(test_l, predicted) import pandas_ml as pdml df = pdml.ModelFrame(digits) train_df, test_df = df.crv.train_test_split() svc = train_df.svm.SVC(kernel=‘linear’, C=1.0) train_df.fit(svc) test_df.predict(svc) test_df.metrics.confusion_matrix() QBOEBT TDJLJUMFBSO QBOEBT@NM
  8. XGBoost xgc = train_df.xgboost.XGBClassifier() train_df.fit(xgc) XGBClassifier(base_score=0.5, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,

    max_depth=3, min_child_weight=1, missing=None, n_estimators=100, nthread=-1, objective='multi:softprob', seed=0, silent=True, subsample=1) test_df.predict(xgc) 4DJLJUMFBSOͷ"1*Λ ར༻ͯ͠(SJE4FBSDI ͱ͔Ͱ͖Δ