
pandas + scikit-learn = pandas_ml @ PyData.Tokyo Meetup #6 (Lightning Talk)

Sinhrks
October 23, 2015

Transcript

  1. pandas + scikit-learn

    = pandas_ml


  2. Introduction
    • Work: data analysis

    • OSS activity:

    • PyData Development Team (pandas committer)

    • Blaze Development Team (Dask committer)

    • GitHub: https://github.com/sinhrks


  3. Machine learning in Python
    • pandas: preprocessing

    • scikit-learn: machine learning


  4. I tried it, but... isn't it a bit tedious?
    (Image: Creative Commons CC, from Pixabay)


  5. Where it gets tedious
    import numpy as np
    import pandas as pd
    from sklearn import datasets
    digits = datasets.load_digits()
    df = pd.DataFrame(digits.data)
    df
    Load the digits data
    Put only the explanatory variables into a DataFrame
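The loading step on this slide can be sketched as a self-contained snippet; the shapes shown are those of scikit-learn's bundled digits dataset:

```python
import pandas as pd
from sklearn import datasets

# Load the digits dataset: 1797 8x8 grayscale images, flattened to 64 features
digits = datasets.load_digits()

# Put only the explanatory variables into a DataFrame;
# the target labels stay behind in a separate ndarray
df = pd.DataFrame(digits.data)
print(df.shape)             # (1797, 64)
print(digits.target.shape)  # (1797,)
```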


  6. import sklearn.cross_validation as crv
    train_df, test_df, train_l, test_l = crv.train_test_split(df, digits.target)
    train_df
    Where it gets tedious
    import sklearn.preprocessing as pp
    pp.normalize(train_df)
    array([[ 0. , 0. , 0.11183193, ..., 0.04792797,
    0. , 0. ],
    ...,
    [ 0. , 0. , 0.13155475, ..., 0. ,
    0. , 0. ]])
    Inconvenient: the return value is an ndarray,
    so it is hard to keep processing in pandas
    Inconvenient: each submodule needs
    a separate import
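To keep working in pandas after `normalize`, you have to rebuild the DataFrame by hand, which is exactly the boilerplate this slide is complaining about. A minimal sketch (using `sklearn.model_selection`, the modern home of `train_test_split`, in place of the `sklearn.cross_validation` module the 2015 talk used):

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
import sklearn.preprocessing as pp

digits = datasets.load_digits()
df = pd.DataFrame(digits.data)
train_df, test_df, train_l, test_l = train_test_split(df, digits.target,
                                                      random_state=0)

# pp.normalize returns a bare ndarray: index and column labels are lost...
normalized = pp.normalize(train_df)
print(type(normalized))  # <class 'numpy.ndarray'>

# ...so continuing in pandas means re-wrapping the result yourself
normalized_df = pd.DataFrame(normalized,
                             index=train_df.index,
                             columns=train_df.columns)
```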


  7. Where it gets tedious
    import sklearn.svm as svm
    svc = svm.SVC(kernel='linear', C=1.0)
    svc.fit(train_df, train_l)
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
    kernel='linear', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
    predicted = svc.predict(test_df)
    predicted
    array([1, 5, 0, ..., 7, 8, 3])
    import sklearn.metrics as metrics
    metrics.confusion_matrix(test_l, predicted)
    array([[53, 0, 0, ..., 0, 0, 0],
    [ 0, 42, 0, ..., 0, 0, 0],
    [ 0, 0, 41, ..., 0, 0, 0],
    ...,
    [ 0, 0, 0, ..., 47, 0, 1],
    [ 0, 0, 0, ..., 0, 36, 0],
    [ 0, 0, 0, ..., 0, 1, 45]])
    Inconvenient: you have to repeat similar
    specifications of the explanatory / target variables
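Put together, the plain scikit-learn workflow from slides 5-7 looks like this (with the `svm` import typo fixed, and `sklearn.model_selection` substituted for the old `sklearn.cross_validation` module):

```python
import pandas as pd
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
df = pd.DataFrame(digits.data)
train_df, test_df, train_l, test_l = train_test_split(df, digits.target,
                                                      random_state=0)

svc = svm.SVC(kernel='linear', C=1.0)
svc.fit(train_df, train_l)
predicted = svc.predict(test_df)

# The labels and predictions must be passed explicitly at every step
cm = metrics.confusion_matrix(test_l, predicted)
print(metrics.accuracy_score(test_l, predicted))  # typically > 0.95 on digits
```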


  8. pandas_ml


  9. pandas_ml
    • ModelFrame: inherits pandas.DataFrame

    • Holds column information for the explanatory /
    target variables as metadata

    • Adds methods that integrate with scikit-learn


  10. pandas_ml
    import pandas_ml as pdml
    df = pdml.ModelFrame(digits)
    df
    Create a ModelFrame
    from the digits data
    A ModelFrame includes the
    target variable as a column


  11. pandas_ml
    train_df, test_df = df.crv.train_test_split()
    train_df
    train_df.preprocessing.normalize()
    Improvement: the return value is a ModelFrame
    Preprocessing methods are applied
    only to the explanatory variables
    Improvement: functions are called via
    accessor properties; no imports needed


  12. pandas_ml
    svc = train_df.svm.SVC(kernel='linear', C=1.0)
    train_df.fit(svc)
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
    kernel='linear', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
    test_df.predict(svc)
    test_df.metrics.confusion_matrix()
    Improvement: specifying the explanatory / target variables is no longer needed


  13. Code comparison
    # pandas + scikit-learn
    df = pd.DataFrame(digits.data)
    import sklearn.cross_validation as crv
    train_df, test_df, train_l, test_l = crv.train_test_split(df, digits.target)
    import sklearn.svm as svm
    svc = svm.SVC(kernel='linear', C=1.0)
    svc.fit(train_df, train_l)
    predicted = svc.predict(test_df)
    import sklearn.metrics as metrics
    metrics.confusion_matrix(test_l, predicted)
    # pandas_ml
    import pandas_ml as pdml
    df = pdml.ModelFrame(digits)
    train_df, test_df = df.crv.train_test_split()
    svc = train_df.svm.SVC(kernel='linear', C=1.0)
    train_df.fit(svc)
    test_df.predict(svc)
    test_df.metrics.confusion_matrix()


  14. So, what about XGBoost?
    (Image: Creative Commons CC, from Pixabay)


  15. XGBoost
    xgc = train_df.xgboost.XGBClassifier()
    train_df.fit(xgc)
    XGBClassifier(base_score=0.5, colsample_bytree=1, gamma=0, learning_rate=0.1,
    max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
    n_estimators=100, nthread=-1, objective='multi:softprob', seed=0,
    silent=True, subsample=1)
    test_df.predict(xgc)
    Because it follows the scikit-learn API,
    things like GridSearch
    work out of the box
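Since `XGBClassifier` follows the scikit-learn estimator API, it plugs straight into utilities like `GridSearchCV`. A minimal sketch, using scikit-learn's own `SVC` in place of `XGBClassifier` so it runs without xgboost installed; swap in `xgboost.XGBClassifier()` for the real thing:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()

# Any estimator following the fit/predict convention works here,
# including xgboost.XGBClassifier
grid = GridSearchCV(svm.SVC(kernel='linear'),
                    param_grid={'C': [0.1, 1.0]},
                    cv=3)
grid.fit(digits.data, digits.target)
print(grid.best_params_, grid.best_score_)
```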


  16. XGBoost
    • I've been contributing a bit to it recently


  17. pandas + scikit-learn + xgboost

    = pandas_ml
