
Machine Learning with Python


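These are the Day 3 materials for the Python course taught at G's Academy Tokyo. Day 3 covers machine learning with Python and includes an exercise in which you build an app that recognizes handwritten characters using the SVM algorithm.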
Presented by http://www.yoheim.net

Yohei Munesada

February 22, 2017

Transcript

  1.  Course catalog (Basic to Advanced):

    - What is Python?
    - Python basics
    - Modules and packages
    - Web scraping
    - Web server
    - Python and machine learning
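  2.  Machine learning is ...?

    Recognizing patterns in large amounts of data: What conditions make tomorrow sunny? What patterns get someone judged as good-looking? What conditions make Disneyland crowded tomorrow?

    Using those patterns to predict unknown data: Could that movement be a suspicious person? How many customers will visit the shop this weekend? How much electricity will be used next month?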
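  3.  Regression and classification

    Machine learning models are broadly divided into regression and classification.

    Regression: a model that predicts a numeric value from the input data. Examples: predicting a user's purchase amount, predicting a user's smartphone usage time.

    Classification: a model that predicts a category from the input data; also called a classifier. Examples: whether a user will make a purchase, whether an image contains a cat, which value a handwritten digit represents.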
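As a minimal sketch of the distinction above, the following uses only the standard library and invented toy data. The helper names `fit_line` and `nearest_label` are hypothetical illustrations, not part of the lecture: a regression model returns a number, while a classifier returns a label.

```python
# Toy data (invented for illustration): hours of app usage vs. purchase amount.
hours = [1.0, 2.0, 3.0, 4.0]
amount = [10.0, 20.0, 30.0, 40.0]


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - a * mean_x
    return a, b


def nearest_label(x, xs, labels):
    """1-nearest-neighbour classifier: return the label of the closest sample."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return labels[closest]


# Regression: the prediction is a numeric value.
a, b = fit_line(hours, amount)
predicted_amount = a * 5.0 + b

# Classification: the prediction is one of a fixed set of labels.
bought = ["no", "no", "yes", "yes"]
predicted_label = nearest_label(3.6, hours, bought)

print(predicted_amount, predicted_label)
```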
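  4.  Supervised and unsupervised

    Machine learning models are broadly divided into supervised and unsupervised.

    Supervised: learns from data provided in advance (training data) and makes predictions based on what it learned. Examples: linear regression, logistic regression, SVM, neural networks, decision trees, etc.

    Unsupervised: with no data prepared in advance, derives some underlying structure from the unknown data it is given. Examples: clustering, principal component analysis, etc.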
  5.  4 types of machine learning

                        Supervised   Unsupervised
    Regression              A             B
    Classification          C             D

    There are other approaches as well, such as reinforcement learning and genetic algorithms.
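  6.  NumPy: handles matrices and vectors, with processing at C-language speed.

        import numpy as np

        X = np.array([
            [5,0,1],
            [1,6,0],
            [0,0,1]
        ])
        X.shape                  # dimensions of X
        X.T                      # transpose of X
        E = np.eye(X.shape[0])   # identity matrix with as many rows as X
        X.dot(E)                 # matrix product of X and E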
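A short sketch of what each expression on the NumPy slide evaluates to, assuming NumPy is installed. The variable names `shape`, `transposed` and `product` are added here for illustration only.

```python
import numpy as np

X = np.array([
    [5, 0, 1],
    [1, 6, 0],
    [0, 0, 1],
])

shape = X.shape            # (rows, columns) of the matrix
transposed = X.T           # rows and columns swapped
E = np.eye(X.shape[0])     # 3x3 identity matrix (float dtype)
product = X.dot(E)         # multiplying by the identity leaves the values unchanged

print(shape)
print(transposed)
print(np.array_equal(product, X))
```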
  7.  Matplotlib: a library for drawing 2D plots.

        import numpy as np
        import matplotlib.pyplot as plt

        # Plot sin(x) at 200 evenly spaced points between -pi and pi.
        x = np.linspace(-np.pi, np.pi, 200)
        plt.plot(x, np.sin(x))
        plt.show()
  8.  Re: 4 types of machine learning (the same grid of Supervised / Unsupervised against Regression / Classification, with cells A to D)
  9.  Re: 4 types of machine learning (the same grid of Supervised / Unsupervised against Regression / Classification, with cells A to D)
  10.  Support Vector Machine: a supervised algorithm for solving classification problems. [Figure: scatter plot of two classes of points]
  11.  Support Vector Machine: a supervised algorithm for solving classification problems. [Figure: the same scatter plot]
  12.  Support Vector Machine: a supervised algorithm for solving classification problems. [Figure: the same scatter plot]
  13.  Support Vector Machine: a supervised algorithm for solving classification problems. [Figure: the same scatter plot, captioned "Maximizing the margin"]
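To make "maximizing the margin" concrete, here is a minimal standard-library sketch with invented 2D points. It compares two hand-picked separating lines `w·x + b = 0` by their margin, that is, the distance from the line to the closest point; an SVM looks for the separating line whose margin is largest. The function name `margin` and both candidate lines are hypothetical, and no optimization is performed here.

```python
import math

# Invented toy points: (x1, x2, label) with labels +1 and -1.
points = [
    (1.0, 1.0, -1), (2.0, 1.0, -1), (1.0, 2.0, -1),
    (4.0, 4.0, +1), (5.0, 4.0, +1), (4.0, 5.0, +1),
]


def margin(w, b, samples):
    """Smallest signed distance from the line w.x + b = 0 to any sample.

    A positive result means every sample lies on its correct side.
    """
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x1 + w[1] * x2 + b) / norm for x1, x2, y in samples)


# Two candidate lines that both separate the classes.
margin_diagonal = margin((1.0, 1.0), -5.5, points)   # x1 + x2 = 5.5
margin_vertical = margin((1.0, 0.0), -3.0, points)   # x1 = 3

print(margin_diagonal, margin_vertical)
```

Both lines classify the toy points correctly, but the diagonal one keeps a wider gap to the nearest points, so it is the one a margin-maximizing method would prefer.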
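  14.  The features of MNIST

    What are the features of an MNIST image? Each pixel is a feature (28 × 28 = 784 per image). [Figure: grid of pixel values for one digit image]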
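A minimal sketch of treating pixels as features, using a tiny invented 3 × 3 grayscale image instead of a real 28 × 28 MNIST image. Flattening the rows gives one feature per pixel; the variable names are illustrative.

```python
# A tiny invented grayscale image (values 0-255), 3 rows x 3 columns.
image = [
    [0, 128, 255],
    [64, 0, 32],
    [255, 255, 0],
]

# Flatten row by row: each pixel becomes one entry of the feature vector.
features = [pixel for row in image for pixel in row]

print(len(features))   # rows * columns
print(features)
```

For a 28 × 28 image the same flattening yields 28 * 28 = 784 features.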
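  15.  Binary data with Python

    Binary data is handled with the struct module (standard library).

        import struct

        # Read two big-endian unsigned 32-bit integers from the start of the file.
        with open("file.binary", "rb") as f:
            num1, num2 = struct.unpack(">II", f.read(8))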
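A short standard-library sketch of how the `">II"` format string works: `>` selects big-endian byte order and each `I` is an unsigned 32-bit integer, so the pair occupies 8 bytes. The values 2049 and 60000 are arbitrary examples.

```python
import struct

# Pack two unsigned 32-bit integers in big-endian order.
packed = struct.pack(">II", 2049, 60000)

# The format's size in bytes: 4 bytes per "I".
size = struct.calcsize(">II")

# Unpacking the same 8 bytes recovers the original values.
num1, num2 = struct.unpack(">II", packed)

print(size, packed.hex(), num1, num2)
```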
  16.  Label data

        import struct
        import gzip

        # Read MNIST `label`.
        fpath = "./data_mnist/train-labels-idx1-ubyte.gz"
        with gzip.open(fpath, "rb") as f:
            # Header: magic number and item count, two big-endian uint32 values.
            magic_number, img_count = struct.unpack(">II", f.read(8))
            labels = []
            for i in range(img_count):
                # Each label is a single unsigned byte.
                label = str(struct.unpack("B", f.read(1))[0])
                labels.append(label)

        # Write as csv.
        outpath = './csv/train-labels.csv'
        with open(outpath, "w") as f:
            f.write("\n".join(labels))
  17.  Image data

        import struct
        import gzip

        # Read MNIST `images`.
        fpath = "./data_mnist/train-images-idx3-ubyte.gz"
        with gzip.open(fpath, "rb") as f:
            # Header: magic number, image count, rows, columns (big-endian uint32).
            _, img_count = struct.unpack(">II", f.read(8))
            rows, cols = struct.unpack(">II", f.read(8))
            images = []
            for i in range(img_count):
                # One byte per pixel; iterating over bytes yields ints in Python 3.
                binary = f.read(rows * cols)
                images.append(",".join([str(b) for b in binary]))

        # Write as csv.
        outpath = './csv/train-images.csv'
        with open(outpath, "w") as f:
            f.write("\n".join(images))
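The parsing logic for image files can be tried without downloading MNIST. The sketch below builds a tiny gzip-compressed file in memory with the same header layout the code above reads (magic number, image count, rows, columns, then one byte per pixel) and parses it back. The two 2 × 2 "images" are invented, and 2051 is the magic number used by MNIST image files.

```python
import gzip
import io
import struct

# Build a tiny IDX-style image file in memory: 2 images of 2x2 pixels.
header = struct.pack(">IIII", 2051, 2, 2, 2)
pixels = bytes([0, 255, 128, 64, 10, 20, 30, 40])
compressed = gzip.compress(header + pixels)

# Parse it back the same way as the MNIST image file.
with gzip.open(io.BytesIO(compressed), "rb") as f:
    magic, img_count = struct.unpack(">II", f.read(8))
    rows, cols = struct.unpack(">II", f.read(8))
    images = []
    for i in range(img_count):
        binary = f.read(rows * cols)
        # Iterating over bytes yields integers in Python 3.
        images.append(",".join(str(b) for b in binary))

print(images)
```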
  18.  Output as images

    Some of the rows are written out as images to confirm that the CSV result is correct.

        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")

        for i, image in enumerate(images[:10]):
            with open("./image/%d.pgm" % i, "w") as f:
                # Plain PGM header: format P2, width 28, height 28, max gray value 255.
                s = "P2 28 28 255\n"
                s += " ".join(image.split(","))
                f.write(s)
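For reference, a plain PGM (`P2`) file is text: the magic `P2`, the width and height, the maximum gray value, then the pixel values separated by whitespace. The sketch below builds such a file body for an invented 3 × 2 image and writes one image row per text line, which keeps lines short. The helper name `to_pgm` is hypothetical.

```python
def to_pgm(pixels, width, height, max_value=255):
    """Return the text of a plain (P2) PGM image.

    `pixels` is a flat list of gray values in row-major order.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    lines = ["P2", "%d %d" % (width, height), str(max_value)]
    for r in range(height):
        row = pixels[r * width:(r + 1) * width]
        lines.append(" ".join(str(p) for p in row))
    return "\n".join(lines) + "\n"


# Invented 3x2 image: width 3, height 2.
pgm_text = to_pgm([0, 128, 255, 255, 128, 0], width=3, height=2)
print(pgm_text)
```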
  19.  Use Support Vector Machine

    Now let's perform pattern recognition with an SVM.

        from sklearn import svm

        # Load training data.
        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")[:500]
        with open("./csv/train-labels.csv") as f:
            labels = f.read().split("\n")[:500]

        # Convert data: scale pixel values (0-255) into the range [0, 1).
        images = [[int(i)/256 for i in image.split(",")] for image in images]
        labels = [int(l) for l in labels]

        # Use SVM.
        clf = svm.SVC()
        clf.fit(images, labels)
  20.  Evaluate the model

    Predictions are made on test data using the trained model, and the accuracy is evaluated.

        from sklearn import metrics

        test_images = ...  # loading code omitted (500 items are read this time)
        test_labels = ...  # loading code omitted (500 items are read this time)

        # Predict.
        predict = clf.predict(test_images)

        # Show results.
        ac_score = metrics.accuracy_score(test_labels, predict)
        print("Accuracy:", ac_score)
        cl_report = metrics.classification_report(test_labels, predict)
        print(cl_report)
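To show what the reported numbers mean, here is a standard-library sketch that computes accuracy, plus precision and recall for one class, from invented label lists. `metrics.accuracy_score` reports the same fraction of correct predictions; the helper names below are hypothetical.

```python
# Invented true labels and predictions for six test images.
test_labels = [0, 1, 1, 2, 2, 2]
predict = [0, 1, 2, 2, 2, 1]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for a single class label."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


ac_score = accuracy(test_labels, predict)
precision_2, recall_2 = precision_recall(test_labels, predict, 2)
print("Accuracy:", ac_score)
print("Class 2 precision/recall:", precision_2, recall_2)
```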
  21.  How to improve the model

    Increase the training data from 500 to 5000 items.

        from sklearn import svm

        # Load training data.
        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")[:5000]
        with open("./csv/train-labels.csv") as f:
            labels = f.read().split("\n")[:5000]

        # Convert data: scale pixel values (0-255) into the range [0, 1).
        images = [[int(i)/256 for i in image.split(",")] for image in images]
        labels = [int(l) for l in labels]

        # Use SVM.
        clf = svm.SVC()
        clf.fit(images, labels)
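The effect of training-set size can be tried on scikit-learn's small bundled digits dataset (8 × 8 images), which stands in here for the MNIST CSV files. This is a sketch assuming scikit-learn is installed; the sizes of 100 and 1000 samples are arbitrary choices for illustration, not the lecture's 500 and 5000, and the helper name `train_and_score` is hypothetical.

```python
from sklearn import datasets, svm

# Bundled 8x8 digit images; pixel values range from 0 to 16.
digits = datasets.load_digits()
X = digits.data / 16.0
y = digits.target

# Hold out the last part of the dataset for evaluation.
X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]


def train_and_score(n_samples):
    """Train an SVC on the first n_samples and return test accuracy."""
    clf = svm.SVC()
    clf.fit(X_train[:n_samples], y_train[:n_samples])
    return clf.score(X_test, y_test)


acc_small = train_and_score(100)
acc_large = train_and_score(1000)
print("100 samples:", acc_small)
print("1000 samples:", acc_large)
```

On this dataset the larger training set typically scores higher, though the exact numbers depend on the scikit-learn version.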
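  22.  Save the training result

    The training result is saved so that it can later be used from a web server.

        # Save the training result.
        # Older scikit-learn releases exposed joblib as
        # `from sklearn.externals import joblib`; that path was removed in
        # scikit-learn 0.23, so the standalone package is imported here.
        import joblib
        joblib.dump(clf, "./result/svm.pkl")

    Output file: result/svm.pkl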
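`joblib.dump` and `joblib.load` follow the familiar dump/load persistence pattern. As a dependency-free sketch of that pattern, the example below uses the standard library's `pickle` on a stand-in dictionary of learned parameters. The dictionary contents and the file name `model.pkl` are invented; a real `SVC` object would be saved with joblib as on the slide.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: any picklable Python object works.
model = {"classes": [0, 1, 2], "weights": [0.5, -1.25, 3.0]}

with tempfile.TemporaryDirectory() as tmpdir:
    pklfile = os.path.join(tmpdir, "model.pkl")

    # Save the object to disk.
    with open(pklfile, "wb") as f:
        pickle.dump(model, f)

    # Load it back, as a separate program would do later.
    with open(pklfile, "rb") as f:
        restored = pickle.load(f)

print(restored == model)
```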
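  23.  Use the model

    The training result saved earlier is used from a separate program.

        from os import path
        # Older scikit-learn releases used `from sklearn.externals import joblib`
        # (removed in scikit-learn 0.23).
        import joblib

        # Load the training result.
        pklfile = path.join("result", "svm.pkl")
        clf = joblib.load(pklfile)

        # Predict. `data` is the feature list for one image; the slide does not
        # show how it is built.
        predict = clf.predict([data])
        number = str(predict.tolist()[0])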
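The slide does not show how `data` is built. One plausible sketch, assuming the web server receives the 784 pixel values as a comma-separated string of integers from 0 to 255: parse the string and scale it the same way as the training data (dividing by 256), so the classifier sees inputs in the range it was trained on. The function name `to_features` and the input format are assumptions.

```python
def to_features(csv_pixels, expected_length=784):
    """Convert "0,255,..." into a list of floats scaled like the training data."""
    values = [int(v) for v in csv_pixels.split(",")]
    if len(values) != expected_length:
        raise ValueError(
            "expected %d pixels, got %d" % (expected_length, len(values))
        )
    # Same scaling as in training: int(i) / 256.
    return [v / 256 for v in values]


# Invented request payload: an all-black image with one gray pixel at the end.
payload = ",".join(["0"] * 783 + ["128"])
data = to_features(payload)
print(len(data), data[-1])
```

The resulting list can then be passed to the loaded classifier as `clf.predict([data])`.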