Machine Learning with Python

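These are the materials for day 3 of the Python course held at G's Academy Tokyo. Day 3 covers machine learning with Python and includes an exercise in which you build an app that recognizes handwritten characters using the SVM algorithm.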
Presented by http://www.yoheim.net

Yohei Munesada

February 22, 2017
Transcript

  1. Machine Learning with Python: building a machine learning demo app with Python. Yohei Munesada

  2.  about me: Yohei Munesada (宗定洋平), mentor at G's ACADEMY TOKYO. Blog: http://www.yoheim.net

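  3.  course catalog (from Basic to Advanced): What is Python; Python basics; Modules and packages; Web scraping; Web server; Python and machine learning

  4.  a concept of the day
      Experience why Python is said to be the language for machine learning.
      Implement the working steps of machine learning in Python.
      Learn by implementing it yourself in the exercises.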

  5.  today’s demo https://goo.gl/KMgK3e

  6.  the contents: Overview of machine learning; Python libraries surrounding machine learning; Building a handwriting recognition app using SVM

  7.  Machine Learning Basics

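  8.  agenda: What is machine learning?; Regression and classification; Supervised and unsupervised; The 4 main types of machine learning; The steps of machine learning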

  9.  machine learning is … ? ?

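  10.  machine learning is … ?
      Recognizing patterns in large amounts of data:
        What conditions make tomorrow sunny?
        What pattern gets someone judged as good-looking?
        What conditions make Disneyland crowded tomorrow?
      Using those patterns to predict unknown data:
        Could that movement belong to a suspicious person?
        How many customers will visit the shop this weekend?
        How much electricity will be used next month?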
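  11.  regression and classification
      Machine learning models are broadly divided into regression and classification.
      Regression: a model that predicts a numeric value from the input data.
        Examples: predicting a user's purchase amount, predicting a user's smartphone usage time.
      Classification: a model that predicts a category from the input data. Also called a classifier.
        Examples: whether a user will buy or not, whether an image contains a cat or not, which value a handwritten digit represents.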
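The difference can be sketched in a few lines of plain Python. This is only an illustration: the data, the function names `fit_line` and `fit_threshold`, and the midpoint rule are assumptions made up for this example, not material from the lecture.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = a*x + b, which predicts a number."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x


def fit_threshold(xs, labels):
    """Classification: midpoint between the two class means, which predicts a label."""
    class0 = [x for x, label in zip(xs, labels) if label == 0]
    class1 = [x for x, label in zip(xs, labels) if label == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2


# Hypothetical data: minutes of smartphone use per user.
minutes = [10, 20, 30, 40]

# Regression target: purchase amount (a number).
amount = [120, 220, 320, 420]
a, b = fit_line(minutes, amount)
predicted_amount = a * 25 + b

# Classification target: bought (1) or not (0).
bought = [0, 0, 1, 1]
threshold = fit_threshold(minutes, bought)
will_buy = int(28 > threshold)
```

The regression output is a number on a continuous scale, while the classification output is one of a fixed set of labels.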
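  12.  supervised and unsupervised
      Machine learning models are broadly divided into supervised and unsupervised.
      Supervised: learns from data given in advance (training data) and predicts based on it.
        Examples: linear regression, logistic regression, SVM, neural networks, decision trees, etc.
      Unsupervised: derives some underlying structure from unknown data, without data given in advance.
        Examples: clustering, principal component analysis, etc.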
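As a small sketch of the unsupervised case, the code below groups one-dimensional values into two clusters without ever seeing a label. The function name `two_means_1d` and the sample values are hypothetical, and the code assumes the input contains at least two distinct values.

```python
def two_means_1d(values, iterations=10):
    """Split 1-D values into two clusters without using any labels."""
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        # Assign every value to the nearer center.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of its assigned values.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers = two_means_1d(values)
```

A supervised method would instead receive a label for every value and learn the boundary from those labels.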
  13.  4 types of machine learning
                            Supervised    Unsupervised
        Regression          A             B
        Classification      C             D
      There are also reinforcement learning and genetic algorithms.
  14.  ref: Reinforcement Learning, Genetic Algorithm http://yoheim.net/work/01_maze/ https://goo.gl/3gS3fq https://goo.gl/POq5el

  15.  steps of machine learning: Machine learning is carried out in the following steps. ?

  16.  steps of machine learning: Machine learning is carried out in the following steps.
      Obtain and understand the data
      Choose an algorithm
      Choose the features
      Cleanse and format the data
      Train
      Evaluate the model
      Improve the model
  17.  for your information (source: Mr. Nagata's lecture materials)

  18.  for your information (source: Mr. Nagata's lecture materials)

  19.  for your information (source: Mr. Nagata's lecture materials)

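  20.  re:agenda: What is machine learning?; Regression and classification; Supervised and unsupervised; The 4 main types of machine learning; The steps of machine learning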

  21.  python libraries for ML

  22.  main python libraries

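  23.  NumPy: handles matrices and vectors, and processes them at C-language speed.

        import numpy as np

        X = np.array([
            [5, 0, 1],
            [1, 6, 0],
            [0, 0, 1],
        ])
        X.shape                 # (3, 3)
        X.T                     # transpose
        E = np.eye(X.shape[0])  # 3x3 identity matrix
        X.dot(E)                # matrix product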
  24.  Matplotlib: a library for drawing plots.

        import numpy as np
        import matplotlib.pyplot as plt

        x = np.linspace(-np.pi, np.pi, 200)
        plt.plot(x, np.sin(x))
        plt.show()
  25.  http://scikit-learn.org/

  26.  hand-written digits recognition

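  27.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server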
  28.  http://yann.lecun.com/exdb/mnist/

  29.  samples

  30.  download MNIST data: The MNIST data can be obtained from the web. http://yann.lecun.com/exdb/mnist/ data_mnist train-images-idx3-… train-labels-idx1-… …

  31.  original format

  32.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server

  33.  re: 4 types of machine learning
                            Supervised    Unsupervised
        Regression          A             B
        Classification      C             D

  34.  re: 4 types of machine learning
                            Supervised    Unsupervised
        Regression          A             B
        Classification      C             D
  35.  Support Vector Machine: a supervised algorithm that solves classification problems. [Figure: scatter plot of two classes of points]

  36.  Support Vector Machine: a supervised algorithm that solves classification problems. [Figure: scatter plot of two classes of points]

  37.  Support Vector Machine: a supervised algorithm that solves classification problems. [Figure: scatter plot of two classes of points]

  38.  Support Vector Machine: a supervised algorithm that solves classification problems. [Figure: scatter plot of two classes of points] Maximizing the margin
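To make "maximizing the margin" concrete, the sketch below measures the margin of two candidate separating lines for a toy two-class data set. The points, the two lines, and the helper name `margin` are assumptions chosen for illustration; a real SVM searches for the best line rather than comparing hand-picked ones.

```python
import math


def margin(w, b, points):
    """Smallest distance from any point to the line w[0]*x + w[1]*y + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)


# Toy data: the first two points belong to one class, the last two to the other.
points = [(1, 1), (2, 1), (4, 4), (5, 4)]

# Both candidate lines separate the two classes.
margin_a = margin((1, 0), -3, points)    # vertical line x = 3
margin_b = margin((1, 1), -5.5, points)  # diagonal line x + y = 5.5

# The line with the larger margin is the one an SVM would prefer.
better = "b" if margin_b > margin_a else "a"
```

Here the diagonal line keeps every point further away than the vertical one does, so it is the better separator in the margin sense.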
  39.  Want to dive deeper? https://goo.gl/JE8Vvx Fundamentals of Machine Learning (the math) https://goo.gl/ATGxjM Support Vector Machine

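  40.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server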
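  41.  feature is: A feature is a quantity that is fed into a machine learning algorithm. ?

  42.  feature is: A feature is a quantity that is fed into a machine learning algorithm.
      popularity rate (prediction result) = base value + height (feature) + looks (feature) + personality (feature)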
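The formula on this slide can be read as a simple linear model. The following is a minimal sketch under that reading; the per-feature weights and all numbers are hypothetical and do not appear on the slide, which only shows the features being added to a base value.

```python
def score(base, weights, features):
    """Linear model: base value plus a weighted sum of the features."""
    return base + sum(weights[name] * value for name, value in features.items())


# Hypothetical weights and feature values.
weights = {"height": 0.5, "looks": 2.0, "personality": 3.0}
person = {"height": 2.0, "looks": 1.0, "personality": 1.0}

result = score(1.0, weights, person)
```

Training a model of this kind amounts to finding the base value and weights that best reproduce the known results.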
  43.  the features of MNIST: What are the features of an MNIST image? Each pixel is a feature (… in total).
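  44.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server

  45.  data pre-processing: From the MNIST data, create the data that the model will read.
      Convert to CSV format
      Output sample images (to check the result of the processing)

  46.  convert to csv format: Convert the MNIST binary data to CSV format. But before that…

  47.  binary data with Python: Binary data is handled with the struct module (standard library). https://docs.python.org/ja/3/library/struct.html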

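  48.  binary data with Python: Binary data is handled with the struct module (standard library).

        import struct

        # Read two big-endian unsigned 32-bit integers from the first 8 bytes.
        f = open("file.binary", "rb")
        num1, num2 = struct.unpack(">II", f.read(8))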
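The same call can be tried without a file on disk. The sketch below builds a small header in memory and reads it back; the header values 2049 and 60000 and the three trailing bytes are example values chosen for illustration.

```python
import io
import struct

# Build an 8-byte header in memory: two big-endian unsigned 32-bit integers.
header = struct.pack(">II", 2049, 60000)

# A file-like object holding the header followed by three 1-byte values.
f = io.BytesIO(header + bytes([5, 0, 4]))

# ">II" means: big-endian, two unsigned 32-bit integers (8 bytes in total).
num1, num2 = struct.unpack(">II", f.read(8))

# "B" means: one unsigned byte.
first_bytes = [struct.unpack("B", f.read(1))[0] for _ in range(3)]
```

Each `f.read(n)` call advances the position, so the header and the values that follow can be read in sequence.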
  49.  big-endian and little-endian (reference) About endianness: http://www.ertl.jp/~takayuki/readings/info/no05.html
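A short sketch of what the byte order changes in practice, using the struct module. The value 100 is just an example.

```python
import struct
import sys

big = struct.pack(">I", 100)     # most significant byte first
little = struct.pack("<I", 100)  # least significant byte first

# Reading big-endian bytes as little-endian gives a very different number.
wrong = struct.unpack("<I", big)[0]

# Byte order of the machine running this code: "little" or "big".
native = sys.byteorder
```

This is why the format string passed to `struct.unpack` has to state the byte order of the file explicitly rather than rely on the machine's native order.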

  50.  convert to csv format: Convert the MNIST binary data to CSV format.

  51.  * Label data

  52.  * Label data

        import struct
        import gzip

        # Read MNIST `label`.
        fpath = "./data_mnist/train-labels-idx1-ubyte.gz"
        with gzip.open(fpath, "rb") as f:
            magic_number, img_count = struct.unpack(">II", f.read(8))
            labels = []
            for i in range(img_count):
                label = str(struct.unpack("B", f.read(1))[0])
                labels.append(label)

        # Write as csv.
        outpath = './csv/train-labels.csv'
        with open(outpath, "w") as f:
            f.write("\n".join(labels))

  53.  * Image data

  54.  * Image data

        import struct
        import gzip

        # Read MNIST `images`.
        fpath = "./data_mnist/train-images-idx3-ubyte.gz"
        with gzip.open(fpath, "rb") as f:
            _, img_count = struct.unpack(">II", f.read(8))
            rows, cols = struct.unpack(">II", f.read(8))
            images = []
            for i in range(img_count):
                binary = f.read(rows * cols)
                images.append(",".join([str(b) for b in binary]))

        # Write as csv.
        outpath = './csv/train-images.csv'
        with open(outpath, "w") as f:
            f.write("\n".join(images))
  55.  output as images: Output part of the data as images to confirm that the CSV result is correct.

        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")

        # Write the first 10 images as plain-text PGM files.
        for i, image in enumerate(images[:10]):
            with open("./image/%d.pgm" % i, "w") as f:
                s = "P2 28 28 255\n"
                s += " ".join(image.split(","))
                f.write(s)
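The plain-text PGM format used here is a short header (`P2`, width and height, maximum gray value) followed by the pixel values. As a sketch, the conversion can be wrapped in a function that also checks the pixel count; the function name `csv_row_to_pgm` and the tiny 2x2 test image are hypothetical.

```python
def csv_row_to_pgm(row, width, height, max_value=255):
    """Return plain-text PGM (P2) content for one comma-separated pixel row."""
    pixels = row.split(",")
    if len(pixels) != width * height:
        raise ValueError(
            "expected %d pixels, got %d" % (width * height, len(pixels))
        )
    lines = ["P2", "%d %d" % (width, height), str(max_value)]
    # One text line per image row keeps the file readable in an editor.
    for y in range(height):
        lines.append(" ".join(pixels[y * width:(y + 1) * width]))
    return "\n".join(lines) + "\n"


# A 2x2 image: black, white / white, black.
tiny = csv_row_to_pgm("0,255,255,0", 2, 2)
```

For the MNIST rows written above, the call would use a width and height of 28, matching the header in the slide's code.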
  56.  output as images: Output part of the data as images to confirm that the CSV result is correct.

  57.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server

  58.  use Support Vector Machine: Now let's do pattern recognition with SVM.

        from sklearn import svm

        # Load training data (first 500 rows).
        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")[:500]
        with open("./csv/train-labels.csv") as f:
            labels = f.read().split("\n")[:500]

        # Convert data: scale pixel values and parse labels as integers.
        images = [[int(i)/256 for i in image.split(",")] for image in images]
        labels = [int(l) for l in labels]

        # Use SVM.
        clf = svm.SVC()
        clf.fit(images, labels)
  59.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server

  60.  evaluate the model: Predict on test data using the trained model and evaluate the accuracy.

        from sklearn import metrics

        test_images = ...  # loading omitted (500 rows are read this time)
        test_labels = ...  # loading omitted (500 rows are read this time)

        # Predict.
        predict = clf.predict(test_images)

        # Show results.
        ac_score = metrics.accuracy_score(test_labels, predict)
        print("Accuracy:", ac_score)
        cl_report = metrics.classification_report(test_labels, predict)
        print(cl_report)
  61.  evaluate the model: Predict on test data using the trained model and evaluate the accuracy. Precision and recall: http://d.hatena.ne.jp/Zellij/…
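Accuracy, precision, and recall can be computed by hand to see what the report measures. This is a plain-Python sketch with made-up labels; the function names are hypothetical and it is not how scikit-learn implements its metrics.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred, target):
    """Precision and recall for a single class `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted `target`, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual `target`, how many were found
    return precision, recall


# Made-up true and predicted digits.
y_true = [3, 3, 3, 8, 8, 1]
y_pred = [3, 3, 8, 8, 3, 1]

acc = accuracy(y_true, y_pred)
p3, r3 = precision_recall(y_true, y_pred, 3)
```

For the digit 3 in this toy data, two of the three predicted 3s are correct and two of the three actual 3s are found, so precision and recall are both 2/3.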

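  62.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server

  63.  how to improve the model: How would you improve it?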

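  64.  how to improve the model
      Increase the training data
      Increase the features
      Reduce the features
      Increase the influence of the regularization term (λ)
      Decrease the influence of the regularization term (λ)
      Normalize the data

  65.  how to improve the model
      Increase the training data
      Increase the features
      Reduce the features
      Increase the influence of the regularization term (λ)
      Decrease the influence of the regularization term (λ)
      Normalize the data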
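Of the options above, data normalization is easy to show in code. The sketch below rescales every feature column to the range 0 to 1 (min-max scaling), which is one common form of normalization; the function name and the sample rows are hypothetical, and the slides themselves simply divide pixel values by 256.

```python
def min_max_scale(rows):
    """Rescale each feature (column) to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column carries no information; map it to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical rows with features on very different scales.
rows = [[0, 170, 5], [128, 180, 5], [255, 160, 5]]
scaled = min_max_scale(rows)
```

Putting features on a comparable scale keeps one large-valued feature from dominating the distance calculations an SVM relies on.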
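  66.  how to improve the model: Increase the training data from … to … rows (the code below reads the first 5000 rows).

        from sklearn import svm

        # Load training data (first 5000 rows).
        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")[:5000]
        with open("./csv/train-labels.csv") as f:
            labels = f.read().split("\n")[:5000]

        # Convert data: scale pixel values and parse labels as integers.
        images = [[int(i)/256 for i in image.split(",")] for image in images]
        labels = [int(l) for l in labels]

        # Use SVM.
        clf = svm.SVC()
        clf.fit(images, labels)

  67.  evaluate the training result: Increasing the training data from … to … rows improved the accuracy from … to ….

  68.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server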
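  69.  save the training result: Save the training result so that it can be used later from a web server.

        # Save the training result.
        # In current scikit-learn, joblib is imported directly
        # (sklearn.externals.joblib has been removed).
        import joblib
        joblib.dump(clf, "./result/svm.pkl")

      Output: result/svm.pkl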
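The save-and-load round trip can be tried with the standard library alone. This sketch uses `pickle` instead of joblib and a plain dictionary as a stand-in for a trained model; the dictionary contents and the file name are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for learned parameters.
learned = {"weights": [0.5, -1.25], "bias": 0.1}

with tempfile.TemporaryDirectory() as tmpdir:
    pklfile = os.path.join(tmpdir, "model.pkl")

    # Save: serialize the object to a binary file.
    with open(pklfile, "wb") as f:
        pickle.dump(learned, f)

    # Load: another program would start from here.
    with open(pklfile, "rb") as f:
        restored = pickle.load(f)
```

As with any pickle-based format, only load files that come from a source you trust, since loading can execute arbitrary code.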
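  70.  agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server

  71.  use the model: Use the training result saved earlier from another program.

        from os import path
        import joblib

        # Load the training result.
        pklfile = path.join("result", "svm.pkl")
        clf = joblib.load(pklfile)

        # Predict. `data` is the list of feature values for one image.
        predict = clf.predict([data])
        number = str(predict.tolist()[0])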
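  72.  re:agenda: Obtain and understand the data; Choose an algorithm; Choose the features; Cleanse and format the data; Train; Evaluate the model; Improve the model; Save the training result; Use it from a web server

  73.  2nd demo app: Let's build a handwritten digit recognition app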

  74.  today’s demo https://goo.gl/KMgK3e

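  75.  how it works: The saved training result is used from a Flask application.
      http://localhost:5000 serves index.html, which calls /api/judge?data=0,0,255,86,0,0,255,10,…
      The API loads svm.pkl, judges the data, and returns the digit, for example "3".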
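On the server side, the `data` query parameter has to be turned back into a feature list before it can be passed to the model. The following is a minimal sketch using only the standard library; the function name `features_from_url` and the `expected` check are assumptions, and the actual demo app may parse the request differently (for example through Flask's request object).

```python
from urllib.parse import parse_qs, urlparse


def features_from_url(url, expected=None):
    """Extract the comma-separated `data` parameter as scaled floats."""
    query = parse_qs(urlparse(url).query)
    raw = query["data"][0]
    # Same scaling as in the training code: pixel value / 256.
    values = [int(v) / 256 for v in raw.split(",")]
    if expected is not None and len(values) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(values)))
    return values


# Shortened example request with only four pixel values.
features = features_from_url(
    "http://localhost:5000/api/judge?data=0,0,255,86", expected=4
)
```

Applying the same scaling at prediction time as at training time matters, because the model only knows the value range it was trained on.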
  76.  how to create
      Clone the repository
      Install the listed libraries
      Try launching it
      Implement the basic exercise
      Test that it works
      https://github.com/yoheimune-python-lecture/hand-written-digit-recognition

  77.  Finish !

  78.  summary

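  79.  summary
      We tried out Python × machine learning.
      Did you also get to experience how Python lets you "do what you want in a few lines"?
      You won't learn it unless you write code!
      Enjoy your Python life!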

  80. https://www.coursera.org/learn/machine-learning

  81.  next step https://goo.gl/ccYxtS https://goo.gl/POq5el

  82.  next step https://goo.gl/EXcphV https://goo.gl/51viVt

  83.  https://goo.gl/JE8Vvx next step

  84.  next step https://goo.gl/J6jXhp

  85. Thank you http://www.yoheim.net @yoheiMune https://flic.kr/p/…