Slide 1

Slide 1 text

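Supporting every mom's step forward
To all of humanity worn out by feature management in data analysis competitions
~ With a side of training and inference pipelines ~
Connehito Inc., Nozawa Takanobu
Connehito Marché vol. ~ Machine Learning and Data Analysis Market ~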

Slide 2

Slide 2 text

Hello!

Slide 3

Slide 3 text

This is a bit sudden, but...

Slide 4

Slide 4 text

Who here has heard of data analysis competitions (Kaggle, SIGNATE, etc.)?

Slide 5

Slide 5 text

Who here has taken part in a data analysis competition?

Slide 6

Slide 6 text

How do you manage your features? (for tabular data)

Slide 7

Slide 7 text

Common pattern 1 (from personal experience)

Slide 8

Slide 8 text

Common pattern 1 (ipynb)
• Create features: col = [y, A, B, C, D] → [y, A, B, C, D, E, F]
    # e.g.
    train['A'] = train['A'].fillna(0)
    train['B'] = np.log1p(train['B'])
    train['E'] = train['A'] + train['B']
    df_group = train.groupby('D')['E'].mean()
    train['F'] = train['D'].map(df_group)
    ・・・

Slide 9

Slide 9 text

Common pattern 1 (ipynb)
• Select only the feature columns to use
    # e.g.
    feat_col = ['A', 'C', 'D', 'E', 'F', 'J']
    x_train = train[feat_col]
    y_train = train['y']
• Train the model
    # e.g.
    clf.fit(x_train, y_train)
    ・・・

Slide 10

Slide 10 text

Common pattern 1 (ipynb)
Wait, what kind of feature was 'F' again?

Slide 12

Slide 12 text

Common pattern 1 (ipynb)
Found it! (near the top of the notebook)

Slide 13

Slide 13 text

Common pattern 1 (ipynb)
With only a few features this is still tolerable, but as they pile up, working out (hunting down) how each feature was computed every single time is a real chore and eats up time.

Slide 14

Slide 14 text

Common pattern 2 (from personal experience)

Slide 15

Slide 15 text

Common pattern 2 (ipynb)
Yes! I got a really good score!
I'll duplicate this notebook and build an even better model!

Slide 16

Slide 16 text

Meanwhile, inside the notebook...

Slide 17

Slide 17 text

Common pattern 2 (ipynb)
Inside the notebook:
    import numpy as np
    import pandas as pd
    ・・・
    submission.to_csv('submission.csv', index=False)

Slide 20

Slide 20 text

Common pattern 2 (ipynb)
The same computations have to be run over and over → waste

Slide 21

Slide 21 text

Common pattern 2 (ipynb)
Repeated duplication can also land you in notebook hell...
    notebook/LightGBM_score_….ipynb
    (thirteen such files listed)
    ……

Slide 23

Slide 23 text

What I'll talk about today

Slide 24

Slide 24 text

To all of humanity worn out by feature management in data analysis competitions
~ With a side of training and inference pipelines ~

Slide 25

Slide 25 text

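Agenda
Self-introduction
Feature management
‣ Manage features as one pickle file per column
‣ Generate a memo file at the same time as each feature
Training and inference pipeline
‣ Run everything from training to creating the submit file with a single command
‣ Save the features and model parameters used for training together with the logs
‣ Visualize feature contributions with shap to find pointers for the next training run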

Slide 26

Slide 26 text

Agenda
In short, this talk is about how I borrowed the wisdom of the experts and it worked out great (*).
(*) This is purely my subjective opinion.

Slide 27

Slide 27 text

Self-introduction

Slide 28

Slide 28 text

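Self-introduction
Name: Nozawa Takanobu
Affiliation: Connehito Inc.
Handle: Takapai (@takapy)
• Joined Connehito as an ML engineer
• Mainly working on machine learning (NLP, recommender systems) while also studying infrastructure (AWS)
• I do Kaggle, write a blog (https://www.takapy.work), play baseball, and eat ramen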

Slide 29

Slide 29 text

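Feature management
* Based on the article below:
• A feature management method for Kaggle using the Feather format
  https://amalog.hateblo.jp/entry/kaggle-feature-management
‣ Manage features as one pickle file per column
‣ Generate a memo file at the same time as each feature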

Slide 30

Slide 30 text

Feature management
First, let me share the overall picture.

Slide 31

Slide 31 text

Feature management
Manage features as pickle files, "one per column"

Slide 32

Slide 32 text

Feature management
Manage features as pickle files, "one per column"

Survived  Pclass  Sex     Age  Embarked
…         …       male    …    S
…         …       male    …    C
…         …       female  …    C
…         …       male    …    C
…         …       female  …    C
…         …       female  …    S
…         …       male    …    S

Slide 33

Slide 33 text

Feature management
Manage features as pickle files, "one per column"
    survived_train.pkl
    pclass_train.pkl    pclass_test.pkl
    sex_train.pkl       sex_test.pkl
    age_train.pkl       age_test.pkl
    embarked_train.pkl  embarked_test.pkl
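The per-column layout can be sketched roughly as follows. This is a minimal illustration using only the standard library; `save_column`, the temporary directory, and the toy values are assumptions made for the example, not the speaker's code (which pickles pandas columns).

```python
import pickle
import tempfile
from pathlib import Path


def save_column(values, name, split, feature_dir):
    """Pickle one column to <feature_dir>/<name>_<split>.pkl and return the path."""
    path = Path(feature_dir) / f"{name}_{split}.pkl"
    with open(path, "wb") as f:
        pickle.dump(list(values), f)
    return path


# Toy stand-ins for the train/test tables (values are made up for illustration).
train = {"Sex": ["male", "female"], "Embarked": ["S", "C"]}
test = {"Sex": ["female"], "Embarked": ["S"]}

feature_dir = tempfile.mkdtemp()
for split, table in (("train", train), ("test", test)):
    for name, values in table.items():
        save_column(values, name.lower(), split, feature_dir)

# One file per column and per split.
saved = sorted(p.name for p in Path(feature_dir).iterdir())
print(saved)
```

Because every column lives in its own file, adding or dropping a feature never requires rewriting the others.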

Slide 34

Slide 34 text

Feature management
Generate a feature memo at the same time as each feature

Slide 36

Slide 36 text

Feature management
Generate a feature memo at the same time as each feature
Sounds like quite a lot of work...

Slide 37

Slide 37 text

Feature management
Generate a feature memo at the same time as each feature
But all it takes is running a single Python script.

Slide 38

Slide 38 text

Feature management
Just run hoge.py from the command line
    # `Feature` (base class), `train`, `test`, and `create_memo` are defined elsewhere in hoge.py
    class Pclass(Feature):
        def create_features(self):
            self.train['Pclass'] = train['Pclass']
            self.test['Pclass'] = test['Pclass']
            create_memo('Pclass', 'Ticket class: one of 1st, 2nd, 3rd')

    class Sex(Feature):
        def create_features(self):
            self.train['Sex'] = train['Sex']
            self.test['Sex'] = test['Sex']
            create_memo('Sex', 'Sex')

    class Age(Feature):
        def create_features(self):
            self.train['Age'] = train['Age']
            self.test['Age'] = test['Age']
            create_memo('Age', 'Age')

    class Age_mis_val_median(Feature):
        def create_features(self):
            self.train['Age_mis_val_median'] = train['Age'].fillna(train['Age'].median())
            self.test['Age_mis_val_median'] = test['Age'].fillna(test['Age'].median())
            create_memo('Age_mis_val_median', 'Age with missing values filled with the median')

Slide 40

Slide 40 text

Feature management
Just run hoge.py from the command line → each feature

Slide 41

Slide 41 text

Feature management
Just run hoge.py from the command line → feature memo file

Slide 42

Slide 42 text

Feature management
Overview of what create_memo does
    class Pclass(Feature):
        def create_features(self):
            self.train['Pclass'] = train['Pclass']
            self.test['Pclass'] = test['Pclass']
            create_memo('Pclass', 'Ticket class: one of 1st, 2nd, 3rd')

Slide 43

Slide 43 text

Feature management
Overview of what create_memo does
    import csv
    import os

    # Create the feature memo CSV file
    def create_memo(col_name, desc):
        file_path = Feature.dir + '/_features_memo.csv'
        # Create an empty memo file on first use
        if not os.path.isfile(file_path):
            with open(file_path, "w"):
                pass
        with open(file_path, 'r+') as f:
            lines = f.readlines()
            lines = [line.strip() for line in lines]
            # Check whether the feature about to be written is already in the memo
            col = [line for line in lines if line.split(',')[0] == col_name]
            if len(col) != 0:
                return
            # readlines() left the file position at the end, so this row is appended
            writer = csv.writer(f)
            writer.writerow([col_name, desc])

Slide 45

Slide 45 text

Feature management
Overview of what create_memo does
Saving the memo in CSV format makes it easy to view on GitHub
(and of course it also looks clean in applications such as Excel or Numbers).
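To make the "write each feature only once" behaviour concrete, here is a small self-contained variant of the memo helper. It is a sketch under stated assumptions: unlike the slide's version it takes the file path as an argument and returns a boolean, and it reads existing rows with `csv.reader` so that descriptions containing commas are parsed correctly. The temporary path is only for the demo.

```python
import csv
import os
import tempfile


def create_memo(col_name, desc, file_path):
    """Append (col_name, desc) to the memo CSV unless col_name is already listed."""
    if os.path.isfile(file_path):
        with open(file_path, newline="", encoding="utf-8") as f:
            if any(row and row[0] == col_name for row in csv.reader(f)):
                return False  # already documented, nothing to do
    with open(file_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([col_name, desc])
    return True


memo_path = os.path.join(tempfile.mkdtemp(), "_features_memo.csv")
first = create_memo("Pclass", "Ticket class: 1st, 2nd, or 3rd", memo_path)
second = create_memo("Pclass", "Ticket class: 1st, 2nd, or 3rd", memo_path)  # skipped
create_memo("Sex", "Sex", memo_path)

with open(memo_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Re-running the feature script therefore never duplicates memo entries, which keeps the CSV readable when browsed on GitHub.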

Slide 46

Slide 46 text

Feature management
When creating a new feature

Slide 47

Slide 47 text

Feature management
When creating a new feature
Add the new feature-generation code to hoge.py
    class Family_Size(Feature):
        def create_features(self):
            self.train['Family_Size'] = train['Parch'] + train['SibSp']
            self.test['Family_Size'] = test['Parch'] + test['SibSp']
            create_memo('Family_Size', 'Total number of family members')

Slide 48

Slide 48 text

Feature management
When creating a new feature
Add the new feature-generation code to hoge.py, then run:
    python hoge.py

Slide 49

Slide 49 text

Feature management
When creating a new feature
Running python hoge.py generates only the new feature.
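One way to get "only the new feature is generated" is to skip any feature whose pickle files already exist. The sketch below assumes a hypothetical `Feature` base class with a `run` method; the class layout, the lower-cased file naming, and the list-based columns are illustrative assumptions, not the speaker's actual implementation.

```python
import pickle
import tempfile
from pathlib import Path


class Feature:
    """Hypothetical base class: one subclass per feature, one pickle per split."""

    def __init__(self, feature_dir):
        # Assumption: file names come from the lower-cased class name.
        self.name = type(self).__name__.lower()
        self.train_path = Path(feature_dir) / f"{self.name}_train.pkl"
        self.test_path = Path(feature_dir) / f"{self.name}_test.pkl"

    def create_features(self, train, test):
        raise NotImplementedError

    def run(self, train, test, overwrite=False):
        """Compute and save the feature; return False if it was already cached."""
        if self.train_path.exists() and self.test_path.exists() and not overwrite:
            return False
        train_col, test_col = self.create_features(train, test)
        self.train_path.write_bytes(pickle.dumps(train_col))
        self.test_path.write_bytes(pickle.dumps(test_col))
        return True


class Family_Size(Feature):
    def create_features(self, train, test):
        def size(table):
            return [p + s for p, s in zip(table["Parch"], table["SibSp"])]
        return size(train), size(test)


# Toy data, made up for illustration.
train = {"Parch": [0, 1], "SibSp": [1, 2]}
test = {"Parch": [2], "SibSp": [0]}
feature_dir = tempfile.mkdtemp()

first_run = Family_Size(feature_dir).run(train, test)   # computed and saved
second_run = Family_Size(feature_dir).run(train, test)  # skipped: files exist
cached = pickle.loads((Path(feature_dir) / "family_size_train.pkl").read_bytes())
```

With this design, the script can list every feature class and still finish quickly, since previously computed features are left untouched.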

Slide 50

Slide 50 text

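Feature management
To read the data, just specify the features and load them
    import pandas as pd

    # Specify the features (FEATURE_DIR_NAME is defined elsewhere)
    features = [
        "age_mis_val_median",
        "family__size",
        "cabin",
        "fare_mis_val_median"
    ]
    # Load one pickle per feature, then join them column-wise
    df = [pd.read_pickle(FEATURE_DIR_NAME + f'{f}_train.pkl') for f in features]
    df = pd.concat(df, axis=1)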
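The same "pick features by name" idea, reduced to the standard library so it runs without pandas. `load_features`, the temporary directory, and the stored values are assumptions for the demo; the feature names are the ones used on the slide.

```python
import pickle
import tempfile
from pathlib import Path

feature_dir = Path(tempfile.mkdtemp())

# Pretend these pickles were produced earlier by the feature script (toy values).
stored = {
    "age_mis_val_median": [22.0, 28.0],
    "family__size": [1, 0],
    "cabin": ["C85", None],
}
for name, column in stored.items():
    (feature_dir / f"{name}_train.pkl").write_bytes(pickle.dumps(column))


def load_features(names, split, directory):
    """Load only the requested columns, keeping the order given in `names`."""
    return {n: pickle.loads((directory / f"{n}_{split}.pkl").read_bytes()) for n in names}


features = ["family__size", "age_mis_val_median"]  # "cabin" is simply not loaded
columns = load_features(features, "train", feature_dir)
rows = list(zip(*columns.values()))  # column-wise data turned back into rows
```

Switching the feature set for an experiment is then a one-line change to the `features` list.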

Slide 51

Slide 51 text

Feature management
What was good about it

Slide 52

Slide 52 text

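Feature management
• Consolidating feature generation into a single script file avoids running the same computation multiple times and makes good use of time.
  → It also ensures the features are reproducible.
• Generating the feature memo at the same time meant I no longer had to rack my brain over "what was this feature again?"
• Managing features per column makes them easier to handle.
  → Pickle files are fast to both save and load!
  → When the number of features becomes huge, it may be better to manage them grouped in reasonably sized units.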

Slide 53

Slide 53 text

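Training and inference pipeline
* Based on the pipeline presented in the book below:
• Kaggleで勝つデータ分析の技術 (Data Analysis Techniques to Win Kaggle)
  https://gihyo.jp/book/…
‣ Run everything from training to creating the submit file with a single command
‣ Save the features and model parameters used for training together with the logs
‣ Visualize feature contributions with shap to find pointers for the next training run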

Slide 54

Slide 54 text

Training and inference pipeline
Running run.py performs training and inference and creates the submit file
    # LOG_DIR_NAME, n_fold, name_prefix, Runner, ModelLGB, and Submission are defined elsewhere

    # Specify the features
    features = [
        "age_mis_val_median",
        "family__size",
        "cabin",
        "fare_mis_val_median"
    ]
    run_name = 'lgb_1102'

    # Save the list of features used
    with open(LOG_DIR_NAME + run_name + "_features.txt", 'wt') as f:
        for ele in features:
            f.write(ele + '\n')

    params_lgb = {
        'boosting_type': 'gbdt',
        'objective': 'binary',
        'early_stopping_rounds': 20,
        'verbose': 10,
        'random_state': 99,
        'num_round': 100
    }

    # Save the parameters used
    with open(LOG_DIR_NAME + run_name + "_param.txt", 'wt') as f:
        for key, value in sorted(params_lgb.items()):
            f.write(f'{key}:{value}\n')

    runner = Runner(run_name, ModelLGB, features, params_lgb, n_fold, name_prefix)
    runner.run_train_cv()  # training
    runner.run_predict_cv()  # inference
    Submission.create_submission(run_name)  # create the submit file

Slide 55

Slide 55 text

Training and inference pipeline
Running run.py performs training and inference and creates the submit file
Files and models are saved with run_name as the prefix. For example:
• The list of features used
• The hyperparameters used
• The model for each fold
• The inference results
• The submit file
• Image files of the shap results, and so on
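What a cross-validated train/predict step with `run_name`-prefixed artifacts might look like, in miniature. Everything here is a hedged stand-in: `MeanModel` replaces a real LightGBM model, `run_cv` is a hypothetical helper (not the `Runner` class from the book), the JSON file names are invented for the demo, and the labels are toy data.

```python
import json
import statistics
import tempfile
from pathlib import Path


class MeanModel:
    """Toy stand-in for a real model: always predicts the training mean."""

    def fit(self, y):
        self.mean_ = statistics.fmean(y)
        return self

    def predict(self, n):
        return [self.mean_] * n


def run_cv(run_name, y_train, n_test, n_fold, out_dir):
    """Train one model per fold, save each under the run_name prefix,
    and return the test predictions averaged over folds."""
    out_dir = Path(out_dir)
    fold_preds = []
    for i in range(n_fold):
        # Hold out every n_fold-th sample, starting at offset i.
        y_tr = [v for j, v in enumerate(y_train) if j % n_fold != i]
        model = MeanModel().fit(y_tr)
        (out_dir / f"{run_name}_fold{i}.json").write_text(json.dumps({"mean": model.mean_}))
        fold_preds.append(model.predict(n_test))
    # Average the per-fold predictions for each test row.
    pred = [statistics.fmean(col) for col in zip(*fold_preds)]
    (out_dir / f"{run_name}_pred.json").write_text(json.dumps(pred))
    return pred


out_dir = tempfile.mkdtemp()
pred = run_cv("lgb_1102", y_train=[0, 1, 1, 0, 1, 1], n_test=2, n_fold=3, out_dir=out_dir)
files = sorted(p.name for p in Path(out_dir).iterdir())
print(files)
```

Because every artifact carries the run name, the outputs of different experiments can sit side by side without overwriting each other.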

Slide 56

Slide 56 text

Training and inference pipeline
Examples of generated files

Slide 57

Slide 57 text

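Training and inference pipeline
Examples of generated files (folders are split up as appropriate)
• lgb_…_fold….model (model created in fold …)
• lgb_…_fold….model (model created in fold …)
• lgb_…_fold….model (model created in fold …)
• lgb_…_pred.pkl (inference results on the test data)
• lgb_…_…_submission.csv (inference results converted into a CSV that can be submitted to Kaggle)
• lgb_…_…_features.txt (list of features used in this training run)
• lgb_…_…_param.txt (hyperparameters used in this training run)
• lgb_…_…_shap.png (visualization computed with shap)
• general.log (computation log file)
• result.log (log file containing only the model scores)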

Slide 62

Slide 62 text

Training and inference pipeline
What was good about it

Slide 63

Slide 63 text

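Training and inference pipeline
• For a model trained with "these features" and "these parameters", the time each task took and the score of each fold plus the final score are now tracked without my having to think about it.
• Outputting the shap results and feature importance gives pointers for the next training run.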
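A minimal sketch of how timings and scores could be split into two log files with the standard `logging` module. The file names `general.log` and `result.log` follow the slides; `make_logger`, the message formats, and the scores are assumptions made up for illustration, and the training step is left as a placeholder.

```python
import logging
import tempfile
import time
from pathlib import Path


def make_logger(name, path):
    """Return a logger that appends plain messages to `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


log_dir = Path(tempfile.mkdtemp())
general = make_logger("general", log_dir / "general.log")  # timings and progress
result = make_logger("result", log_dir / "result.log")     # scores only

run_name = "lgb_1102"
fold_scores = [0.81, 0.79, 0.83]  # made-up scores, for illustration only

for i, score in enumerate(fold_scores):
    start = time.perf_counter()
    # ... training for fold i would happen here ...
    elapsed = time.perf_counter() - start
    general.info("%s fold %d finished in %.3f s", run_name, i, elapsed)
    result.info("%s fold %d score=%.4f", run_name, i, score)

result.info("%s mean score=%.4f", run_name, sum(fold_scores) / len(fold_scores))

# Flush the handlers before reading the file back.
for lg in (general, result):
    for h in lg.handlers:
        h.flush()
result_lines = (log_dir / "result.log").read_text(encoding="utf-8").splitlines()
```

Keeping scores in their own file makes it easy to compare runs at a glance, while the general log retains the timing detail.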

Slide 64

Slide 64 text

Summary

Slide 65

Slide 65 text

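Summary
• Feature management is great!
  ‣ Consolidating feature generation into a single script file avoids running the same computation multiple times!
  ‣ Generating feature memos at the same time means fewer moments of wondering "what was this feature again?"
  ‣ Managing features per column made them easier to handle! (Though with a huge number of features, managing them in reasonably sized groups may be better.)
• Pipelines are great!
  ‣ Building a pipeline enables a fast PDCA cycle!
  ‣ Managing the features and parameters used for training ensures reproducibility and brings psychological safety too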

Slide 66

Slide 66 text

Thank you for listening.