Slide 1

Improving Deep Knowledge Tracing: pre-training and encoder-decoder architecture — Towards analysis of hidden vector representations —

University of Tsukuba, School of Informatics, College of Knowledge and Library Sciences
201511548 ᖊळ࣮

Slide 2

Background

Slide 3

Background

• In e-learning systems, we want to use learning logs to present problems better suited to each learner.

Slide 4

Knowledge Tracing [1]

• The task of modeling a learner's knowledge state as it changes over time.
• Predicts how the learner will respond to future problems.

Slide 5

Dataset: ASSISTments 2009-2010 (corrected)

• Students: 4,417
• Problems: 328,291
• Skills: 124
• The required skills range from strongly to weakly related, e.g. skill 13 (understanding square roots), skill 41 (comparing square roots and integers), and skill 4673 (understanding perfect squares).

Slide 6

Prior Work

Slide 7

Deep Knowledge Tracing [2] (1)

• The DKT model applies an LSTM to KT (Piech et al. 2015).

Slide 8

Deep Knowledge Tracing [2] (2)

• The input $x_t$ carries the ID of the answered skill, $q_t \in \{0, \dots, m\}$, and the correctness of the answer, $a_t \in \{0, 1\}$, where $m$ is the number of skills.
• The loss compares the predicted correctness $\tilde{y}^{\top}\delta_m(q_{t+1})$ for the skill answered at time $t+1$ with the actual result $a_{t+1}$:

$$\mathcal{L} = \sum_t \ell\left(\tilde{y}^{\top}\delta_m(q_{t+1}),\; a_{t+1}\right)$$
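As a rough sketch of this setup in pure Python (the one-hot input encoding shown here is one common DKT convention; implementations vary):

```python
import math

def encode_input(q, a, m):
    """One-hot encode a (skill ID, correctness) pair as a 2m-dim vector:
    slot q if the answer was wrong, slot m + q if it was right."""
    x = [0.0] * (2 * m)
    x[q + a * m] = 1.0
    return x

def dkt_loss(preds, qs, ans):
    """Binary cross-entropy between the predicted probability for the next
    skill, preds[t][q_{t+1}] (i.e. y^T delta_m(q_{t+1})), and the observed
    correctness a_{t+1}, summed over the sequence."""
    total = 0.0
    for y, q_next, a_next in zip(preds, qs[1:], ans[1:]):
        p = y[q_next]
        total += -(a_next * math.log(p) + (1 - a_next) * math.log(1 - p))
    return total
```

Here `preds[t]` stands for the model's per-skill output vector at step t; with all predictions at 0.5 the loss reduces to log 2 per step.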

Slide 9

Wavy transition problem [3]

• In DKT, predicted values oscillate between time steps.
• It is more natural to assume that ability changes gradually.
• Yeung et al. (2018) define evaluation metrics $w_1$, $w_2$ from the L1 and L2 norms of the difference between consecutive predictions $y_{t+1}$ and $y_t$:

$$w_1 = \frac{\sum_{i=1}^{n} \sum_{t=1}^{T_i - 1} \lVert y^{i}_{t+1} - y^{i}_{t} \rVert_1}{M \sum_{i=1}^{n} (T_i - 1)}, \qquad w_2^2 = \frac{\sum_{i=1}^{n} \sum_{t=1}^{T_i - 1} \lVert y^{i}_{t+1} - y^{i}_{t} \rVert_2^2}{M \sum_{i=1}^{n} (T_i - 1)}$$

$$\mathcal{L}' = \mathcal{L} + \lambda_{w_1} w_1 + \lambda_{w_2} w_2^2$$
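A minimal sketch of these waviness metrics (pure Python; `preds_per_student` is a hypothetical list of per-student prediction matrices of shape T_i × M):

```python
def waviness(preds_per_student, num_skills):
    """w1 and w2^2 from Yeung & Yeung (2018): average L1 / squared-L2
    distance between consecutive prediction vectors, normalized by the
    number of skills M and the total number of transitions."""
    transitions = sum(len(p) - 1 for p in preds_per_student)
    l1 = l2 = 0.0
    for preds in preds_per_student:
        for yt, yt1 in zip(preds, preds[1:]):
            l1 += sum(abs(b - a) for a, b in zip(yt, yt1))
            l2 += sum((b - a) ** 2 for a, b in zip(yt, yt1))
    return l1 / (num_skills * transitions), l2 / (num_skills * transitions)
```

Lower values mean smoother predictions over time, which is the behavior the regularizer rewards.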

Slide 10

Accuracy reproduction problem

1. The predicted values $\tilde{y}$ do not necessarily reflect the accuracy rate of the input sequence.
2. When fed dummy all-correct and all-wrong sequences, the prediction for the former, $\hat{y}'_{cor}$, sometimes falls below that for the latter, $\hat{y}'_{wro}$, i.e. $s = \hat{y}'_{cor} - \hat{y}'_{wro} < 0$.
→ A counterintuitive result: the estimated ability drops even though the learner answered correctly in a row.

Figure: the vertical axis is the predicted value; the horizontal axis is the number of correct answers in the dummy input.
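The check in point 2 can be made concrete as a small diagnostic (a sketch; `pred_cor` and `pred_wro` would be the model's per-skill predictions after the all-correct and all-wrong dummy runs):

```python
def count_violations(pred_cor, pred_wro):
    """Count skills with s = y'_cor - y'_wro < 0, i.e. skills where an
    all-correct dummy run yields a LOWER prediction than an all-wrong run."""
    return sum(1 for c, w in zip(pred_cor, pred_wro) if c - w < 0)
```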

Slide 11

Proposed Method

Slide 12

Knowledge State Vector loss

• Compute a loss between the Hadamard product of the prediction and the per-skill attempt frequency, $\tilde{y}_t \circ \sum_s \delta_m(q_s)$, and the per-skill correct-answer frequency, $\sum_s a_s \delta_m(q_s)$.
• Instead of predicting the probability of a correct answer for binary classification (as in existing work), this turns the task into regression onto the accuracy rate of the skills appearing in the input:

$$L_{ksv} = \sum_{t=1}^{T} \ell\left(\tilde{y}_t \circ \sum_{s=2}^{t+1} \delta_m(q_s),\; \sum_{s=2}^{t+1} a_s\, \delta_m(q_s)\right)$$

$$\mathcal{L}' = \mathcal{L} + \lambda_{ksv} L_{ksv}$$
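A pure-Python sketch of the idea, using squared error for ℓ and starting the cumulative sums at the first step (the exact indexing and choice of ℓ follow the slide's definition only loosely):

```python
def cumulative_counts(qs, ans, m):
    """Running per-skill attempt counts (sum_s delta_m(q_s)) and correct
    counts (sum_s a_s delta_m(q_s)) over the input sequence."""
    attempts, corrects, out = [0] * m, [0] * m, []
    for q, a in zip(qs, ans):
        attempts[q] += 1
        corrects[q] += a
        out.append((list(attempts), list(corrects)))
    return out

def ksv_loss(preds, qs, ans, m):
    """Squared-error version of L_ksv: the Hadamard product of the
    prediction with the attempt counts is regressed onto the correct
    counts, so each prediction is pushed toward the skill's accuracy rate."""
    total = 0.0
    for y, (att, cor) in zip(preds, cumulative_counts(qs, ans, m)):
        total += sum((yi * n - c) ** 2 for yi, n, c in zip(y, att, cor))
    return total
```

If the model predicts exactly the observed accuracy rate for every seen skill, this term vanishes.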

Slide 13

Pre-training

• Create dummy all-correct and all-wrong sequences and have the model learn the correct→correct / wrong→wrong relations:

$$X'_{wro} = \{(q_i, 0), \dots, (q_i, 0)\},\; y'_{wro} = (q_i, 0) \qquad X'_{cor} = \{(q_i, 1), \dots, (q_i, 1)\},\; y'_{cor} = (q_i, 1)$$

• Learning these relations before training on real data prevents the model from falling into the correct→wrong / wrong→correct local optima.
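Generating these dummy sequences is straightforward; a sketch (the sequence length and the set of skill IDs are free parameters):

```python
def make_pretraining_data(skill_ids, length):
    """For each skill q_i, build an all-correct run X'_cor with target
    y'_cor = (q_i, 1) and an all-wrong run X'_wro with target
    y'_wro = (q_i, 0), as (input_sequence, target) pairs."""
    data = []
    for q in skill_ids:
        data.append(([(q, 1)] * length, (q, 1)))  # X'_cor, y'_cor
        data.append(([(q, 0)] * length, (q, 0)))  # X'_wro, y'_wro
    return data
```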

Slide 14

Encoder-Decoder DKT (EDDKT)

• The encoder abstracts the input into a feature vector $h_k$ and hands it to the decoder; compared with a randomly initialized $h_0$, this is expected to yield better generalization.
• The decoder can be trained as a generative model, suggesting applications such as generating study plans.
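A toy illustration of the encoder-to-decoder handoff (a single-unit tanh RNN standing in for the LSTM; the weights and the decoder's feedback scheme are placeholders, not the paper's architecture):

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=0.5, b=0.0):
    """One step of a 1-unit tanh RNN cell (toy stand-in for an LSTM)."""
    return math.tanh(w_h * h + w_x * x + b)

def encoder_decoder(xs, steps):
    """The encoder compresses the input sequence into h_k, which seeds the
    decoder in place of a random/zero h_0; the decoder then generates
    `steps` outputs by feeding its state back as its next input."""
    h = 0.0
    for x in xs:                 # encoder: abstract the input into h_k
        h = rnn_step(h, x)
    outputs = []
    for _ in range(steps):       # decoder: generate from h_k
        h = rnn_step(h, h)
        outputs.append(h)
    return outputs
```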

Slide 15

Experimental Results

Slide 16

Overview of experimental results

Table: comparison of experimental results. Results that improve on the baseline are shown in bold.

Slide 17

Knowledge State Vector loss

• With $\lambda_{ksv} = 0.5$, the model could be trained to reduce the KS Vector loss. The AUC, $w_1$, and $w_2$ results also improved.
• EDDKT raised AUC above the baseline while keeping the KS Vector loss low even without relying on $\lambda_{ksv}$.

Figure: comparison of DKT (baseline) and DKT with $\lambda_{ksv} = 0.5$.

Slide 18

Pre-training

• The number of skills with $s = \hat{y}'_{cor} - \hat{y}'_{wro} < 0$ was reduced from 12 or more to 5.
• $w_1$ and $w_2$ became slightly lower than the baseline.

Figure: the vertical axis is the predicted value; the horizontal axis is the number of correct answers in the dummy input.

Slide 19

EDDKT

• Compared with the ordinary model, the predictions fluctuate up and down because of teacher forcing.
• The model could be trained as a generative model without greatly sacrificing the score.

Figure: learning curves of the generative model.

Slide 20

Summary

Slide 21

Summary

• We pointed out the accuracy reproduction problem.
• Pre-training alleviated the accuracy reproduction problem.
• We proposed the KS Vector loss, which uses the accuracy rate as the training target.
• We proposed EDDKT and showed that it can be trained as a generative model.

Slide 22

References

• [1] Corbett, A. T. and Anderson, J. R.: Knowledge tracing: Modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction, Vol. 4, No. 4, pp. 253–278 (1994).
• [2] Piech, C., Bassen, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L. J. and Sohl-Dickstein, J.: Deep knowledge tracing, Advances in Neural Information Processing Systems, pp. 505–513 (2015).
• [3] Yeung, C.-K. and Yeung, D.-Y.: Addressing two problems in deep knowledge tracing via prediction-consistent regularization, arXiv preprint arXiv:1806.02180 (2018).