EDM-2017: Few hundred parameters outperform few hundred thousand?

Few hundred parameters outperform few hundred thousand?
Amar Lalwani, Sweety Agrawal

Our Study: Goal
• Knowledge Tracing
  • BKT (Bayesian Knowledge Tracing)
  • Extensions of BKT (Khajah et al., 2016)
  • PFA (Performance Factor Analysis)
  • DKT (Deep Knowledge Tracing)
• Funtoot data
• LG (Learning Gap) as skill

Funtoot: Ontology
[Figure: a hierarchy in which a Subject (Math) contains a Concept (Triangles), which contains a Sub-concept (Congruency), which contains Sub-sub-concepts (Rules of Congruency, Applications of Congruency); Learning Gaps (LG 1, LG 2, LG 3) form the lowest level, with edges labelled "contains", "depends on" and "induces".]
Image source: From paper “Few hundred parameters outperform few hundred thousand?”

Sub-sub-concept: Difficulty Levels
[Figure: each sub-sub-concept is divided into Difficulty Levels 1 to 5, ranging from Least Difficult (Level 1) to Most Difficult (Level 5).]

LG (Learning Gap) as a skill
• What made the student make an unsuccessful attempt
• Possible reason/explanation behind a wrong answer
  • A misunderstanding of a concept
  • Lack of knowledge about a concept
• Each incorrect pattern/response is tagged with one or more LGs
• Need to know all possible incorrect patterns/responses

LG: committance and avoidance
• 1: Avoidance
• 0: Committance
• Consider a question with 3 LGs:

Attempt No.      LG1  LG2  LG3  Status
1                0    1    1    Failure
2                0    1    1    Failure
3                0    0    1    Failure
4                1    1    1    Success
Overall Outcome  0    0    1
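The slides do not spell out how the per-attempt values are combined into the overall outcome. One rule consistent with the table is that an LG scores 1 only if it was avoided in every attempt of the question (taking the value at the last failed attempt would also fit this particular table). A minimal sketch of that assumed rule, using a hypothetical helper `overall_outcome`:

```python
from typing import List


def overall_outcome(attempts: List[List[int]]) -> List[int]:
    """Collapse per-attempt LG values into one outcome per LG.

    attempts[i][j] is 1 if LG j was avoided in attempt i, 0 if committed.
    ASSUMED rule (not stated in the slides): an LG's overall outcome is 1
    only if it was avoided in every attempt, i.e. the minimum over attempts.
    """
    return [min(column) for column in zip(*attempts)]


# Rows are attempts 1-4 from the table; columns are LG1, LG2, LG3.
table = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
print(overall_outcome(table))  # [0, 0, 1]
```

Taking the column-wise minimum is just a compact way to express "avoided in all attempts"; swap in a different rule if the Funtoot data defines the outcome otherwise.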

Dataset
• 6th Grade Math, CBSE Curriculum
• 22 topics, 69 sub-topics, 119 sub-sub-topics
• 442 LGs, 1523 problems
• 7780 students, 176 schools
• 2.4 million problem attempts
• 5.6 million data-points
• 76% avoidances (positive class: 1)

Data Distribution
[Figure] Image source: From paper “Few hundred parameters outperform few hundred thousand?”

Knowledge Tracing Models
• BKT
  • BKT
  • BKT+F (Forgetting)
  • BKT+A (Abilities)
  • BKT+S (Skill Discovery)
  • BKT+FA
  • BKT+FSA
• DKT
  • DKT
  • Multi-Skill DKT
• PFA

Hypothetical Example
• Student Alice is working on Funtoot
• Consider LGs: A, B, C

TimeStamp     Question  A     B  C
T1            Q1        1     0  0
T2 (T2 > T1)  Q2        N.A.  0  1

BKT

Skill  Response Series
A      1
B      0, 0
C      0, 1
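BKT models each LG independently, so each response series above is run through its own copy of the model. Below is a minimal sketch of the standard BKT forward pass (no forgetting), treating 1 (avoided) as a "correct" observation. It uses the four per-skill parameters named on the Model Parameters slide (pInit, pLearn, pGuess, pSlip); the parameter values are hypothetical, the function name `bkt_predict` is illustrative, and fitting the parameters is not shown.

```python
from typing import Dict, List


def bkt_predict(responses: List[int], p_init: float, p_learn: float,
                p_guess: float, p_slip: float) -> List[float]:
    """Standard BKT forward pass for one skill (LG).

    Returns P(response = 1) before each observation, where 1 means the LG
    was avoided (the "correct" outcome in classic BKT terms).
    """
    p_known = p_init
    predictions = []
    for r in responses:
        p_correct = p_known * (1 - p_slip) + (1 - p_known) * p_guess
        predictions.append(p_correct)
        # Bayesian posterior of mastery given the observed response.
        if r == 1:
            posterior = p_known * (1 - p_slip) / p_correct
        else:
            posterior = p_known * p_slip / (1 - p_correct)
        # Learning transition before the next opportunity (no forgetting).
        p_known = posterior + (1 - posterior) * p_learn
    return predictions


# Response series per LG, as in the BKT table.
series: Dict[str, List[int]] = {"A": [1], "B": [0, 0], "C": [0, 1]}
# Hypothetical parameters shared by all LGs; real BKT fits 4 per skill.
params = dict(p_init=0.3, p_learn=0.2, p_guess=0.2, p_slip=0.1)
for skill, responses in series.items():
    print(skill, [round(p, 3) for p in bkt_predict(responses, **params)])
```

Because every LG has its own parameters and its own series, BKT never sees that A, B and C were observed on the same question; that is the information DKT-style models try to exploit.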

PFA

Skill  # Failures  # Successes  Response
A      0           0            1
B      0           0            0
C      0           0            0
B      1           0            0
C      1           0            1
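The # Failures and # Successes columns are counts of that LG's prior outcomes, taken before the current response. The sketch below rebuilds the table and applies the PFA logistic model with the three per-skill coefficients listed on the Model Parameters slide (difficulty, prior successes, prior failures). It scores one LG per row as in the table; standard PFA sums these terms over all skills tagged on an item. The coefficient values and the function names `pfa_rows` and `pfa_predict` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def pfa_rows(steps: List[Tuple[str, int]]) -> List[Tuple[str, int, int, int]]:
    """For each (skill, response) step, return (skill, prior failures,
    prior successes, response), mirroring the columns of the PFA table."""
    fails: Dict[str, int] = defaultdict(int)
    succs: Dict[str, int] = defaultdict(int)
    rows = []
    for skill, response in steps:
        rows.append((skill, fails[skill], succs[skill], response))
        # Update the counts only after recording the row (they are *prior* counts).
        if response == 1:
            succs[skill] += 1
        else:
            fails[skill] += 1
    return rows


def pfa_predict(skill: str, n_fail: int, n_succ: int,
                beta: Dict[str, float], gamma: Dict[str, float],
                rho: Dict[str, float]) -> float:
    """P(avoid) = sigmoid(beta_j + gamma_j * successes + rho_j * failures)."""
    m = beta[skill] + gamma[skill] * n_succ + rho[skill] * n_fail
    return 1.0 / (1.0 + math.exp(-m))


# Q1 gives A, B, C; Q2 gives B, C (A is N.A. on Q2).
steps = [("A", 1), ("B", 0), ("C", 0), ("B", 0), ("C", 1)]
# Hypothetical coefficients, 3 per skill; in practice fitted by logistic regression.
beta = {"A": 0.5, "B": 0.0, "C": -0.5}
gamma = {s: 0.3 for s in beta}
rho = {s: -0.1 for s in beta}
for skill, f, s, r in pfa_rows(steps):
    print(skill, f, s, r, round(pfa_predict(skill, f, s, beta, gamma, rho), 3))
```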

DKT

Serial No.  Question  Input               Skill  Response  Output
1           Q1        0, 0, 0, 0, 0, 0    A      1         1, X, X
2           Q1        1, 1, 0, 0, 0, 0    B      0         X, 0, X
3           Q1        0, 0, 1, 0, 0, 0    C      0         X, X, 0
4           Q2        0, 0, 0, 0, 1, 0    B      0         X, 0, X
5           Q2        0, 0, 1, 0, 0, 0    C      1         X, X, 1
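The DKT rows can be reproduced by reading each Input as a pair of slots per LG: slot 2k marks that LG k was observed at the previous step and slot 2k+1 holds its response. The input at step t therefore describes step t-1 (all zeros at the start), and the Output carries a target only for the LG being predicted (X elsewhere). This layout is inferred from the table and may differ from the authors' implementation; `encode_interaction` and `dkt_rows` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

SKILLS = ["A", "B", "C"]  # LGs of the hypothetical example


def encode_interaction(skill: str, response: int,
                       skills: Sequence[str] = SKILLS) -> List[int]:
    """Slot 2k = 1 marks LG k as observed; slot 2k+1 holds its response."""
    x = [0] * (2 * len(skills))
    k = skills.index(skill)
    x[2 * k] = 1
    x[2 * k + 1] = response
    return x


def dkt_rows(steps: Sequence[Tuple[str, int]],
             skills: Sequence[str] = SKILLS
             ) -> List[Tuple[List[int], List[Optional[int]]]]:
    """Build (input, output) rows: the input encodes the previous step and
    the output has a target only for the current LG (None plays X)."""
    rows = []
    prev = [0] * (2 * len(skills))
    for skill, response in steps:
        target: List[Optional[int]] = [None] * len(skills)
        target[skills.index(skill)] = response
        rows.append((prev, target))
        prev = encode_interaction(skill, response, skills)
    return rows


# Steps 1-5 of the DKT table: Q1 gives A, B, C; Q2 gives B, C.
steps = [("A", 1), ("B", 0), ("C", 0), ("B", 0), ("C", 1)]
for inp, out in dkt_rows(steps):
    print(inp, out)
```

During training, the masked (None) positions would simply be excluded from the loss, so the network is only scored on the LG actually observed at each step.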

DKT: skills randomly shuffled

Serial No.  Question  Input               Skill  Response  Output
1           Q1        0, 0, 0, 0, 0, 0    B      0         X, 0, X
2           Q1        0, 0, 1, 0, 0, 0    A      1         1, X, X
3           Q1        1, 1, 0, 0, 0, 0    C      0         X, X, 0
4           Q2        0, 0, 0, 0, 1, 0    C      1         X, X, 1
5           Q2        0, 0, 0, 0, 1, 1    B      0         X, 0, X
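All LGs of one question come from the same response, so their order within the question is arbitrary; the slide shows one random order, while the order of questions is kept. A minimal sketch of that flattening step, assuming each attempt is given as a question ID plus a mapping from tagged LG to outcome (the helper `flatten_attempts` and its input format are hypothetical):

```python
import random
from typing import Dict, List, Tuple


def flatten_attempts(attempts: List[Tuple[str, Dict[str, int]]],
                     seed: int = 0) -> List[Tuple[str, str, int]]:
    """Expand each question into one (question, LG, response) step per tagged
    LG, shuffling the LG order within the question; question order is kept."""
    rng = random.Random(seed)
    steps: List[Tuple[str, str, int]] = []
    for question, lg_responses in attempts:
        lgs = list(lg_responses)
        rng.shuffle(lgs)  # random order of LGs within this question only
        steps.extend((question, lg, lg_responses[lg]) for lg in lgs)
    return steps


# Alice's example: A is N.A. on Q2, so it simply has no entry there.
attempts = [("Q1", {"A": 1, "B": 0, "C": 0}), ("Q2", {"B": 0, "C": 1})]
print(flatten_attempts(attempts, seed=7))
```

The resulting (LG, response) pairs can then be encoded exactly as in the unshuffled case; only the within-question order changes.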

Multi-Skill DKT

Serial No.  Input               Output
1           0, 0, 0, 0, 0, 0    1, 0, 0
2           1, 1, 1, 0, 1, 0    X, 0, 1
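In Multi-Skill DKT each question becomes a single step: the input sets the slot pairs of every LG observed on the previous question, and the output holds a target for every LG tagged on the current question (X for LGs that are not applicable). The sketch below reuses the slot-pair layout inferred from the DKT table and reproduces both rows; the helper names are hypothetical and the layout may differ from the authors' code.

```python
from typing import Dict, List, Optional, Sequence, Tuple

SKILLS = ["A", "B", "C"]  # LGs of the hypothetical example


def encode_question(lg_responses: Dict[str, int],
                    skills: Sequence[str] = SKILLS) -> List[int]:
    """Same pair layout as single-skill DKT, but every LG tagged on the
    question is set at once."""
    x = [0] * (2 * len(skills))
    for skill, response in lg_responses.items():
        k = skills.index(skill)
        x[2 * k] = 1
        x[2 * k + 1] = response
    return x


def multi_skill_rows(questions: Sequence[Dict[str, int]],
                     skills: Sequence[str] = SKILLS
                     ) -> List[Tuple[List[int], List[Optional[int]]]]:
    """One row per question: input encodes the previous question, output holds
    a target for every LG tagged on the current one (None = X, not tagged)."""
    rows = []
    prev = [0] * (2 * len(skills))
    for lg_responses in questions:
        rows.append((prev, [lg_responses.get(s) for s in skills]))
        prev = encode_question(lg_responses, skills)
    return rows


questions = [{"A": 1, "B": 0, "C": 0}, {"B": 0, "C": 1}]  # Q1, Q2
for inp, out in multi_skill_rows(questions):
    print(inp, out)
```

Compared with the single-skill encoding, the sequence is shorter (one step per question) and no ordering of LGs within a question has to be chosen.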

Results
[Figure] Image source: From paper “Few hundred parameters outperform few hundred thousand?”

AUC over all data-points
• The variance in performance among the algorithms is very small
• PFA & DKT perform equally well
• Multi-Skill DKT lags behind DKT (0.03 AUC units)
• All variants of BKT lag behind DKT/PFA (0.03-0.05 AUC units)
• BKT+FSA & Multi-Skill DKT perform equally well

AUC averaged over skills
• The variance in performance among the algorithms is high
• PFA (0.88 AUC) performs the best
  • Gain of 17.3% over DKT (0.75 AUC)
  • Gain of 35.3% over BKT (0.65 AUC)
• Multi-Skill DKT lags behind DKT by 0.04 AUC units
• DKT & BKT+FSA perform equally well
• BKT+F performs the worst with 0.64 AUC
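The two result views aggregate AUC differently: "over all data-points" pools every prediction into one ROC curve, while "averaged over skills" computes one AUC per LG and takes the mean. The sketch below implements both with a rank-based AUC on made-up toy data; skipping skills whose labels are all one class is an assumption, since the slides do not say how such skills were handled, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Row = Tuple[str, int, float]  # (skill, label, predicted probability)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores above random negative); ties = 1/2.

    Quadratic in the number of points: fine for a sketch, too slow for
    millions of data-points.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pooled_auc(rows: List[Row]) -> float:
    """One AUC over all data-points, ignoring which skill each belongs to."""
    return auc([y for _, y, _ in rows], [s for _, _, s in rows])


def skill_averaged_auc(rows: List[Row]) -> float:
    """Mean of per-skill AUCs; single-class skills are skipped (assumption)."""
    labels = defaultdict(list)
    scores = defaultdict(list)
    for skill, y, s in rows:
        labels[skill].append(y)
        scores[skill].append(s)
    per_skill = [auc(labels[k], scores[k]) for k in labels
                 if 0 in labels[k] and 1 in labels[k]]
    return sum(per_skill) / len(per_skill)


# Toy data (made up): each skill is ranked perfectly on its own,
# but all of skill B's scores are lower than skill A's.
rows = [("A", 1, 0.9), ("A", 0, 0.8), ("B", 1, 0.3), ("B", 0, 0.2)]
print(pooled_auc(rows), skill_averaged_auc(rows))  # 0.75 1.0
```

The toy case shows why the two views can rank models differently: the pooled AUC also depends on how scores compare across skills, while the skill-averaged AUC gives every LG equal weight regardless of how many data-points it has.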

AUC averaged over skills
• Forgetting adds no value to BKT
  • BKT: 0.65 AUC, BKT+F: 0.64 AUC
  • BKT+A: 0.68 AUC, BKT+FA: 0.67 AUC
• Skill Discovery provides reasonable gains
  • BKT+S achieved a 9% gain over BKT
  • BKT+FSA achieved a 12% gain over BKT+FA
  • 145-175 skills discovered against 442 tagged skills
• Adding Abilities gave very small gains of 0.03 AUC units
  • (BKT, BKT+A), (BKT+F, BKT+FA)
• BKT+FSA performed best among the BKT variants, with a 15% gain over BKT
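The results mix two kinds of comparison: percentage gains, which are relative to the baseline AUC, and differences in "AUC units", which are absolute. A minimal sketch of both (function names are illustrative):

```python
def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


def absolute_gain(new: float, base: float) -> float:
    """Difference in AUC units."""
    return new - base


print(round(relative_gain(0.88, 0.75), 1))  # PFA over DKT -> 17.3
print(round(absolute_gain(0.68, 0.65), 2))  # BKT+A over BKT -> 0.03
```

Recomputing from the two-decimal AUCs on the slides can differ in the last digit from the reported figures (for example 0.88 vs 0.65 gives about 35.4% rather than 35.3%), presumably because the reported gains were computed from unrounded values.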

Conclusion
• DKT outperforms BKT
• BKT extensions are comparable to DKT
• PFA outperforms DKT
• Knowledge Tracing is shallow

Model Parameters
• DKT: few hundred thousand
  • Time series data: noisy
• PFA: 3 x # skills
  • Coefficients for difficulty, # prior successes, # prior failures
  • Abstract, simple features
• BKT: 4 x # skills
  • pInit, pLearn, pGuess, pSlip
• Parameters: DKT >> PFA
• Performance: PFA > DKT
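The parameter counts for BKT and PFA follow directly from the per-skill formulas above. For DKT the count depends on the network, which the slides do not specify; the sketch assumes a single-layer LSTM with a hypothetical hidden size of 100, an input width of 2 x # skills (the slot-pair encoding shown in the DKT tables) and one sigmoid output per skill.

```python
def bkt_params(n_skills: int) -> int:
    return 4 * n_skills  # pInit, pLearn, pGuess, pSlip per skill


def pfa_params(n_skills: int) -> int:
    return 3 * n_skills  # difficulty, success and failure coefficients per skill


def dkt_lstm_params(n_skills: int, hidden: int) -> int:
    """Single-layer LSTM + output layer (ASSUMED architecture).

    Counts one bias vector per gate; some libraries (e.g. PyTorch) keep two,
    which adds 4 * hidden more parameters.
    """
    input_dim = 2 * n_skills
    lstm = 4 * (hidden * (input_dim + hidden) + hidden)  # 4 gates
    output = hidden * n_skills + n_skills  # one prediction per skill
    return lstm + output


n = 442  # tagged LGs in the Funtoot data
print(bkt_params(n), pfa_params(n), dkt_lstm_params(n, hidden=100))
# 1768 1326 438642
```

Even with this modest hidden size, the assumed DKT has a few hundred times as many parameters as PFA, which is the gap the title refers to.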

Future Work
• 442 skills, 119 sub-sub-topics
• Skills discovered: 145-175
• Explore DKT for skill discovery
• Usage of secondary features
  • Attempts
  • Time durations
  • Hints
  • Item context and hierarchy

Questions?
