
Knowledge Inference

Amar
June 13, 2016


This presentation is about techniques to model students' changing knowledge state during the process of skill (knowledge) acquisition, also known as Knowledge Tracing. This model enables the system to maintain an estimate of the probability that the student has mastered each concept. Based on these probability estimates, the system individualizes the learning path and provides assistance as necessary.


Transcript

  1. KNOWLEDGE INFERENCE Amar Lalwani

  2. Goal Of Knowledge Inference • Measuring what a student knows

    at a specific time • Measuring what relevant knowledge components a student knows at a specific time
  3. Knowledge Component • Anything a student can know that is

    meaningful to the current learning situation • Skill • Fact • Concept • Principle • Schema
  4. Knowledge Inference • Knowledge is latent • Not directly measurable

  5. Why measure student knowledge? • Primary goal of education •

    Enhancing student knowledge • Measure efficacy of the system • Report to the stakeholders, instructors • Make automated pedagogical decisions
  6. Different than measuring performance • Inferring if a student’s performance

    right now is associated with successfully demonstrating a skill • Not the same as knowing whether the student has the latent skill • Guessing • Slipping (carelessness, lack of concentration)
  7. How do we get at latent knowledge? • Can’t measure

    it directly • Can’t look directly into the brain, yet! • But, can look at the performance • Performance over time • More information than performance at one specific instant
  8. Bayesian Knowledge Tracing • The classical approach for measuring a tightly

    defined skill in online learning • Based on the idea that practice on a skill leads to mastery of that skill • Goal: Track student knowledge over time • Measuring how well a student knows a specific skill/knowledge component at a specific time • Based on their past history of performance with that skill/KC
  9. Tightly defined skills • Unlike Item Response Theory • The

    goal is not to measure overall skill for a broadly-defined construct • Such as arithmetic • But to measure a specific skill or knowledge component • Such as addition of two-digit numbers where no carrying is needed
  10. Typical use of BKT • Assess a student’s knowledge of

    skill/KC X • Based on a sequence of items that are dichotomously scored • E.g. the student can get a score of 0 or 1 on each item • Where each item corresponds to a single skill • Where the student can learn on each item, due to help, feedback, scaffolding, etc.
  11. Key Assumptions • Single latent trait/skill per item • Each

    skill has four parameters • From these skill parameters and student’s historical performances, we can compute • Latent Knowledge P(Ln) • The probability P(correct) that the learner will get the item correct
  12. Key Assumptions • Two state learning model • Each skill

    is either learned/unlearned • Each problem is an opportunity for the student to apply the skill and hence learn it • Once known, the student does not forget the skill • Guess, slip
  13. BKT • For some skill K • Given the student's

    response sequence 1 to n, predict response n+1 • Chronological response sequence for student Y: 0 0 0 1 1 1 … 1 ? (positions 1 … n, then the unknown response n+1) [0 = incorrect response, 1 = correct response]
  14. BKT • Track knowledge over time (model of learning) • Example response sequence: 0 0 0 1 1 1 1
  15. BKT [Diagram: the Knowledge Tracing model as a chain of knowledge nodes K, each emitting a question node Q]

    • Node representations: K = Knowledge node (latent), Q = Question node (observed) • Node states: K = two states (0 or 1), Q = two states (0 or 1) • Parameters on the diagram: P(L0) on the first K node, P(T) on K→K transitions, P(G) and P(S) on K→Q emissions
  16. [Diagram: the same Knowledge Tracing model] Four parameters of the KT model:

    • P(L0) = Probability of initial knowledge • P(T) = Probability of learning • P(G) = Probability of guess • P(S) = Probability of slip • Probability of forgetting assumed to be zero (fixed)
  17. Simple HMM • Two hidden states: Learned (knows), Unlearned (does not know) • Two observations: Correct, Incorrect

    • Initial probabilities: P(L0) Learned, 1−P(L0) Unlearned • Transition Unlearned→Learned: P(T) • Emissions: Learned produces Correct with probability 1−P(S) and Incorrect with P(S); Unlearned produces Correct with P(G) and Incorrect with 1−P(G)
  18. BKT • Formulas for inference and prediction

    If the response at opportunity n is correct:
    P(Ln-1 | correct) = P(Ln-1) * (1 − P(S)) / [ P(Ln-1) * (1 − P(S)) + (1 − P(Ln-1)) * P(G) ]   (1)
    If the response is incorrect:
    P(Ln-1 | incorrect) = P(Ln-1) * P(S) / [ P(Ln-1) * P(S) + (1 − P(Ln-1)) * (1 − P(G)) ]   (2)
    Accounting for learning after the observation:
    P(Ln) = P(Ln-1 | evidence) + (1 − P(Ln-1 | evidence)) * P(T)
    Predicting correctness at opportunity n:
    P(correct) = P(Ln-1) * (1 − P(S)) + (1 − P(Ln-1)) * P(G)   (3)
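The update and prediction formulas above can be sketched in Python. The function and variable names (`bkt_update`, `p_l`, and so on) are illustrative choices, not from the slides:

```python
def bkt_update(p_l, correct, p_t, p_g, p_s):
    """One BKT step: Bayesian posterior given the observation (eqs. 1-2),
    then the no-forgetting learning transition P(T)."""
    if correct:
        posterior = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
    else:
        posterior = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
    # Knowledge can only increase: no forgetting in classical BKT.
    return posterior + (1 - posterior) * p_t


def bkt_predict(p_l, p_g, p_s):
    """Predicted probability of a correct response at the next opportunity (eq. 3)."""
    return p_l * (1 - p_s) + (1 - p_l) * p_g
```

For example, with P(L) = 0.5, P(T) = 0.2, P(G) = 0.14, P(S) = 0.09, a correct answer moves the knowledge estimate from 0.5 to about 0.893.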
  19. BKT • Predicting Current student correctness • Whenever the student

    has an opportunity to use the skill • The probability that the student knows the skill is updated
  20. Example

  21. Influence of Parameter Values • P(L0): 0.50, P(T): 0.20, P(G): 0.14, P(S): 0.09

    • Student reached 95% probability of knowledge after the 4th opportunity • [Plot: estimate of knowledge for a student with response sequence 0 1 1 1 1 1 1 1 1 1]
  22. Influence of Parameter Values • Original parameters: P(L0): 0.50, P(T): 0.20, P(G): 0.14, P(S): 0.09

    • Modified parameters: P(L0): 0.50, P(T): 0.20, P(G): 0.64, P(S): 0.03 • With the modified parameters, the student reached 95% probability of knowledge only after the 8th opportunity • [Plot: estimate of knowledge for the same response sequence 0 1 1 1 1 1 1 1 1 1]
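As a sanity check, a short self-contained script (illustrative names, not from the deck) reproduces both trajectories for the response sequence 0 1 1 1 1 1 1 1 1 1:

```python
def bkt_update(p_l, correct, p_t, p_g, p_s):
    # Posterior given the observation, then the learning transition (no forgetting).
    if correct:
        post = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
    else:
        post = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
    return post + (1 - post) * p_t


def trajectory(responses, p_l0, p_t, p_g, p_s):
    """Knowledge estimate P(Ln) after each response in the sequence."""
    traj, p_l = [], p_l0
    for r in responses:
        p_l = bkt_update(p_l, r == 1, p_t, p_g, p_s)
        traj.append(p_l)
    return traj


def first_crossing(traj, threshold=0.95):
    # 1-indexed observation after which the estimate first reaches the threshold.
    return next(i + 1 for i, p in enumerate(traj) if p >= threshold)


seq = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
low_guess = trajectory(seq, 0.50, 0.20, 0.14, 0.09)   # slide 21 parameters
high_guess = trajectory(seq, 0.50, 0.20, 0.64, 0.03)  # slide 22 parameters
```

With the original parameters the estimate first reaches 95% after the 3rd observation, so the estimate entering the 4th opportunity is above 95%; with the high-guess parameters this happens only after the 7th observation (entering the 8th opportunity), consistent with the slides.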
  23. BKT • Only uses first problem attempt on each item

    • Throws out information… • But uses the clearest information…
  24. Parameter Constraints • Typically, the potential values of BKT parameters

    are constrained • To avoid model degeneracy • A knowledge model is degenerate when it violates the basic idea of BKT • When knowing a skill leads to worse performance • When getting a skill wrong means you know it
  25. Constraints Proposed • P(G) + P(S) < 1.0 • P(G)

    < 0.5, P(S) < 0.5 • P(G) < 0.3, P(S) < 0.1
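The proposed constraints can be expressed as a small guard; treating the tighter bounds as the default `strict` mode is an assumption for illustration:

```python
def check_bkt_constraints(p_g, p_s, strict=True):
    """Flag degenerate guess/slip values per the proposed constraints.
    strict: P(G) < 0.3 and P(S) < 0.1
    loose:  P(G) < 0.5, P(S) < 0.5, and P(G) + P(S) < 1.0"""
    if strict:
        return p_g < 0.3 and p_s < 0.1
    return p_g < 0.5 and p_s < 0.5 and (p_g + p_s) < 1.0
```

The high-guess parameters from slide 22 (P(G) = 0.64) fail both versions, which is one way to catch a degenerate fit before deploying it.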
  26. Knowledge Tracing • How do we know if a knowledge

    tracing model is any good? • Our primary goal is to predict knowledge • But knowledge is latent • So we instead check our knowledge predictions • by checking how well the model predicts performance
  27. Fitting the Model • EM (Expectation Maximization) Algorithm • Grid

    Search • Genetic Algorithms
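Of the three fitting methods, grid search is the simplest to sketch. The brute-force fit below (illustrative names; squared error on predicted correctness as the objective, with the stricter constraints from slide 25 applied) is a sketch, not a production fitting routine:

```python
from itertools import product


def bkt_predict_seq(responses, p_l0, p_t, p_g, p_s):
    """Predicted P(correct) before each response, updating knowledge after it."""
    p_l, preds = p_l0, []
    for r in responses:
        preds.append(p_l * (1 - p_s) + (1 - p_l) * p_g)
        if r == 1:
            post = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
        else:
            post = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
        p_l = post + (1 - post) * p_t
    return preds


def grid_search_fit(sequences, step=0.05):
    """Exhaustively try parameter combinations on a grid, keeping the one with
    the lowest sum of squared errors over all response sequences."""
    grid = [round(step * i, 2) for i in range(1, int(1 / step))]
    best, best_sse = None, float("inf")
    for p_l0, p_t, p_g, p_s in product(grid, repeat=4):
        if p_g >= 0.3 or p_s >= 0.1:  # skip degenerate regions (slide 25)
            continue
        sse = 0.0
        for seq in sequences:
            preds = bkt_predict_seq(seq, p_l0, p_t, p_g, p_s)
            sse += sum((p - r) ** 2 for p, r in zip(preds, seq))
        if sse < best_sse:
            best, best_sse = (p_l0, p_t, p_g, p_s), sse
    return best, best_sse


best_params, err = grid_search_fit([[0, 0, 1, 1, 1], [0, 1, 1, 1, 1]])
```

EM and genetic algorithms search the same space more cleverly; the grid simply trades compute for simplicity and avoids local minima within its resolution.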
  28. Performance Factor Analysis (PFA) • An alternative to BKT •

    Addresses some of the limitations of BKT • But does not have all of the nice features of BKT
  29. PFA • Measures how much latent skill a student has,

    while they are learning • But expresses it in terms of probability of correctness, the next time the skill is encountered • No direct expression of the amount of latent skill, except this probability of correctness
  30. Key Assumptions • Each item may involve multiple latent skills

    or knowledge components • Different from BKT • Each skill has success learning rate γ and failure learning rate ρ • Different from BKT where learning rate is the same, success or failure
  31. Key Assumptions • There is also a difficulty parameter β,

    but its semantics can vary • From these parameters, and the number of successes and failures the student has had on each relevant skill so far, we can compute the probability P(m) that the learner will get the item correct
  32. PFA • m(i, j ∈ KCs, s, f) = Σ j∈KCs ( βj + γj * s i,j + ρj * f i,j )

    • P(m) = 1 / (1 + e^(−m))
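The two PFA formulas read as below in Python; the dict-based interface (skill name mapped to its β, γ, ρ and to prior success/failure counts) is an assumption for illustration:

```python
import math


def pfa_m(skills, successes, failures):
    """m for one student-item pair. `skills` maps each KC on the item to
    (beta, gamma, rho); `successes`/`failures` give prior counts per KC."""
    return sum(beta + gamma * successes[kc] + rho * failures[kc]
               for kc, (beta, gamma, rho) in skills.items())


def pfa_p_correct(skills, successes, failures):
    """Probability of correctness via the logistic link P(m) = 1/(1 + e^-m)."""
    m = pfa_m(skills, successes, failures)
    return 1.0 / (1.0 + math.exp(-m))
```

Note that, unlike BKT, one item may list several KCs in `skills`, and each success/failure count feeds its own γ or ρ.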
  33. Example

  34. Degenerate Example

  35. Negative Learning

  36. Key Points • Values of ρ below 0 don’t actually

    mean negative learning • They mean that failure provides more evidence of lack of knowledge • Than the learning opportunity causes improvement • Parameters in PFA combine information from correctness with improvement from practice • Makes PFA models a little harder to interpret than BKT
  37. Adjusting γ

  38. Adjusting ρ

  39. Adjusting β

  40. β Parameters • Three different β parameters proposed • Item

    • Item-Type • Skill • Result in a different number of parameters • And greater or lesser potential concern about over-fitting
  41. Fitting PFA • EM (Expectation Maximization) Algorithm • Vulnerable to

    local minima • Randomized restarts
  42. Item Response Theory (IRT) • Classical approach for assessments, used

    in tests • Measures how much of an overall trait a person has • Assess a student’s current knowledge of topic X
  43. Key Assumptions • There is only one latent trait or

    skill being measured per set of items • No learning is occurring in between items • E.g. a testing situation with no help or feedback • Learner has ability θ • Item has difficulty b, discriminability a • Based on these, we can compute the probability P(θ) that the learner will get the item correct
  44. Note • The assumption that all items tap the same

    latent construct, but have different difficulties • Is a very different assumption than is seen in PFA or BKT
  45. The Rasch Model • Simplest IRT model, 1-parameter model

    • P(θ) = 1 / (1 + e^(−(θ − b)))
  46. Item Characteristic Curve • b=0 • When θ=b (knowledge=difficulty), p=0.5

  47. P(correct) increases with student skill

  48. Changing difficulty parameter • Easy (green, b=-2), Hard (Orange, b=2)

  49. Note • The good student finds the easy and medium

    items almost equally difficult • The weak student finds the medium and hard items almost equally hard • When b=θ, Performance is 50%
  50. The 2-parameter Model • Discriminability parameter “a” added

    • P(θ) = 1 / (1 + e^(−a(θ − b)))
  51. Different values of a • a=2 (higher discriminability) • a=0.5 (lower discriminability)
  52. Discriminability at extremes • a=0, a approaches infinity

  53. Model Degeneracy • a below 0

  54. The 3-parameter Model • A more complex model • Adds

    a guessing parameter c • P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))) • Either you guess (and get it right) • Or you don’t guess (and get it right based on knowledge)
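The 2- and 3-parameter models can share one sketch, since setting a=1 and c=0 recovers the Rasch model; the defaults and the name `irt_p` are illustrative:

```python
import math


def irt_p(theta, b, a=1.0, c=0.0):
    """2PL when c = 0 (discriminability a); 3PL when the guessing floor c > 0.
    P(theta) = c + (1 - c) / (1 + e^(-a(theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

With c = 0.25, even a very weak student answers correctly about a quarter of the time, which is the point of the guessing parameter on a 4-option multiple-choice item.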
  55. 3-parameter model

  56. Fitting IRT Models • Can be done with Expectation Maximization

    • Estimate knowledge and difficulty together • Then, given item difficulty estimates, you can assess a student’s knowledge in real time
  57. Uses and Applications • IRT is used quite a bit

    in computer-adaptive testing • Not used quite so often in online learning, where student knowledge is changing as we assess it • For those situations, BKT and PFA are more popular
  58. Non KT (Knowledge Tracing) Approach • Motivation • Bayesian method

    only uses KC, opportunity count and success/failure as features. Much information is left unutilized. Another machine learning method is required • Strategy: • Engineer additional features from the dataset and use other learning algorithms to train a model
  59. Features • Features extracted from training set: • Student progress

    features – Number of data points [today, since the start of unit] – Number of correct responses out of the last [3, 5, 10] – Z-score sum for step duration, hint requests, incorrects – Skill-specific version of all these features • Percent correct features – % correct of unit, section, problem and step, and total for each skill and also for each student (10 features) • Student Modeling Approach features – The predicted probability of correct for the test row – The number of data points used in training the parameters – The final EM log likelihood fit of the parameters / data points
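A couple of the listed features are easy to sketch in plain Python; the function names and the dict output format are assumptions, not from the deck:

```python
def recent_correct_features(responses, windows=(3, 5, 10)):
    """Number of correct responses among the last k attempts, one feature per
    window. Mirrors the 'correct out of the last [3, 5, 10]' features above."""
    return {f"correct_last_{k}": sum(responses[-k:]) for k in windows}


def percent_correct(responses):
    """Overall % correct for a student (or any unit/section/skill slice)."""
    return sum(responses) / len(responses) if responses else 0.0
```

Each feature is computed per student and, where the slide says so, again per skill; the resulting vectors are what the classifiers on the next slide consume.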
  60. Non KT Approach • Modelled as a Classification Problem • ML

    algorithms like Logistic Regression, SVM, Neural Networks, Decision Trees can be used • Combining user features with skill features is a very powerful classification approach • Model-tracing-based predictions perform formidably against pure machine learning techniques
  61. Thank You!