Slide 1
Eedi - Mining Misconceptions in Mathematics
Copyright 2024 @kuto_bopro
Private LB: 0.456 (164th)
Stage 1 - Retrieval Part (CV: 0.464 / Public LB: 0.464 / Private LB: 0.440)
Use the LLM as an embedding model to retrieve misconception candidates by cosine similarity between the query embeddings and the misconception embeddings generated by the LLM. The data is treated as individual Question-Incorrect Answer pairs.
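A minimal sketch of this retrieval idea, with hypothetical inputs; the slide does not specify the pooling method, so last-token pooling is assumed here:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL = "Qwen/Qwen2.5-32B-Instruct"  # retrieval model named on the slide

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map="auto")

@torch.no_grad()
def embed(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(model.device)
    hidden = model(**batch).last_hidden_state         # (B, T, H)
    last = batch["attention_mask"].sum(dim=1) - 1     # last non-pad position
    emb = hidden[torch.arange(hidden.size(0)), last]  # last-token pooling (assumed)
    return F.normalize(emb, dim=-1)                   # unit norm -> dot = cosine

# hypothetical inputs: one row per Question-Incorrect Answer pair
queries = ["Construct ... Question ... Incorrect Answer ..."]
misconceptions = ["Believes multiplying two negatives gives a negative answer"]

scores = embed(queries) @ embed(misconceptions).T     # cosine similarity
top25 = scores.topk(k=min(25, scores.size(1)), dim=1).indices
```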
Stage 2 - Reranking Part (CV: 0.495 / Public LB: 0.521 / Private LB: 0.456)
Use the LLM as a reranker: input a prompt containing the misconception candidates retrieved in Stage 1 and output the most likely misconception.
Instead of providing all 25 retrieved misconceptions at once, we divided them into smaller batches (n=9) and ran inference multiple times.
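A sketch of the batched reranking loop. `llm_choose` is a hypothetical stand-in for the vLLM call shown at the end of this slide, and the final round over per-batch winners is an assumption: the slide only says inference is run multiple times over batches of 9.

```python
def build_prompt(query: str, batch: list[str]) -> str:
    # numbered-choice prompt; the template is sketched under "task description"
    choices = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(batch))
    return f"{query}\nPick the correct misconception number from the below:\n{choices}"

def rerank(query: str, candidates: list[str], n: int = 9) -> str:
    # split the 25 retrieved candidates into batches of n=9,
    # asking the LLM for one pick per batch
    winners = []
    for i in range(0, len(candidates), n):
        batch = candidates[i:i + n]
        pick = int(llm_choose(build_prompt(query, batch)))  # 1-based choice number
        winners.append(batch[pick - 1])
    if len(winners) == 1:
        return winners[0]
    # assumption: a final round over the per-batch winners
    return rerank(query, winners, n)
```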
Loss: MultipleNegativesRankingLoss over triplet text input (anchor = Query, positive = Misconception Name, negative = Misconception Name)
LoRA params: r=64 alpha=128 dropout=0.05
Quantize params: bnb 4bit
train params: epoch=10 lr=1e-4 batch_size=64
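A minimal sketch of loading the base model with these settings via bitsandbytes + PEFT; the target_modules and the nf4 quant type are assumptions (the slide only lists the numbers above):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize params from the slide: bnb 4bit
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumption: slide only says "bnb 4bit"
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct", quantization_config=bnb, device_map="auto"
)

# LoRA params from the slide: r=64 alpha=128 dropout=0.05
lora = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```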
CV Strategy (Common to both Stage 1 and Stage 2)
The test data contains unseen misconceptions missing from the training data, making it difficult to correlate CV and LB with a simple GroupKFold. I adjusted the split to achieve an unseen-misconception rate of ~0.6.
① Split the train data randomly in a 9:1 (train : valid) ratio.
② Split each part using a different method:
9 => GroupKFold(n_splits=5) by MisconceptionId
1 => StratifiedKFold(n_splits=5) by MisconceptionId
③ Create 5-fold data (used only as hold-out).
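A sketch of this split with scikit-learn, assuming a hypothetical DataFrame `train` with a MisconceptionId column:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, StratifiedKFold

rng = np.random.default_rng(0)

# ① random 9:1 split of the train data
mask = rng.random(len(train)) < 0.9
part9, part1 = train[mask], train[~mask]

# ② different splitter per part: GroupKFold holds out whole MisconceptionIds
#    (producing unseen misconceptions in validation), StratifiedKFold keeps
#    every misconception represented in both train and validation
folds9 = GroupKFold(n_splits=5).split(part9, groups=part9["MisconceptionId"])
folds1 = StratifiedKFold(n_splits=5).split(part1, part1["MisconceptionId"])

# ③ pair the folds into 5 validation sets, used only as hold-out; the 9-side
#    supplies the unseen misconceptions that push the rate toward ~0.6
for (_, va9), (_, va1) in zip(folds9, folds1):
    valid_idx = np.concatenate([part9.index[va9], part1.index[va1]])
```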
prompt text: Query = Construct + Subject + Question + Correct Answer + Incorrect Answer
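A minimal sketch of assembling that query text; the field order is from the slide, while the labels, separators, and column names are assumptions:

```python
def build_query(row: dict) -> str:
    # Field order from the slide; labels and column names are assumptions.
    return (
        f"Construct: {row['ConstructName']}\n"
        f"Subject: {row['SubjectName']}\n"
        f"Question: {row['QuestionText']}\n"
        f"Correct Answer: {row['CorrectAnswer']}\n"
        f"Incorrect Answer: {row['IncorrectAnswer']}"
    )
```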
Embedding model: Qwen/Qwen2.5-32B-Instruct + LoRA (encodes both Query and Misconception texts)
Reranker: Qwen/Qwen2.5-32B-Instruct-AWQ + LoRA
Retrieve the top 25 misconceptions for a single query (Question-Answer).
[Diagram: each query Q_i is paired with its retrieved misconception candidates M_{i,1}, M_{i,2}, …, M_{i,25} and fed to the LoRA-tuned reranker.]
Fine-tuning using next-token prediction
LoRA params: r=8 alpha=16 dropout=0.05
train params: epoch=2 lr=2e-5 batch_size=16
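A sketch of what next-token-prediction fine-tuning looks like at the data level: the reranking prompt is continued by the correct choice number. Masking the prompt tokens out of the loss is an assumption; the slide does not say whether this was done.

```python
import torch

IGNORE_INDEX = -100  # positions excluded from the cross-entropy loss

def encode_example(tokenizer, prompt: str, answer: str) -> dict:
    # Next-token prediction: the model learns to continue the reranking
    # prompt with the correct misconception's choice number (`answer`).
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + answer_ids + [tokenizer.eos_token_id]
    # assumption: loss only on the answer tokens, prompt tokens masked out
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids + [tokenizer.eos_token_id]
    return {"input_ids": torch.tensor(input_ids), "labels": torch.tensor(labels)}
```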
in-batch negatives (bs=64) + hard negative (size=1)
If the in-batch negatives contain positives, mask them so they are excluded from the loss calculation.
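A hand-rolled sketch of this loss (not the author's code): in-batch entries that share the anchor's gold MisconceptionId are removed from the softmax so true positives are never penalized as negatives.

```python
import torch
import torch.nn.functional as F

def mnr_loss_with_masking(q: torch.Tensor, p: torch.Tensor,
                          ids: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """q: (B, H) anchor (Query) embeddings; p: (B, H) positive (Misconception)
    embeddings -- row i is anchor i's gold misconception and serves as an
    in-batch negative for every other anchor; ids: (B,) gold MisconceptionId."""
    sim = F.normalize(q, dim=-1) @ F.normalize(p, dim=-1).T * scale  # (B, B)
    # mask off-diagonal entries whose misconception equals the anchor's gold
    # one: they are positives, not negatives, and must not contribute loss
    same = ids.unsqueeze(0) == ids.unsqueeze(1)
    off_diag = ~torch.eye(len(ids), dtype=torch.bool, device=q.device)
    sim = sim.masked_fill(same & off_diag, float("-inf"))
    # the slide's 1 hard negative per anchor could be appended as extra
    # columns of `sim` before the softmax (omitted here)
    target = torch.arange(len(ids), device=q.device)
    return F.cross_entropy(sim, target)
```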
Retrieval model training
Retrieve 100 negative misconceptions with high similarity to the anchor using a pretrained HuggingFace model.
Prepare anchor, positive, and negative triplet datasets.
Train the model with contrastive learning using QLoRA.
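A sketch of the mining step with sentence-transformers; the slide does not name the pretrained model, so BAAI/bge-large-en-v1.5 and the input variables are assumptions:

```python
from sentence_transformers import SentenceTransformer

# assumption: the slide does not name the pretrained model;
# BAAI/bge-large-en-v1.5 is only an example
encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")

# hypothetical inputs: query texts, all misconception names, gold ids per row
q_emb = encoder.encode(queries, convert_to_tensor=True, normalize_embeddings=True)
m_emb = encoder.encode(misconception_names, convert_to_tensor=True, normalize_embeddings=True)

sim = q_emb @ m_emb.T                                      # cosine similarity
top100 = sim.topk(k=min(100, sim.size(1)), dim=1).indices  # hardest candidates

triplets = []  # (anchor, positive, candidate hard negatives)
for i, gold in enumerate(gold_ids):
    hard_negs = [j.item() for j in top100[i] if j.item() != gold]
    triplets.append((queries[i], misconception_names[gold], hard_negs))
```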
[Diagram: retrieval by cosine similarity between Query and Misconception embeddings.]
task description:
You are a Mathematics teacher. Your task is to reason and identify the misconception behind the Incorrect Answer with the Question. Answer concisely what misconception it is to lead to getting the incorrect answer.
Pick the correct misconception number from the below:
Use vLLM to accelerate inference.
top_p: 0.99, temperature: 0
max_tokens: 1 (the LLM outputs only the misconception index)
Use MultipleChoiceLogitsProcessor from NVIDIA/logits-processor-zoo.
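A sketch of the constrained decoding setup. The sampling parameters are from the slide; the hand-rolled processor below only stands in for MultipleChoiceLogitsProcessor (restricting the single output token to the choice numbers), since the zoo's exact constructor isn't shown on the slide.

```python
import torch
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-32B-Instruct-AWQ"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
llm = LLM(model=MODEL, quantization="awq")

# stand-in for MultipleChoiceLogitsProcessor: allow only the tokens for the
# choice numbers 1..9 (one batch holds up to 9 candidates)
choice_ids = [tokenizer.encode(str(i), add_special_tokens=False)[0] for i in range(1, 10)]

def multiple_choice_processor(token_ids, logits):
    mask = torch.full_like(logits, float("-inf"))
    mask[choice_ids] = 0.0
    return logits + mask

params = SamplingParams(
    temperature=0,                                 # greedy (slide)
    top_p=0.99,                                    # (slide)
    max_tokens=1,                                  # output only the index (slide)
    logits_processors=[multiple_choice_processor],
)
outputs = llm.generate(prompts, params)            # `prompts` is hypothetical
picks = [int(o.outputs[0].text) for o in outputs]
```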