3/xx Competition overview: Kaggle - LLM Science Exam
Build a question-answering model for difficult, science-based questions that selects the correct answer.

Inference flow: question + choices → language AI model → answer

Question: Which of the following statements accurately describes the impact of Modified Newtonian Dynamics (MOND) on the observed "missing baryonic mass" discrepancy in galaxy clusters?

Choices (5 options):
A: MOND is a theory that reduces the observed missing baryonic mass in galaxy clusters by postulating the existence of a new form of matter called "fuzzy dark matter."
B: MOND is a theory that increases the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 20.
C: MOND is a theory that explains the missing baryonic mass in galaxy clusters that was previously considered dark matter by demonstrating that the mass is in the form of neutrinos and axions.
D: MOND is a theory that reduces the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 2.
E: MOND is a theory that eliminates the observed missing baryonic mass in galaxy clusters by imposing a new mathematical formulation of gravity that does not require the existence of dark matter.

Answer: D

Participants build a model like this and compete on answer accuracy.
5/xx About the data
• train.csv – a set of 200 questions with the answer column
• test.csv – your task is to predict the top three most probable answers given the prompt.
  – NOTE: the test data you see here is just a copy of the training data without the answers. The unseen re-run test set is comprised of ~4,000 different prompts.
• Note: the questions that actually have to be answered are hidden by the system and cannot be viewed.
• However...
  – External open datasets may be used!
    • For example, science articles from Wikipedia.
Only 200 QA pairs are distributed, and the hidden questions to answer number fewer than 4,000!
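A minimal sketch of what "predict the top three most probable answers" means in practice, assuming the model produces one probability per choice. The scoring function assumes a MAP@3-style metric (credit of 1, 1/2, or 1/3 depending on the rank of the correct answer), which is not stated on this slide; the function names `top3` and `map_at_3` are illustrative only.

```python
LABELS = "ABCDE"


def top3(probs):
    """Return the three most probable choice labels, best first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [LABELS[i] for i in ranked[:3]]


def map_at_3(predictions, answers):
    """Mean average precision at 3 (assumed metric, see the note above)."""
    total = 0.0
    for pred, answer in zip(predictions, answers):
        for rank, label in enumerate(pred[:3], start=1):
            if label == answer:
                total += 1.0 / rank  # full credit at rank 1, half at rank 2, ...
                break
    return total / len(answers)


# Hypothetical per-choice probabilities for two questions.
preds = [
    top3([0.05, 0.08, 0.15, 0.60, 0.12]),
    top3([0.40, 0.30, 0.10, 0.15, 0.05]),
]
score = map_at_3(preds, ["D", "B"])
print(preds, score)
```

Under this metric, ranking the correct answer second or third still earns partial credit, which is why a full ranking is submitted rather than a single label.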
8/xx A look at part of the data
No.  Question
0    Which of the following statements accurately describes the impact of Modified Newtonian Dynamics (MOND) on the observed "missing baryonic mass" discrepancy in galaxy clusters?
1    Which of the following is an accurate definition of dynamic scaling in self-similar systems?
2    Which of the following statements accurately describes the origin and significance of the triskeles symbol?
3    What is the significance of regularization in terms of renormalization problems in physics?
4    Which of the following statements accurately describes the relationship between the dimensions of a diffracting object and the angular spacing of features in the diffraction pattern?
5    Which of the following statements accurately depicts the relationship between Gauss's law, electric flux, electric field, and symmetry in electric fields?
6    Which of the following statements accurately describes the dimension of an object in a CW complex?
7    Which of the following statements accurately describes the blocking temperature of an antiferromagnetic layer in a spin valve?
8    What is the term used in astrophysics to describe light-matter interactions resulting in energy shifts in the radiation field?
I see. The questions basically seem to be asked in 5W1H form.
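22/xx The public notebook that held the best score mid-competition
Stage            Pipeline 1             Pipeline 2
Data source      Wikipedia 6.5M data    Wikipedia 6.5M data
Retrieve method  sBert L6-v2            sBert L6-v2
Retrieve data    5 pages, 20 sentences  5 pages, 20 sentences
Model            Deberta-v3 large       Deberta-v3 large
Ensemble         50%                    50%
Final submit     average of the two pipelines

Overview: when training BERT, related Wikipedia documents are retrieved and fed to the model together with the question.
This notebook changed the leaderboard scores substantially.
• Inference flow
  • Reference the Wikipedia data, embed it with sBERT, and search for related documents
  • Run inference with two models and ensemble the results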
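The 50% / 50% averaging step can be sketched as a weighted mean over each model's per-choice probabilities. This is a minimal illustration, not the notebook's code; the function name `ensemble` and the probability values are hypothetical.

```python
def ensemble(prob_sets, weights):
    """Weighted average of per-choice probabilities from several models."""
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_choices = len(prob_sets[0])
    return [
        sum(w * probs[i] for probs, w in zip(prob_sets, weights)) / total
        for i in range(n_choices)
    ]


# Hypothetical outputs of two models for one question (choices A..E).
model_a = [0.10, 0.20, 0.10, 0.50, 0.10]
model_b = [0.05, 0.40, 0.05, 0.30, 0.20]

avg = ensemble([model_a, model_b], [0.5, 0.5])
best = "ABCDE"[max(range(len(avg)), key=lambda i: avg[i])]
print(avg, best)
```

Here model B alone would pick B, but the averaged probabilities favor D, which is the kind of correction an ensemble is meant to provide.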
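24/xx Supplement: detailed processing flow of the public notebook
1. Page search: search the Wiki data for related pages
   • Wiki data → extract abstracts → Wiki abstract embedding (Sentence Transformers)
   • Test data prompt → embedding (Sentence Transformers)
   • Page search → contents of the retrieved pages (5 pages)
2. Text search: search the contents of the 5 pages for related sentences
   • Page contents (5 pages) → text embedding (Sentence Transformers); the sentences in the page contents are embedded
   • Test data prompt + answers → embedding (Sentence Transformers)
   • Text search → extract 20 lines of text
3. C: Multiple-choice Transformer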
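The two-stage search above can be sketched as follows. To keep the example self-contained, a toy bag-of-words vector stands in for the Sentence Transformers embeddings, so this shows only the control flow (prompt against abstracts, then prompt + answers against sentences), not the notebook's actual retrieval quality. All names and the sample pages are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a Sentence Transformers embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_context(prompt, answers, pages, n_pages=5, n_sentences=20):
    # Stage 1: rank pages by similarity between the prompt and each abstract.
    q_page = embed(prompt)
    best_pages = sorted(
        pages, key=lambda p: cosine(q_page, embed(p["abstract"])), reverse=True
    )[:n_pages]
    # Stage 2: rank sentences of those pages against prompt + answers.
    q_text = embed(prompt + " " + " ".join(answers))
    candidates = [s for p in best_pages for s in p["sentences"]]
    return sorted(
        candidates, key=lambda s: cosine(q_text, embed(s)), reverse=True
    )[:n_sentences]


pages = [
    {
        "abstract": "Modified Newtonian dynamics MOND is a theory of gravity "
        "proposed to explain galaxy rotation",
        "sentences": [
            "MOND reduces the missing mass discrepancy in galaxy clusters "
            "to a factor of about 2.",
            "MOND was proposed by Mordehai Milgrom in 1983.",
        ],
    },
    {
        "abstract": "Diffraction is the spreading of waves around obstacles",
        "sentences": ["Smaller objects produce wider diffraction patterns."],
    },
]
context = retrieve_context(
    "What does MOND say about missing mass in galaxy clusters?",
    ["It reduces the discrepancy to a factor of about 2", "It adds fuzzy dark matter"],
    pages,
    n_pages=1,
    n_sentences=1,
)
print(context)
```

Searching abstracts first keeps the expensive sentence-level comparison limited to a handful of pages instead of the whole corpus.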
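25/xx Planning the approach: examination points derived from the public notebook
There are five points to examine for improving performance. The key is how thoroughly each of them can be explored.

Pipeline stage   Public notebook baseline                    ★Examination point
Data source      Wikipedia all data / Wikipedia 6.5M data    1. Data variation
Retrieve method  sBert L6-v2 (both pipelines)                2. Method variation
Retrieve data    5 pages, 20 sentences (both pipelines)      3. Num of reference
Model            Deberta-v3 large (both pipelines)           4. Interpretation and inference capability
Ensemble         50% / 50% average → Final submit            5. Method

Final hypothesis: as the experiments progressed, I realized that points 1 and 2 were the important ones and designed the approach around them!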
26/xx 31st Place Solution: Kaggle - LLM Science Exam
Major change: increased the number of data sources and referenced documents, then ensembled!

Five pipelines, each weighted 20% in the ensemble, averaged into the final submit:
• Data source: wikipedia 270k (wikipedia270k all-paraphs-parsed-expanded), stem-wiki-cohere-no-emb, Wikipedia all data
• Retrieve method: TF-IDF, TF-IDF, sBert bge-small, sBert L6-12, sBert bge-small
• Retrieve data: 20 chunks, 20 paragraphs, 20 pages / 6 sentences, 20 chunks, 20 sentences extracted from 20 paragraphs
• Model: Deberta-v3 large (all five pipelines)
• Ensemble: 20% / 20% / 20% / 20% / 20%, averaged → Final submit

Pipeline origins (diagram labels):
• Public model: llm-science-run-context2
• U-bex's main contribution
• Sugupoko
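As a rough illustration of the TF-IDF retrieval used in two of the pipelines, here is a simplified pure-Python sketch that ranks text chunks by cosine similarity of TF-IDF vectors. It is not the team's implementation; the smoothed IDF formula, the function names, and the sample chunks are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF dict per chunk."""
    docs = [Counter(tokenize(c)) for c in chunks]
    df = Counter(term for doc in docs for term in doc)  # document frequency
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, chunks, idf, vectors, k=20):
    """Return the k chunks most similar to the query."""
    counts = Counter(tokenize(query))
    q = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    ranked = sorted(
        range(len(chunks)), key=lambda i: cosine(q, vectors[i]), reverse=True
    )
    return [chunks[i] for i in ranked[:k]]


chunks = [
    "The blocking temperature is the temperature above which an "
    "antiferromagnetic layer in a spin valve no longer pins the adjacent layer.",
    "Gauss's law relates electric flux to enclosed charge.",
    "Dynamic scaling describes self-similar systems.",
]
idf, vectors = build_index(chunks)
hits = search("What is the blocking temperature in a spin valve?", chunks, idf, vectors, k=1)
print(hits)
```

Keyword-based retrieval like this tends to complement embedding search on questions full of rare technical terms, which is one plausible reason to mix both kinds of retriever in an ensemble.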
33/xx Supplement: dataset generation with gpt-3.5
• Spent $70 to create a 70k-row dataset intended to improve document retrieval accuracy
  – Kaggle - LLM Science Exam | Kaggle
  – I could not make good use of it myself, so I published it on Kaggle
• The 3rd-place finisher made good use of it. #happy
  – The prompt is shown below
• For each document, the QA and the sentence used to create the QA are generated together

# Assumed value: the slide does not show how `delimiter` is defined.
delimiter = "####"

system_message = f"""
You will be provided with TEXT from wikipedia. \
The TEXT will be delimited with {delimiter} characters.
Output a python list of 3 dict objects, where each object is \
a multiple choice question whose answers should be in \
the given TEXT and that has 5 choices each.
Each object should have the following format:
    'question': <question on the TEXT>
    'option_1': <question answer option>
    'option_2': <question answer option>
    'option_3': <question answer option>
    'option_4': <question answer option>
    'option_5': <question answer option>
    'answer': <answer option key label>
    'reference_sentence': <original sentence from the TEXT that supports the answer>

You should tell me which one of your proposed options is right \
by assigning the corresponding option's key label in the 'answer' field.
Also, provide the original sentence \
from the TEXT that supports the answer in the 'reference_sentence' field.

The question, the answer, and question answer options should be broad, \
challenging, long, detailed, and based on the TEXT provided.

Additionally, ensure the token distribution of question follows these statistics:
- Mean: 14.22 tokens
- Std Deviation: 7.223939 tokens
- Min: 4 token
- 25th Percentile: 9 tokens
- Median: 13 tokens
- 75th Percentile: 17.25 tokens
- Max: 49 tokens

Additionally, ensure the token distribution of each answer follows these statistics:
- Mean: 30.840 tokens
- Std Deviation: 19.883692 tokens
- Min: 1 token
- 25th Percentile: 16 tokens
- Median: 27.5 tokens
- 75th Percentile: 43.25 tokens
- Max: 100 tokens

Only output the list of objects, with nothing else.
"""
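Because the prompt asks for "a python list of 3 dict objects", the response can in principle be parsed with `ast.literal_eval` and filtered before it goes into a dataset. The sketch below is an assumption about how that post-processing might look, not the code actually used: it assumes the 'answer' field holds a key such as 'option_2', and it drops items whose 'reference_sentence' does not appear verbatim in the source TEXT. The names `parse_generated`, `REQUIRED_KEYS`, and the sample data are hypothetical.

```python
import ast

OPTION_KEYS = [f"option_{i}" for i in range(1, 6)]
REQUIRED_KEYS = {"question", "answer", "reference_sentence", *OPTION_KEYS}


def parse_generated(raw, source_text):
    """Parse the model's list-of-dicts reply and keep only well-formed items."""
    items = ast.literal_eval(raw.strip())  # raises on malformed output
    if not isinstance(items, list):
        raise ValueError("expected a list of dict objects")
    valid = []
    for item in items:
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            continue  # missing fields
        if item["answer"] not in OPTION_KEYS:
            continue  # answer must name one of the option keys (assumption)
        if item["reference_sentence"] not in source_text:
            continue  # supporting sentence must come from the TEXT
        valid.append(item)
    return valid


text = "MOND was proposed by Mordehai Milgrom in 1983."
good = {
    "question": "Who proposed MOND?",
    "option_1": "Mordehai Milgrom",
    "option_2": "Vera Rubin",
    "option_3": "Fritz Zwicky",
    "option_4": "Albert Einstein",
    "option_5": "Isaac Newton",
    "answer": "option_1",
    "reference_sentence": "MOND was proposed by Mordehai Milgrom in 1983.",
}
bad = dict(good, reference_sentence="This sentence is not in the TEXT.")
rows = parse_generated(repr([good, bad]), text)
print(len(rows))
```

Checking the reference sentence against the source is a cheap way to filter out hallucinated support, which matters if the generated pairs are used to train or evaluate retrieval.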
46/xx Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering | Transactions of the Association for Computational Linguistics | MIT Press