Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

Slide 1

Slide 1 text

Paper introduction: Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

Slide 2

Slide 2 text

Paper overview
• Title
  – Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL
• Authors
  –
• Venue
  – arXiv
    ⚫ https://arxiv.org/abs/2406.08426
• Summary
  – A survey paper on Text-to-SQL

Slide 3

Slide 3 text

Text-to-SQL
• Text-to-SQL is the task of converting a natural-language question into an SQL query.
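For a concrete picture of the task, the sketch below takes the car-maker question used in the C3 prompts later in this deck. It pairs that question with one hand-written SQL query and runs the query on a tiny in-memory SQLite database, using Python's standard `sqlite3` module. The rows and the query are illustrative assumptions, not taken from Spider. The join path follows the foreign keys listed in the Column Recall prompt.

```python
import sqlite3

# A natural-language question and one SQL query that answers it (illustrative only).
QUESTION = "What is the name of the different car makers who produced a car in 1970?"
SQL = """
SELECT DISTINCT m.maker
FROM car_makers AS m
JOIN model_list AS l ON l.maker = m.id
JOIN car_names AS n ON n.model = l.model
JOIN cars_data AS d ON d.id = n.makeid
WHERE d.year = 1970
"""


def build_toy_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE car_makers (id INTEGER, maker TEXT, fullname TEXT, country INTEGER);
        CREATE TABLE model_list (modelid INTEGER, maker INTEGER, model TEXT);
        CREATE TABLE car_names (makeid INTEGER, model TEXT, make TEXT);
        CREATE TABLE cars_data (id INTEGER, mpg REAL, cylinders INTEGER, edispl REAL,
                                horsepower INTEGER, weight INTEGER, accelerate REAL, year INTEGER);
        INSERT INTO car_makers VALUES (1, 'amc', 'American Motor Company', 1), (2, 'volvo', 'Volvo', 2);
        INSERT INTO model_list VALUES (1, 1, 'amc'), (2, 2, 'volvo');
        INSERT INTO car_names VALUES (10, 'amc', 'amc rebel sst'), (11, 'volvo', 'volvo 144ea');
        INSERT INTO cars_data VALUES (10, 16, 8, 304, 150, 3433, 12, 1970),
                                     (11, 19, 4, 121, 112, 2868, 15.5, 1973);
    """)
    return conn


if __name__ == "__main__":
    conn = build_toy_db()
    print(QUESTION)
    print(conn.execute(SQL).fetchall())  # [('amc',)]
```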

Slide 4

Slide 4 text

Topics covered in this paper
• Datasets and Benchmarks
  – Datasets and benchmarks used to evaluate LLM-based Text-to-SQL systems
• Evaluation Metrics
  – Metrics used to evaluate the performance of LLM-based Text-to-SQL systems
    ⚫ Content matching-based and execution-based
• Methods and Models
  – Methods and models adopted for LLM-based Text-to-SQL
    ⚫ In-context learning, fine-tuning
• Expectations and Future Directions
  – Remaining challenges and limitations of LLM-based Text-to-SQL, and future research directions

Slide 5

Slide 5 text

Topics covered in this paper
• Summary of the survey's structure and contents

Slide 6

Slide 6 text

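Overview
• Challenges of Text-to-SQL
  – Linguistic complexity and ambiguity
  – Schema understanding and representation
  – Rare and complex SQL operations
  – Cross-domain generalization
• Evolution of approaches
  – Rule-based
  – Deep-learning-based
  – Pretrained-language-model-based
  – LLM-based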

Slide 7

Slide 7 text

Datasets
• Datasets fall into two categories
  – Original datasets
  – Post-annotated datasets

Slide 8

Slide 8 text

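Evaluation metrics
• Content Matching-based Metrics
  – Component Matching (CM)
    ⚫ Exact match of each SQL component (SELECT, WHERE, GROUP BY, ORDER BY, KEYWORDS), reported as an F1 score
  – Exact Matching (EM)
    ⚫ Proportion of predicted queries that exactly match the ground-truth query
• Execution-based Metrics
  – Execution Accuracy (EX)
    ⚫ Proportion of predicted queries whose execution results match those of the ground-truth query
  – Valid Efficiency Score (VES)
    ⚫ Combines execution-result match with query efficiency (execution time)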
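Below is a simplified sketch of EM and EX in Python using `sqlite3`. It is not the official Spider or BIRD evaluation code, whose scripts are more involved. For example, Spider's EM compares parsed SQL clauses rather than whitespace-normalized strings, while this sketch's EX ignores row order. The helper names `normalize`, `exact_match`, `execution_match` and `score` are hypothetical.

```python
import sqlite3
from collections import Counter


def normalize(sql: str) -> str:
    """Lower-case, drop semicolons, collapse whitespace: a crude stand-in for SQL parsing."""
    return " ".join(sql.lower().replace(";", " ").split())


def exact_match(pred: str, gold: str) -> bool:
    """Simplified EM: the normalized query strings must be identical."""
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Simplified EX: both queries must run and return the same multiset of rows."""
    gold_rows = conn.execute(gold).fetchall()
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    return Counter(pred_rows) == Counter(gold_rows)


def score(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (EM, EX) as fractions over a list of (pred, gold) pairs."""
    em = sum(exact_match(p, g) for p, g in pairs) / len(pairs)
    ex = sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
    return em, ex


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE singer (name TEXT, age INTEGER);
        INSERT INTO singer VALUES ('A', 30), ('B', 40);
    """)
    pairs = [
        ("SELECT name FROM singer WHERE age > 35", "select name from singer where age > 35;"),
        ("SELECT name FROM singer WHERE age >= 40", "SELECT name FROM singer WHERE age > 35"),
        ("SELECT nme FROM singer", "SELECT name FROM singer"),
    ]
    print(score(conn, pairs))
```

The second pair shows why EX is more lenient than EM. `age >= 40` and `age > 35` differ as text but return the same rows on this data.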

Slide 9

Slide 9 text

Methods
• Methods fall broadly into two groups: in-context learning and fine-tuning

Slide 10

Slide 10 text

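Methods: In-context Learning
• Y: an executable SQL query
• Q: the user's question
• S: the database schema/content
  – Decomposed as S = ⟨C, T, K⟩
    ⚫ C = {c1, c2, ...}: the set of columns
    ⚫ T = {t1, t2, ...}: the set of tables
    ⚫ K: potential external knowledge (e.g., foreign-key relations, schema linking, domain knowledge)
• I: the instruction for the Text-to-SQL task
• f(·|θ): an LLM with parameters θ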
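Taken together, these symbols describe an LLM that generates the SQL query from the question, the schema and the instruction. The formula below is inferred from the symbol list on this slide, so the survey's exact notation may differ:

```latex
Y = f\left(Q, S, I \mid \theta\right), \qquad S = \langle C, T, K \rangle
```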

Slide 11

Slide 11 text

Methods: In-context Learning
• In-context learning methods fall into five categories
  – C0-Trivial Prompt
  – C1-Decomposition
  – C2-Prompt Optimization
  – C3-Reasoning Enhancement
  – C4-Execution Refinement

Slide 12

Slide 12 text

Methods: In-context Learning: C0-Trivial Prompt
• C0-Trivial Prompt:
  – Zero-shot prompt: P0 = I ⊕ S ⊕ Q
    ⚫ S = ⟨C, T, K⟩
      – Set of columns C = {c1, c2, ...}
      – Set of tables T = {t1, t2, ...}
      – Potential external knowledge K
        » Foreign-key relations
        » Schema linking
        » Domain knowledge
  – Few-shot prompt: Pn = {F1, F2, ..., Fn} ⊕ P0
    ⚫ Fi = (Si, Qi, Yi)
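Here is a minimal Python sketch of the two prompt shapes. It reads ⊕ as newline-separated string concatenation, which is an assumption. The `# table ( columns )` schema layout is borrowed from the C3 prompts later in this deck. The `Question:`/`SQL:` labels and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """S = <C, T, K>: tables with their columns, plus optional external knowledge K."""
    tables: dict[str, list[str]]  # table name -> column names (covers T and C)
    knowledge: list[str] = field(default_factory=list)  # K, e.g. foreign-key hints

    def render(self) -> str:
        # One line per table in the '# table ( col1, col2 )' layout, then knowledge lines.
        lines = [f"# {t} ( {', '.join(cols)} )" for t, cols in self.tables.items()]
        lines += [f"# {k}" for k in self.knowledge]
        return "\n".join(lines)


def zero_shot_prompt(instruction: str, schema: Schema, question: str) -> str:
    """P0 = I ⊕ S ⊕ Q, with ⊕ read as newline-separated concatenation."""
    return "\n".join([instruction, schema.render(), f"Question: {question}"])


def render_example(schema: Schema, question: str, sql: str) -> str:
    """One demonstration F_i = (S_i, Q_i, Y_i)."""
    return "\n".join([schema.render(), f"Question: {question}", f"SQL: {sql}"])


def few_shot_prompt(examples: list[tuple[Schema, str, str]], instruction: str,
                    schema: Schema, question: str) -> str:
    """Pn = {F1, ..., Fn} ⊕ P0: the n demonstrations come first, then the zero-shot prompt."""
    demos = [render_example(*ex) for ex in examples]
    return "\n\n".join(demos + [zero_shot_prompt(instruction, schema, question)])


if __name__ == "__main__":
    s = Schema({"singer": ["singer_id", "name", "age"]})
    demo = (s, "List all singer names.", "SELECT name FROM singer")
    print(few_shot_prompt([demo], "Translate the question into SQLite SQL.", s,
                          "How many singers are there?"))
```

Following the formula literally, the demonstrations precede the instruction. Other orderings are equally plausible in practice.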

Slide 13

Slide 13 text

Related work on C0-Trivial Prompt
• [7] Evaluating the Text-to-SQL Capabilities of Large Language Models [Rajkumar+ 2022]
• [27] A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability [Liu+ 2023]
• [30] C3: Zero-shot Text-to-SQL with ChatGPT [Dong+ 2023]
• [46] Benchmarking the Text-to-SQL capability of large language models: A comprehensive evaluation [Zhang+ 2024]

[Dong+ 2023] Dong, Xuemei, et al. "C3: Zero-shot Text-to-SQL with ChatGPT." arXiv preprint arXiv:2307.07306 (2023).
[Rajkumar+ 2022] Rajkumar, Nitarshan, Raymond Li, and Dzmitry Bahdanau. "Evaluating the Text-to-SQL capabilities of large language models." arXiv preprint arXiv:2204.00498 (2022).
[Liu+ 2023] Liu, Aiwei, et al. "A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability." arXiv preprint arXiv:2303.13547 (2023).
[Zhang+ 2024] Zhang, Bin, et al. "Benchmarking the Text-to-SQL capability of large language models: A comprehensive evaluation." arXiv preprint arXiv:2403.02951 (2024).

Slide 14

Slide 14 text

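Methods: In-context Learning: C1-Decomposition
• C1-Decomposition
  – Sub-task decomposition: split the overall Text-to-SQL task into smaller sub-tasks
  – Sub-question decomposition: split the user's question into sub-questions
• Representative methods
  – DIN-SQL [8]
  – Coder-Reviewer [56]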
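The following is a generic sketch of sub-question decomposition. It is not the DIN-SQL or Coder-Reviewer procedure. Several pieces are hypothetical assumptions: the `llm` callable (prompt text in, completion text out), the prompt wording, and the convention that the SQL for the last sub-question is the final answer.

```python
from typing import Callable

# An LLM is modeled as a plain function from prompt text to completion text (hypothetical).
LLM = Callable[[str], str]


def decompose(llm: LLM, question: str) -> list[str]:
    """Ask the model to split the question into sub-questions, one per line."""
    out = llm(f"Split the question into simpler sub-questions, one per line.\nQuestion: {question}")
    return [line.strip() for line in out.splitlines() if line.strip()]


def solve_by_subquestions(llm: LLM, schema: str, question: str) -> str:
    """Answer sub-questions in order, feeding earlier (question, SQL) pairs back as context.

    Falls back to the original question if decomposition yields nothing; returns the SQL
    generated for the last sub-question (assumed to answer the original question).
    """
    context: list[str] = []
    sql = ""
    for sub in decompose(llm, question) or [question]:
        prompt = "\n".join([schema, *context, f"Question: {sub}", "SQL:"])
        sql = llm(prompt).strip()
        context.append(f"Question: {sub}\nSQL: {sql}")
    return sql
```

Passing the model in as a callable keeps the sketch independent of any particular API. A stub function is enough to exercise it.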

Slide 15

Slide 15 text

List of in-context learning studies

Slide 16

Slide 16 text

WIP

Slide 17

Slide 17 text

Details of the works cited in the paper
• Partial summaries of the zero-shot prompt studies
  – Evaluating the Text-to-SQL Capabilities of Large Language Models
  – A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability
  – C3: Zero-shot Text-to-SQL with ChatGPT
  – Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

Slide 18

Slide 18 text

Prior work: Zero-shot Prompt
Evaluating the Text-to-SQL Capabilities of Large Language Models

Slide 19

Slide 19 text

Evaluating the Text-to-SQL Capabilities of Large Language Models
• Evaluates the Text-to-SQL performance of the Codex model
• The experiments use five prompt formats
  – Question
  – API Docs
  – Select {n}
  – Create Table
  – Create Table + Select {n}

Slide 20

Slide 20 text

Evaluating the Text-to-SQL Capabilities of Large Language Models
• Question
• API Docs

Slide 21

Slide 21 text

Evaluating the Text-to-SQL Capabilities of Large Language Models
• Select 3

Slide 22

Slide 22 text

Evaluating the Text-to-SQL Capabilities of Large Language Models
• Create Table

Slide 23

Slide 23 text

Evaluating the Text-to-SQL Capabilities of Large Language Models
• Create Table + Select 3 (continued on the right)
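This sketch shows how a "Create Table + Select n" style prompt could be assembled from a SQLite database with the standard `sqlite3` module. Each table's `CREATE TABLE` statement is read from `sqlite_master` and followed by the table's first n rows. The comment layout, the `-- question` line and the trailing `SELECT` cue are assumptions. The exact template used by Rajkumar et al. may differ, and the function names are hypothetical.

```python
import sqlite3


def create_table_select_n(conn: sqlite3.Connection, n: int = 3) -> str:
    """Render every user table as its CREATE TABLE statement plus its first n rows."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        cur = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (n,))
        header = "\t".join(col[0] for col in cur.description)  # column names
        rows = ["\t".join(str(v) for v in row) for row in cur.fetchall()]
        sample = "\n".join([f"{n} example rows:", f"SELECT * FROM {name} LIMIT {n};", header, *rows])
        parts.append(f"{ddl}\n/*\n{sample}\n*/")
    return "\n\n".join(parts)


def build_prompt(conn: sqlite3.Connection, question: str, n: int = 3) -> str:
    """Schema block, then the question as an SQL comment and an assumed 'SELECT' completion cue."""
    return f"{create_table_select_n(conn, n)}\n\n-- {question}\nSELECT"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE singer (name TEXT, age INTEGER);"
        "INSERT INTO singer VALUES ('A', 30), ('B', 40), ('C', 50), ('D', 60);"
    )
    print(build_prompt(conn, "How many singers are there?"))
```

Dropping the sample-row comment block gives the plain "Create Table" variant. Dropping the DDL instead gives a "Select n" variant.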

Slide 24

Slide 24 text

Prior work: Zero-shot Prompt
A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability

Slide 25

Slide 25 text

A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability
• A comprehensive analysis of ChatGPT's Text-to-SQL ability
• No prompt exploration; the study adopts the prompt used in OpenAI's demo

Slide 26

Slide 26 text

A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability

Slide 27

Slide 27 text

Prior work: Zero-shot Prompt
C3: Zero-shot Text-to-SQL with ChatGPT

Slide 28

Slide 28 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Proposes C3, a ChatGPT-based zero-shot Text-to-SQL method
  – Clear Prompting (CP) + Calibration with Hints (CH) + Consistency Output (CO)

Slide 29

Slide 29 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Proposes C3, a ChatGPT-based zero-shot Text-to-SQL method
  – Clear Prompting (CP) + Calibration with Hints (CH) + Consistency Output (CO)

Slide 30

Slide 30 text

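C3: Zero-shot Text-to-SQL with ChatGPT
• Clear Prompting
  – Clear Layout
    ⚫ A clear layout is easier for ChatGPT to understand and yields higher performance
  – Clear Context
    ⚫ Problems with including the entire schema in the prompt
      – Irrelevant items in the prompt make irrelevant items in the output more likely
      – API costs become high
    ⚫ Schema linking
      – Retrieve the tables and columns relevant to the question
      – Implemented with a zero-shot prompt
      – Two steps: Table Recall and Column Recall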

Slide 31

Slide 31 text

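C3: Zero-shot Text-to-SQL with ChatGPT
• Clear Context
  – Table Recall
    ⚫ Rank the tables by relevance to the user's question
    ⚫ Use self-consistency (sample the output 10 times)
      – Take the top 4 tables from each sample
      – Choose the final table list by voting over the extracted 4-table lists
        » Voting is applied to each 4-table list as a whole
  – Column Recall
    ⚫ Based on the Table Recall result, rank the columns of the candidate tables by relevance to the question
    ⚫ Use self-consistency (sample the output 10 times)
      – Take the top 4 columns from each sample
      – Choose the final 5 columns by voting over the extracted columns
        » Voting is applied to individual columns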
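The two voting schemes above differ in granularity. Table Recall votes over whole lists, while Column Recall votes over individual columns. The minimal sketch below assumes that each sampled ChatGPT reply has already been parsed, into a ranked list for tables or into a dict mapping each table to its ranked columns. The defaults k=4 and m=5 follow the slide. Applying the column vote separately for each table, and the function names, are assumptions.

```python
from collections import Counter


def table_recall_vote(samples: list[list[str]], k: int = 4) -> list[str]:
    """List-level vote: truncate each sampled ranking to its top k and return the most frequent list."""
    top_lists = Counter(tuple(ranking[:k]) for ranking in samples)
    best, _ = top_lists.most_common(1)[0]
    return list(best)


def column_recall_vote(samples: list[dict[str, list[str]]], k: int = 4,
                       m: int = 5) -> dict[str, list[str]]:
    """Column-level vote: count each column in every sample's top k, keep the m most frequent per table."""
    counts: dict[str, Counter] = {}
    for sample in samples:
        for table, cols in sample.items():
            counts.setdefault(table, Counter()).update(cols[:k])
    return {table: [col for col, _ in cnt.most_common(m)] for table, cnt in counts.items()}


if __name__ == "__main__":
    tables = [["a", "b", "c", "d", "e"], ["a", "b", "c", "d"], ["b", "a", "c", "d"]]
    print(table_recall_vote(tables))
    columns = [{"t": ["x", "y", "z", "w", "v"]}, {"t": ["y", "x", "z", "u"]}]
    print(column_recall_vote(columns))
```

Voting on whole lists keeps a table set that the model produced consistently. Voting on single columns can combine columns that never co-occurred in any one sample.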

Slide 32

Slide 32 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Table Recall Prompt
Given the database schema and question, perform the following actions:
1 - Rank all the tables based on the possibility of being used in the SQL according to the question from the most relevant to the least relevant, Table or its column that matches more with the question words is highly relevant and must be placed ahead.
2 - Check whether you consider all the tables.
3 - Output a list object in the order of step 2, Your output should contain all the tables. The format should be like:
[
"table_1", "table_2", ...
]
Schema:
# continents ( contid, continent )
# countries ( countryid, countryname, continent )
# car_makers ( id, maker, fullname, country )
# model_list ( modelid, maker, model )
# car_names ( makeid, model, make )
# cars_data ( id, mpg, cylinders, edispl, horsepower, weight, accelerate, year )
Question:
### What is the name of the different car makers who produced a car in 1970?
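This sketch builds the Table Recall prompt above from a table dict and post-processes the model's reply. The instruction string is copied from the slide. The blank-line layout around `Schema:` and `Question:` is an assumption. The `parse_table_ranking` helper, which keeps only known table names, is hypothetical post-processing that is not described on the slide.

```python
import json

TABLE_RECALL_INSTRUCTION = (
    "Given the database schema and question, perform the following actions:\n"
    "1 - Rank all the tables based on the possibility of being used in the SQL according to the "
    "question from the most relevant to the least relevant, Table or its column that matches more "
    "with the question words is highly relevant and must be placed ahead.\n"
    "2 - Check whether you consider all the tables.\n"
    "3 - Output a list object in the order of step 2, Your output should contain all the tables. "
    'The format should be like:\n[\n"table_1", "table_2", ...\n]'
)


def render_schema(tables: dict[str, list[str]]) -> str:
    """Serialize each table as '# name ( col1, col2, ... )', the layout used in the C3 prompts."""
    return "\n".join(f"# {t} ( {', '.join(cols)} )" for t, cols in tables.items())


def table_recall_prompt(tables: dict[str, list[str]], question: str) -> str:
    """Instruction, then the serialized schema, then the question prefixed with '###'."""
    return f"{TABLE_RECALL_INSTRUCTION}\n\nSchema:\n{render_schema(tables)}\n\nQuestion:\n### {question}"


def parse_table_ranking(reply: str, tables: dict[str, list[str]]) -> list[str]:
    """Extract the JSON list from the reply and drop any names that are not in the schema."""
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        ranked = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return []
    return [t for t in ranked if isinstance(t, str) and t in tables]
```

In a full pipeline, each of the 10 sampled replies would be parsed this way before the list-level vote.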

Slide 33

Slide 33 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Table Recall Prompt (Japanese translation)
  – A Japanese rendering of the English Table Recall Prompt on the previous slide. The instructions, the schema, and the question ("What are the names of the car makers that produced a car in 1970?") are the same.

Slide 34

Slide 34 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Column Recall Prompt
Given the database tables and question, perform the following actions:
1 - Rank the columns in each table based on the possibility of being used in the SQL, Column that matches more with the question words or the foreign key is highly relevant and must be placed ahead. You should output them in the order of the most relevant to the least relevant. Explain why you choose each column.
2 - Output a JSON object that contains all the columns in each table according to your explanation. The format should be like:
{
"table_1": ["column_1", "column_2", ......],
"table_2": ["column_1", "column_2", ......],
"table_3": ["column_1", "column_2", ......],
......
}
Schema:
# car_makers ( id, maker, fullname, country )
# model_list ( modelid, maker, model )
# car_names ( makeid, model, make )
# cars_data ( id, mpg, cylinders, edispl, horsepower, weight, accelerate, year )
Foreign keys:
# model_list.maker = car_makers.id
# car_names.model = model_list.model
# cars_data.id = car_names.makeid
Question:
### What is the name of the different car makers who produced a car in 1970?

Slide 35

Slide 35 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Column Recall Prompt (Japanese translation)
  – A Japanese rendering of the instructions in the English Column Recall Prompt on the previous slide. The schema, foreign keys, and question are left in English and are the same as in the original.

Slide 36

Slide 36 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Proposes C3, a ChatGPT-based zero-shot Text-to-SQL method
  – Clear Prompting (CP) + Calibration with Hints (CH) + Consistency Output (CO)

Slide 37

Slide 37 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Calibration of Model Bias
  – ChatGPT tended to output extra columns and extra execution results

Slide 38

Slide 38 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Calibration with Hints
  – Hint 1: a hint that steers the model to select only the required columns
  – Hint 2: a hint that steers the model away from misusing SQL keywords (continued on the right)

Slide 39

Slide 39 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Proposes C3, a ChatGPT-based zero-shot Text-to-SQL method
  – Clear Prompting (CP) + Calibration with Hints (CH) + Consistency Output (CO)

Slide 40

Slide 40 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Applies Self-Consistency [Wang+ 2023] to Text-to-SQL
  – Sample multiple reasoning paths to generate diverse SQL outputs
  – Execute each SQL output and collect the execution results
  – Apply a voting mechanism over the execution results to choose the final SQL
    ⚫ Execution results that are errors are excluded

[Wang+ 2023] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-consistency improves chain of thought reasoning in language models. In ICLR.
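Here is a minimal sketch of the execution-based voting step, using `sqlite3`. The `candidates` list stands in for the sampled ChatGPT outputs (C3 samples 20 per question, according to its experimental setup), and the sampling itself is not shown. This sketch makes its own choices on two points. It compares results as order-insensitive row multisets, and it returns the first candidate that produced the winning result. The function names are hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Optional


def execute(conn: sqlite3.Connection, sql: str) -> Optional[tuple]:
    """Run a query; return its rows as a sorted tuple (an order-insensitive key), or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def consistency_vote(conn: sqlite3.Connection, candidates: list[str]) -> Optional[str]:
    """Pick the SQL whose execution result is shared by the most candidates; failing queries are skipped."""
    valid = [(sql, res) for sql in candidates if (res := execute(conn, sql)) is not None]
    if not valid:
        return None
    votes = Counter(res for _, res in valid)
    winner, _ = votes.most_common(1)[0]
    # Return the first candidate that produced the winning result.
    return next(sql for sql, res in valid if res == winner)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2), (3);")
    print(consistency_vote(conn, ["SELECT x FROM t WHERE x > 1",
                                  "SELECT x FROM t WHERE x >= 2",
                                  "SELECT x FROM t",
                                  "SELECT y FROM t"]))
```

Voting on execution results rather than on SQL text lets syntactically different but equivalent queries reinforce each other. The walrus operator requires Python 3.8 or later.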

Slide 41

Slide 41 text

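C3: Zero-shot Text-to-SQL with ChatGPT
• Experimental setup
  – Dataset: Spider
  – Metric: Execution Accuracy (EX)
    ⚫ Evaluated by whether the query's execution result is correct
• Results
• Experimental setup (C3)
  – ChatGPT API: gpt-3.5-turbo-0301
  – Prompts are generated with the CP and CH methods
  – The CO method generates 20 SQL queries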

Slide 42

Slide 42 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Effectiveness of Clear Layout

Slide 43

Slide 43 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Effectiveness of Clear Context and Calibration with Hints

Slide 44

Slide 44 text

C3: Zero-shot Text-to-SQL with ChatGPT
• Effectiveness of Self-Consistency

Slide 45

Slide 45 text

Prior work: Zero-shot Prompt
Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

Slide 46

Slide 46 text

Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation
• Text-to-SQL

Slide 47

Slide 47 text

Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation
• Text-to-SQL

Slide 48

Slide 48 text

Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation
• Text-to-SQL

Slide 49

Slide 49 text

Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation
• Text-to-SQL