Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

Paper Introduction: Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

1 Paper Overview
 Paper title
  – Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL
 Authors
  –
 Published
  – arXiv
    ⚫ https://arxiv.org/abs/2406.08426
 Summary
  – A survey paper on Text-to-SQL

2 Text-to-SQL
 Text-to-SQL is the task of converting a natural-language question into an SQL query

3 Topics Covered in This Paper
 Datasets and Benchmarks
  – Datasets and benchmarks used to evaluate LLM-based Text-to-SQL systems
 Evaluation Metrics
  – Metrics used to evaluate the performance of LLM-based Text-to-SQL systems
    ⚫ Content matching-based and execution-based
 Methods and Models
  – Methods and models adopted for LLM-based Text-to-SQL
    ⚫ In-context learning, fine-tuning
 Expectations and Future Directions
  – Remaining challenges and limitations of LLM-based Text-to-SQL, and directions for future research

4 Topics Covered in This Paper
 Summary of the survey's structure and content

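5 Overview
 Challenges in Text-to-SQL
  – Linguistic complexity and ambiguity
  – Schema understanding and representation
  – Rare and complex SQL operations
  – Cross-domain generalization
 Evolutionary process
  – Rule-based
  – Deep learning-based
  – Pre-trained language model-based
  – LLM-based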

6 Datasets
 Datasets are divided into two categories
  – Original datasets
  – Post-annotated datasets

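7 Evaluation Metrics
 Content Matching-based Metrics
  – Component Matching (CM)
    ⚫ Evaluates exact match of each SQL component (SELECT, WHERE, GROUP BY, ORDER BY, KEYWORDS), reported as an F1 score
  – Exact Matching (EM)
    ⚫ Evaluates the percentage of queries that match the reference query exactly
 Execution-based Metrics
  – Execution Accuracy (EX)
    ⚫ Evaluates whether the execution result of the query is correct
  – Valid Efficiency Score (VES)
    ⚫ Evaluates both agreement of the execution results and query efficiency (execution time)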
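The difference between the two metric families can be illustrated with a small sketch. It rests on several assumptions: `exact_match` is a simplified string-level stand-in for EM (benchmark implementations typically compare parsed SQL components rather than raw strings), `execution_match` ignores row order, and the table and queries are made up for illustration.

```python
import sqlite3
from collections import Counter


def exact_match(pred_sql: str, gold_sql: str) -> bool:
    """Simplified EM: compare case- and whitespace-normalized query strings."""
    def normalize(sql: str) -> str:
        return " ".join(sql.strip().rstrip(";").lower().split())
    return normalize(pred_sql) == normalize(gold_sql)


def execution_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Simplified EX: both queries return the same multiset of rows."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


# Hypothetical toy database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_makers (id INTEGER PRIMARY KEY, maker TEXT);
INSERT INTO car_makers VALUES (1, 'amc'), (2, 'bmw');
""")

gold = "SELECT maker FROM car_makers WHERE id = 1"
pred = "SELECT maker FROM car_makers WHERE id < 2"

print(exact_match(pred, gold))            # different text
print(execution_match(conn, pred, gold))  # same rows on this database
```

The predicted query differs textually from the reference but returns the same rows here, which is the kind of case where content matching and execution-based metrics disagree.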

8 Methods
 Methods fall into two broad groups: in-context learning and fine-tuning

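9 Methods: In-context Learning
 𝑌: the executable SQL query
 𝑄: the user's question
 𝑆: the database schema/content
  – Decomposed as 𝑆 = ⟨𝐶, 𝑇, 𝐾⟩
    ⚫ 𝐶 = {𝑐1, 𝑐2, …}: the set of columns
    ⚫ 𝑇 = {𝑡1, 𝑡2, …}: the set of tables
    ⚫ 𝐾: potential external knowledge (e.g., foreign-key relations, schema linking, domain knowledge)
 𝐼: the instruction for the Text-to-SQL task
 𝑓(· | 𝜃): an LLM with parameters 𝜃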
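The slide lists the symbols without the formula that ties them together. Under the definitions above, the generation step can presumably be written as follows; this is a reconstruction from the listed symbols, not a quotation from the slide.

```latex
\[
  Y = f(Q, S, I \mid \theta), \qquad S = \langle C, T, K \rangle
\]
```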

10 Methods: In-context Learning
 In-context learning methods are grouped into five categories
  – 𝐶0-Trivial Prompt
  – 𝐶1-Decomposition
  – 𝐶2-Prompt Optimization
  – 𝐶3-Reasoning Enhancement
  – 𝐶4-Execution Refinement

11 Methods: In-context Learning: 𝐶0-Trivial Prompt
 𝐶0-Trivial Prompt:
  – Zero-shot prompt 𝑃0 = 𝐼 ⊕ 𝑆 ⊕ 𝑄
    ⚫ 𝑆 = ⟨𝐶, 𝑇, 𝐾⟩
      – Set of columns 𝐶 = {𝑐1, 𝑐2, …}
      – Set of tables 𝑇 = {𝑡1, 𝑡2, …}
      – Potential external knowledge 𝐾
        » Foreign-key relations
        » Schema linking
        » Domain knowledge
  – Few-shot prompt 𝑃𝑛 = {𝐹1, 𝐹2, …, 𝐹𝑛} ⊕ 𝑃0
    ⚫ 𝐹𝑖 = (𝑆𝑖, 𝑄𝑖, 𝑌𝑖)
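The two prompt definitions can be sketched as plain string concatenation. This is a minimal sketch under assumptions: ⊕ is realized as blank-line-separated concatenation, and the section labels ("Schema:", "Question:", "SQL:"), the `Example` class, and the sample texts are illustrative choices, not taken from the survey.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One demonstration F_i = (S_i, Q_i, Y_i)."""
    schema: str
    question: str
    sql: str


def zero_shot_prompt(instruction: str, schema: str, question: str) -> str:
    """P_0 = I (+) S (+) Q, joined with blank lines."""
    return "\n\n".join(
        [instruction, f"Schema:\n{schema}", f"Question: {question}\nSQL:"]
    )


def few_shot_prompt(examples, instruction: str, schema: str, question: str) -> str:
    """P_n = {F_1, ..., F_n} (+) P_0: demonstrations first, then the zero-shot prompt."""
    shots = [
        f"Schema:\n{e.schema}\nQuestion: {e.question}\nSQL: {e.sql}"
        for e in examples
    ]
    return "\n\n".join(shots + [zero_shot_prompt(instruction, schema, question)])


# Hypothetical inputs, for illustration only.
instruction = "Translate the question into a SQLite query."
schema = "# cars_data ( id, mpg, year )"
demo = Example("# t ( a )", "How many rows are in t?", "SELECT COUNT(*) FROM t")

p0 = zero_shot_prompt(instruction, schema, "Which cars were made in 1970?")
p1 = few_shot_prompt([demo], instruction, schema, "Which cars were made in 1970?")
print(p1)
```

Following the formula literally places the instruction after the demonstrations, because 𝑃0 already contains 𝐼; real systems may order the parts differently.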

12 Related Work on 𝐶0-Trivial Prompt
 [7] Evaluating the Text-to-SQL Capabilities of Large Language Models [Rajkumar+ 2022]
 [27] A Comprehensive Evaluation of ChatGPT's Zero-shot Text-to-SQL Capability [Liu+ 2023]
 [30] C3: Zero-shot Text-to-SQL with ChatGPT [Dong+ 2023]
 [46] Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation [Zhang+ 2024]

[Dong+ 2023] Dong, Xuemei, et al. "C3: Zero-shot Text-to-SQL with ChatGPT." arXiv preprint arXiv:2307.07306 (2023).
[Rajkumar+ 2022] Rajkumar, Nitarshan, Raymond Li, and Dzmitry Bahdanau. "Evaluating the Text-to-SQL Capabilities of Large Language Models." arXiv preprint arXiv:2204.00498 (2022).
[Liu+ 2023] Liu, Aiwei, et al. "A Comprehensive Evaluation of ChatGPT's Zero-shot Text-to-SQL Capability." arXiv preprint arXiv:2303.13547 (2023).
[Zhang+ 2024] Zhang, Bin, et al. "Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation." arXiv preprint arXiv:2403.02951 (2024).

13 Methods: In-context Learning: 𝐶1-Decomposition
 𝐶1-Decomposition
  – Sub-task decomposition:
  – Sub-question decomposition: decomposes the user's question into sub-questions
 Representative methods
  – DIN-SQL [8]
  – Coder-Reviewer [56]

14 List of In-context Learning Studies

15 WIP  WIP

16 Details of the Literature Cited in the Paper
 Partial summaries of the zero-shot prompt papers
  – Evaluating the Text-to-SQL Capabilities of Large Language Models
  – A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability
  – C3: Zero-shot Text-to-SQL with ChatGPT
  – Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

17 Prior Work (Zero-shot Prompt): Evaluating the Text-to-SQL Capabilities of Large Language Models

18 Evaluating the Text-to-SQL Capabilities of Large Language Models
 Evaluates the Text-to-SQL performance of Codex models
 The experiments use five prompt types
  – Question
  – API Docs
  – Select {n}
  – Create Table
  – Create Table + Select {n}

[Figures: example prompts for Question, API Docs, Select 3, Create Table, and Create Table + Select 3 (continued on the right)]
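As a rough illustration of the "Create Table + Select {n}" prompt type, the sketch below assembles a prompt from a SQLite database by pairing each table's CREATE TABLE statement with its first n rows. The function name, the comment wording, and the toy table are assumptions made for illustration; the exact prompt text used in the paper is shown only in the figures and is not reproduced here.

```python
import sqlite3


def create_table_select_prompt(conn: sqlite3.Connection, question: str, n: int = 3) -> str:
    """Build a 'Create Table + Select n' style prompt from a SQLite database."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        parts.append(ddl + ";")  # schema as the original CREATE TABLE statement
        cursor = conn.execute(f'SELECT * FROM "{name}" LIMIT {int(n)}')
        header = "\t".join(col[0] for col in cursor.description)
        rows = ["\t".join(str(value) for value in row) for row in cursor.fetchall()]
        sample = "\n".join([f"{n} example rows:", header, *rows])
        parts.append("/*\n" + sample + "\n*/")  # sample rows as an SQL comment
    # Illustrative instruction wording; ending on SELECT invites a completion.
    parts.append(f"-- Using valid SQLite, answer the following question.\n-- {question}\nSELECT")
    return "\n\n".join(parts)


# Hypothetical toy database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_makers (id INTEGER PRIMARY KEY, maker TEXT);
INSERT INTO car_makers VALUES (1, 'amc'), (2, 'bmw'), (3, 'ford'), (4, 'gm');
""")

prompt = create_table_select_prompt(conn, "How many car makers are there?")
print(prompt)
```

Dropping the sample-row part gives a "Create Table" style prompt, and dropping the DDL gives a "Select {n}" style one.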

23 Prior Work (Zero-shot Prompt): A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability

24 A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability
 A comprehensive analysis of ChatGPT's Text-to-SQL capability
 No prompt search is performed; the prompt used in OpenAI's demo is adopted

25 A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability

26 Prior Work (Zero-shot Prompt): C3: Zero-shot Text-to-SQL with ChatGPT

27 C3: Zero-shot Text-to-SQL with ChatGPT
 Proposes C3, a ChatGPT-based zero-shot Text-to-SQL method
  – Clear Prompting (CP) + Calibration with Hints (CH) + Consistency Output (CO)

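29 C3: Zero-shot Text-to-SQL with ChatGPT
 Clear Prompting
  – Clear Layout
    ⚫ A clear layout is easier for ChatGPT to understand and yields higher performance
  – Clear Context
    ⚫ Problems with including the entire schema in the prompt
      – Irrelevant items in the prompt make the model more likely to generate irrelevant items in its output
      – API cost becomes high
    ⚫ Schema linking
      – Recalls the tables and columns relevant to the question
      – Implemented with zero-shot prompts
      – Two steps: Table Recall and Column Recall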

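30 C3: Zero-shot Text-to-SQL with ChatGPT
 Clear Context
  – Table Recall
    ⚫ Ranks the tables by their relevance to the user's question
    ⚫ Uses the Self-Consistency method (the output is sampled 10 times)
      – Extract the top 4 tables from each sample
      – Apply a voting mechanism to the extracted 4-table lists to select the final table list
        » Voting is applied at the level of whole 4-table lists
  – Column Recall
    ⚫ Based on the Table Recall result, ranks the columns in the candidate tables by their relevance to the question
    ⚫ Uses the Self-Consistency method (the output is sampled 10 times)
      – Extract the top 4 columns from each sample
      – Apply a voting mechanism to the extracted columns to select the final 5 columns
        » Voting is applied at the level of individual columns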
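The two voting granularities described above can be sketched as follows. This is a minimal sketch under assumptions: each sample is already parsed into a ranked list of names, the list-level vote treats two lists as equal only if they match in order, ties are broken arbitrarily, and the function names and sample data are invented for illustration.

```python
from collections import Counter


def vote_table_list(samples, top_k=4):
    """List-level vote: the most frequent top-k table list wins as a whole."""
    truncated = [tuple(sample[:top_k]) for sample in samples]
    winner, _count = Counter(truncated).most_common(1)[0]
    return list(winner)


def vote_columns(samples, top_k=4, keep=5):
    """Column-level vote: count each column across samples, keep the most frequent."""
    counts = Counter(col for sample in samples for col in sample[:top_k])
    return [col for col, _count in counts.most_common(keep)]


# Hypothetical sampled rankings (3 samples instead of 10, for brevity).
table_samples = [
    ["car_makers", "cars_data", "car_names", "model_list", "countries"],
    ["car_makers", "cars_data", "car_names", "model_list", "continents"],
    ["cars_data", "car_makers", "car_names", "model_list", "countries"],
]
column_samples = [
    ["year", "id", "mpg", "weight"],
    ["year", "id", "horsepower", "mpg"],
    ["year", "id", "cylinders", "edispl", "mpg"],
]

tables = vote_table_list(table_samples)
columns = vote_columns(column_samples, keep=3)
print(tables)
print(columns)
```

In the list-level vote the first two samples agree on their top-4 list and outvote the third; in the column-level vote each column is counted independently, so a column can be kept even if no single sample's list wins.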

31 C3: Zero-shot Text-to-SQL with ChatGPT
 Table Recall Prompt

Given the database schema and question, perform the following actions:
1 - Rank all the tables based on the possibility of being used in the SQL according to the question from the most relevant to the least relevant, Table or its column that matches more with the question words is highly relevant and must be placed ahead.
2 - Check whether you consider all the tables.
3 - Output a list object in the order of step 2, Your output should contain all the tables. The format should be like:
[
  "table_1", "table_2", ...
]
Schema:
# continents ( contid, continent )
# countries ( countryid, countryname, continent )
# car_makers ( id, maker, fullname, country )
# model_list ( modelid, maker, model )
# car_names ( makeid, model, make )
# cars_data ( id, mpg, cylinders, edispl, horsepower, weight, accelerate, year )
Question:
### What is the name of the different car makers who produced a car in 1970?

32 C3: Zero-shot Text-to-SQL with ChatGPT
 Table Recall Prompt (Japanese translation)
  – A Japanese rendering of the Table Recall prompt on slide 31, with the same three steps, output format, schema, and question

33 C3: Zero-shot Text-to-SQL with ChatGPT
 Column Recall Prompt

Given the database tables and question, perform the following actions:
1 - Rank the columns in each table based on the possibility of being used in the SQL, Column that matches more with the question words or the foreign key is highly relevant and must be placed ahead. You should output them in the order of the most relevant to the least relevant. Explain why you choose each column.
2 - Output a JSON object that contains all the columns in each table according to your explanation. The format should be like:
{
  "table_1": ["column_1", "column_2", ......],
  "table_2": ["column_1", "column_2", ......],
  "table_3": ["column_1", "column_2", ......],
  ......
}
Schema:
# car_makers ( id, maker, fullname, country )
# model_list ( modelid, maker, model )
# car_names ( makeid, model, make )
# cars_data ( id, mpg, cylinders, edispl, horsepower, weight, accelerate, year )
Foreign keys:
# model_list.maker = car_makers.id
# car_names.model = model_list.model
# cars_data.id = car_names.makeid
Question:
### What is the name of the different car makers who produced a car in 1970?

34 C3: Zero-shot Text-to-SQL with ChatGPT
 Column Recall Prompt (Japanese translation)
  – A Japanese rendering of the Column Recall prompt on slide 33, with the same two steps, output format, schema, foreign keys, and question (the question is left in English)
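The schema layout used in the recall prompts ("# table ( col, ... )" lines plus an optional "Foreign keys:" section) is simple to generate from structured metadata. The sketch below is one way to do that; the function name and the input data structures are assumptions made for illustration, and only the output layout follows the prompts on the slides.

```python
def render_schema(tables, foreign_keys=()):
    """Render tables and foreign keys in the '# table ( col, ... )' layout."""
    lines = ["Schema:"]
    for table, columns in tables.items():
        lines.append(f"# {table} ( {', '.join(columns)} )")
    if foreign_keys:
        lines.append("Foreign keys:")
        lines.extend(f"# {left} = {right}" for left, right in foreign_keys)
    return "\n".join(lines)


# A subset of the schema shown on the slides.
tables = {
    "car_makers": ["id", "maker", "fullname", "country"],
    "model_list": ["modelid", "maker", "model"],
}
foreign_keys = [("model_list.maker", "car_makers.id")]

schema_text = render_schema(tables, foreign_keys)
print(schema_text)
```

After Table Recall and Column Recall, the same function could be called with only the retained tables and columns to produce the reduced schema for the final SQL-generation prompt.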

36 C3: Zero-shot Text-to-SQL with ChatGPT
 Calibration of Model Bias
  – ChatGPT tended to return extra columns and extra execution results

37 C3: Zero-shot Text-to-SQL with ChatGPT
 Calibration with Hints
  – Hint 1: a hint that steers the model to select only the necessary columns
  – Hint 2: a hint that keeps the model from misusing SQL keywords (continued on the right)

39 C3: Zero-shot Text-to-SQL with ChatGPT
 Implements Self-Consistency [Wang+ 2023] for Text-to-SQL
  – Sample multiple reasoning paths to generate diverse SQL outputs
  – Execute each SQL output and collect the execution results
  – Apply a voting mechanism over the execution results to select the final SQL
    ⚫ Execution results that raise errors are excluded

[Wang+ 2023] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-consistency improves chain of thought reasoning in language models. In ICLR.
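The execute-then-vote procedure above can be sketched with SQLite. This is a minimal sketch under assumptions: candidates are grouped by their result rows with row order ignored, ties between groups go to the group seen first, and the function name, table, and candidate queries are invented for illustration.

```python
import sqlite3
from collections import defaultdict


def consistent_sql(conn: sqlite3.Connection, candidates):
    """Execute each candidate, drop failures, return one SQL from the largest result group."""
    groups = defaultdict(list)
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # candidates that raise an error do not get a vote
        key = tuple(sorted(rows, key=repr))  # order-insensitive result signature
        groups[key].append(sql)
    if not groups:
        return None  # every candidate failed
    return max(groups.values(), key=len)[0]


# Hypothetical toy database and sampled candidates, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars_data (id INTEGER PRIMARY KEY, year INTEGER);
INSERT INTO cars_data VALUES (1, 1970), (2, 1970), (3, 1971);
""")
candidates = [
    "SELECT id FROM cars_data WHERE year = 1970",
    "SELECT id FROM cars_data WHERE year < 1971",
    "SELECT id, year FROM cars_data WHERE year = 1970",
    "SELECT idd FROM cars_data",
]

best = consistent_sql(conn, candidates)
print(best)
```

The first two candidates return the same rows and form the majority group, the third returns an extra column and is outvoted, and the fourth fails to execute and is discarded.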

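40 C3: Zero-shot Text-to-SQL with ChatGPT
 Experimental setup
  – Dataset: Spider
  – Metric: Execution Accuracy (EX)
    ⚫ Evaluates whether the execution result of the query is correct
 Results
 Experimental setup (C3)
  – ChatGPT API: gpt-3.5-turbo-0301
  – Prompts are generated with the CP and CH methods
  – The CO method generates 20 SQL queries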

41 C3: Zero-shot Text-to-SQL with ChatGPT
 Effectiveness of Clear Layout

42 C3: Zero-shot Text-to-SQL with ChatGPT
 Effectiveness of Clear Context and Calibration with Hints

43 C3: Zero-shot Text-to-SQL with ChatGPT
 Effectiveness of Self-Consistency

44 Prior Work (Zero-shot Prompt): Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

45 Benchmarking the Text-to-SQL Capability of Large Language Models: A
Comprehensive Evaluation  Text-to-SQL
