Cost calculation for a JOIN between a large table and a small table

MySQL Unconference #002 @amamanamam

Kubo
• I work as a DBRE
• I like MySQL
• I drink a lot of beer
• X: https://twitter.com/amamanamam

At the previous unconference (#001) I heard the following talk by hoge: https://www.docswell.com/s/hoge/ZNRP8V-2024-02-03-184213

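Very rough summary
• A query that LEFT JOINs a largish table (1,000,000 rows) with a smallish table (3 rows) is oddly slow
• It turned out that the join type for the latter table was a full scan (ALL) and that hash join was chosen
• Increasing the latter table to 10 rows changes its join type to eq_ref
→ I decided to reproduce this locally and dig into the cause
Note: the content of this talk is also written up in a blog post: https://amamanamam.hatenablog.com/entry/2024/02/11/005331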

Environment / settings

mysql> select version();
+--------------+
| version()    |
+--------------+
| 8.0.28-debug |
+--------------+
1 row in set (0.01 sec)

Environment / settings

mysql> select count(*) from mytable;
+----------+
| count(*) |
+----------+
|   524283 |
+----------+
1 row in set (0.92 sec)

mysql> select count(*) from types;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.10 sec)

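What I vaguely imagined at this point
• Even with 1,000,000 rows × 3 rows, eq_ref was presumably still a candidate. The optimizer presumably ran its calculations and judged ALL to be better.
• If I take an optimizer trace, that decision process should be recorded in it.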

The optimizer trace output is painfully long, so please see the blog post: https://amamanamam.hatenablog.com/entry/2024/02/11/005331

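What I found
• ALL really is cheaper than eq_ref
• With 1,000,000 rows × 10 rows, the cost order of eq_ref and ALL flips
• The optimizer seems to reason that, because the joined table has so few records, a full scan is probably faster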

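Questions
• Why do both the eq_ref plan and the ALL plan have "chosen": true? At what point is the decision to go with ALL actually made?
• I get the intuition that a full scan might be faster, but what cost calculation is actually performed?
• What is "recheck_reason": "not_first_table"?
→ Read the source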


best_access_path (sql_planner.cc)

    void Optimize_table_order::best_access_path(JOIN_TAB *tab,
                                                const table_map remaining_tables,
                                                const uint idx, bool disable_jbuf,
                                                const double prefix_rowcount,
                                                POSITION *pos) {
      ...
      // The 'ref' access method with lowest cost as found by find_best_ref()
      Key_use *best_ref = nullptr;

      // Look for the best ref access if the storage engine supports index access.
      if (tab->keyuse() != nullptr &&
          (table->file->ha_table_flags() & HA_NO_INDEX_ACCESS) == 0)
        best_ref =
            find_best_ref(tab, remaining_tables, idx, prefix_rowcount,
                          &found_condition, &ref_depend_map, &used_key_parts);

      double rows_fetched = best_ref ? best_ref->fanout : DBL_MAX;
      /*
        Cost of executing the best access method prefix_rowcount
        number of times
      */
      double best_read_cost = best_ref ? best_ref->read_cost : DBL_MAX;

best_access_path (sql_planner.cc), continued

      double scan_read_cost = calculate_scan_cost(
          tab, idx, best_ref, prefix_rowcount, found_condition, disable_jbuf,
          &rows_after_filtering, &trace_access_scan);

      /*
        We estimate the cost of evaluating WHERE clause for found
        records as row_evaluate_cost(prefix_rowcount * rows_after_filtering).
        This cost plus scan_cost gives us total cost of using
        TABLE/INDEX/RANGE SCAN.
      */
      const double scan_total_cost =
          scan_read_cost +
          cost_model->row_evaluate_cost(prefix_rowcount * rows_after_filtering);

      trace_access_scan.add("resulting_rows", rows_after_filtering);
      trace_access_scan.add("cost", scan_total_cost);

      if (best_ref == nullptr ||
          (scan_total_cost <
           best_read_cost +
               cost_model->row_evaluate_cost(prefix_rowcount * rows_fetched))) {

best_access_path (sql_planner.cc), continued

        /*
          If the table has a range (tab->quick is set) make_join_query_block()
          will ensure that this will be used
        */
        best_read_cost = scan_read_cost;
        rows_fetched = rows_after_filtering;

        if (tab->found_records) {
          /*
            Although join buffering may be used for this table, this
            filter calculation is not done to calculate the cost of join
            buffering itself (that is done inside
            calculate_scan_cost()). The is_join_buffering parameter is
            therefore 'false'.
          */
          const float full_filter = calculate_condition_filter(
              tab, nullptr, ~remaining_tables & ~excluded_tables,
              static_cast<double>(tab->found_records), false, false,
              trace_access_scan);
          filter_effect = static_cast<float>(std::min(
              1.0, tab->found_records * full_filter / rows_after_filtering));
        }
        best_ref = nullptr;
        best_uses_jbuf = !disable_jbuf;
        ref_depend_map = 0;
      }

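What I found
• The cost of ref access is calculated first, and as soon as a good ref access is found it is recorded with chosen: true
• The cost of ALL is calculated after that, and if it is cheaper, ALL is selected and also recorded with chosen: true (the chosen value already written for the ref access is never rewritten)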
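The order of events described above can be sketched in isolation. This is a simplified illustration, not MySQL's code: TraceEntry, Decision and choose_access_path are hypothetical names, the costs are passed in as plain numbers, and the access type strings are only meant to resemble what the optimizer trace prints.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for one entry of "considered_access_paths".
struct TraceEntry {
  std::string access_type;
  double cost;
  bool chosen;
};

// Hypothetical result: the access type that finally wins, plus the trace.
struct Decision {
  std::string access_type;
  std::vector<TraceEntry> trace;
};

// has_ref:         whether a usable ref access was found at all
// ref_total_cost:  read cost + row evaluation cost of the best ref access
// scan_total_cost: read cost + row evaluation cost of the table scan
Decision choose_access_path(bool has_ref, double ref_total_cost,
                            double scan_total_cost) {
  Decision d;
  // Phase 1: the best ref access is written to the trace as chosen
  // at the moment it is found.
  if (has_ref) {
    d.trace.push_back({"eq_ref", ref_total_cost, true});
    d.access_type = "eq_ref";
  }
  // Phase 2: the scan is costed afterwards and wins only if there is no ref
  // access or if it is strictly cheaper.
  const bool scan_wins = !has_ref || scan_total_cost < ref_total_cost;
  d.trace.push_back({"scan", scan_total_cost, scan_wins});
  if (scan_wins) d.access_type = "scan";  // the earlier entry stays untouched
  return d;
}
```

With a ref cost of 10.0 and a scan cost of 5.0, both trace entries end up with chosen set to true while the final access type is "scan", which is the pattern that looked confusing in the trace.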


Source of the cost comparison

      if (best_ref == nullptr ||
          (scan_total_cost <
           best_read_cost +
               cost_model->row_evaluate_cost(prefix_rowcount * rows_fetched))) {

find_best_ref (sql_planner.cc)

    Key_use *Optimize_table_order::find_best_ref(
        …
          // Check if we found full key
          if (all_key_parts_covered && !ref_or_null_part) /* use eq key */
          {
            cur_used_keyparts = (uint)~0;
            if (keyinfo->flags & HA_NOSAME &&
                ((keyinfo->flags & HA_NULL_PART_KEY) == 0 ||
                 all_key_parts_non_null)) {
              cur_read_cost = prev_record_reads(join, idx, table_deps) *
                              table->cost_model()->page_read_cost(1.0);

First, the former factor in find_best_ref: prev_record_reads(join, idx, table_deps)

prev_record_reads (sql_planner.cc)

    static double prev_record_reads(JOIN *join, uint idx, table_map found_ref) {
      double found = 1.0;
      POSITION *pos_end = join->positions - 1;
      for (POSITION *pos = join->positions + idx - 1; pos != pos_end; pos--) {
        const double fanout = pos->rows_fetched * pos->filter_effect;
        if (pos->table->table_ref->map() & found_ref) {
          found_ref |= pos->ref_depend_map;
          if (pos->rows_fetched > DBL_EPSILON) found *= fanout;
        } else if (fanout < 1.0) {
          found *= fanout;
        }
      }
      return found;
    }

In other words, prev_record_reads returns the product of rows and filtered for the JOIN source table, that is, the number of rows handed over from the previous table.
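To see how that product comes out of the loop, here is a minimal standalone sketch. Position is a hypothetical reduction of POSITION that keeps only the fields the loop reads, table bitmaps are plain integers, and the numbers in the usage note are illustrative (the mytable row count with an assumed filter effect of 1.0), not values taken from the trace.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, reduced version of POSITION.
struct Position {
  std::uint64_t table_bit;       // bit of this table in a table_map
  std::uint64_t ref_depend_map;  // tables this position's ref access depends on
  double rows_fetched;           // "rows" in EXPLAIN terms
  double filter_effect;          // "filtered" as a fraction (0.0 to 1.0)
};

// Walks the already placed tables from idx - 1 down to 0, mirroring the loop
// in prev_record_reads.
double prev_record_reads_sketch(const std::vector<Position> &positions,
                                std::size_t idx, std::uint64_t found_ref) {
  double found = 1.0;
  for (std::size_t i = idx; i-- > 0;) {
    const Position &pos = positions[i];
    const double fanout = pos.rows_fetched * pos.filter_effect;
    if (pos.table_bit & found_ref) {
      // A table the ref access depends on: multiply by its fanout.
      found_ref |= pos.ref_depend_map;
      if (pos.rows_fetched > DBL_EPSILON) found *= fanout;
    } else if (fanout < 1.0) {
      // An unrelated table only matters when it reduces the row count.
      found *= fanout;
    }
  }
  return found;
}
```

With a single preceding position {1, 0, 524283.0, 1.0} and found_ref = 1, the function returns 524283.0: every row of the driving table triggers one lookup on the joined table.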

Next, the latter factor in find_best_ref: table->cost_model()->page_read_cost(1.0)

page_read_cost (opt_costmodel.cc)

    double Cost_model_table::page_read_cost(double pages) const {
      assert(m_initialized);
      assert(pages >= 0.0);

      const double in_mem = m_table->file->table_in_memory_estimate();

      const double pages_in_mem = pages * in_mem;
      const double pages_on_disk = pages - pages_in_mem;
      assert(pages_on_disk >= 0.0);

      const double cost = buffer_block_read_cost(pages_in_mem) +
                          io_block_read_cost(pages_on_disk);

      return cost;
    }

In page_read_cost, in_mem is the fraction of the table that is estimated to be in memory.


The returned cost is the cost of fetching the relevant data pages from memory plus the cost of fetching the relevant data pages from disk.
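The memory/disk split can be tried out with a small standalone function. This is a sketch under assumptions: the two constants are what I understand to be the MySQL 8.0 defaults for memory_block_read_cost and io_block_read_cost (a server may be configured differently via mysql.engine_cost), I assume each per-block cost is simply the block count times the constant, and in_mem is passed in by hand instead of being asked from the storage engine.

```cpp
#include <cassert>

// Assumed defaults; the constant names are hypothetical.
constexpr double kMemoryBlockReadCost = 0.25;
constexpr double kIoBlockReadCost = 1.0;

// pages:  number of pages to read
// in_mem: fraction of the table assumed to be in the buffer pool (0.0 to 1.0)
double page_read_cost_sketch(double pages, double in_mem) {
  assert(pages >= 0.0);
  assert(in_mem >= 0.0 && in_mem <= 1.0);
  const double pages_in_mem = pages * in_mem;
  const double pages_on_disk = pages - pages_in_mem;
  // Memory part plus disk part, as in Cost_model_table::page_read_cost.
  return pages_in_mem * kMemoryBlockReadCost +
         pages_on_disk * kIoBlockReadCost;
}
```

Under these assumptions, page_read_cost_sketch(1.0, 1.0) is 0.25 for a fully cached table and page_read_cost_sketch(1.0, 0.0) is 1.0 for a table that is entirely on disk, so the per-lookup cost of eq_ref depends on how much of the joined table is in memory.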

What were we talking about again?

Back in find_best_ref, the read cost of the eq_ref access was:

              cur_read_cost = prev_record_reads(join, idx, table_deps) *
                              table->cost_model()->page_read_cost(1.0);

That is: rows handed over from the JOIN source table × (cost of fetching the relevant data page from memory + cost of fetching the relevant data page from disk)

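prefix_rowcount and rows_fetched
• prefix_rowcount is the number of rows that get joined to the JOIN target (in this case, all rows of mytable)
• rows_fetched is the number of rows fetched from the JOIN target table (in this case, for a full scan, all rows of types)
• Their product is the number of row combinations in the JOIN result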

Source of the cost comparison, annotated

In the comparison, cost_model->row_evaluate_cost(prefix_rowcount * rows_fetched) is the evaluation cost for that number of row combinations.
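Putting the two terms together, the comparison can be sketched as below. The function names are hypothetical, and kRowEvaluateCost is what I understand to be the default row_evaluate_cost in MySQL 8.0 (it can be changed through mysql.server_cost), so treat any number it produces as illustrative only.

```cpp
#include <cassert>

// Assumed default per-row evaluation cost; the constant name is hypothetical.
constexpr double kRowEvaluateCost = 0.1;

// Server-side cost of evaluating conditions on the given number of rows.
double row_evaluate_cost_sketch(double rows) {
  assert(rows >= 0.0);
  return rows * kRowEvaluateCost;
}

// Engine-side read cost of the ref access plus the server-side evaluation
// cost for every row combination.
double ref_total_cost_sketch(double best_read_cost, double prefix_rowcount,
                             double rows_fetched) {
  return best_read_cost +
         row_evaluate_cost_sketch(prefix_rowcount * rows_fetched);
}

// Mirrors the condition in best_access_path: the scan wins when there is no
// ref access at all, or when its total cost is strictly lower.
bool scan_is_cheaper(bool has_ref, double scan_total_cost,
                     double best_read_cost, double prefix_rowcount,
                     double rows_fetched) {
  if (!has_ref) return true;
  return scan_total_cost <
         ref_total_cost_sketch(best_read_cost, prefix_rowcount, rows_fetched);
}
```

The ref side grows linearly with prefix_rowcount in both terms, which is why a cheap enough scan of a tiny table can undercut it.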


calculate_scan_cost (sql_planner.cc)

    double Optimize_table_order::calculate_scan_cost(
        const JOIN_TAB *tab, const uint idx, const Key_use *best_ref,
        const double prefix_rowcount, const bool found_condition,
        const bool disable_jbuf, double *rows_after_filtering,
        Opt_trace_object *trace_access_scan) {
      ...
      else
        scan_cost = table->file->table_scan_cost();  // table scan
      const double single_scan_read_cost = scan_cost.total_cost();
      ...
        const double buffer_count =
            1.0 + ((double)cache_record_length(join, idx) * prefix_rowcount /
                   (double)thd->variables.join_buff_size);

        scan_and_filter_cost =
            buffer_count *
            (single_scan_read_cost +
             cost_model->row_evaluate_cost(tab->records() -
                                           *rows_after_filtering));

This part is complicated and I do not understand it yet.
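Taking only the two formulas visible in the excerpt, a tentative reading is that the table is scanned once per join buffer refill, not once per incoming row. The sketch below just restates those formulas with plain parameters; the function names are hypothetical, kAssumedRowEvaluateCost is the default I assume for row_evaluate_cost, and everything hidden behind the "..." in the excerpt is ignored.

```cpp
#include <cassert>

// Assumed default per-row evaluation cost; the constant name is hypothetical.
constexpr double kAssumedRowEvaluateCost = 0.1;

// Number of times the join buffer has to be filled for the incoming rows.
double buffer_count_sketch(double cache_record_length, double prefix_rowcount,
                           double join_buff_size) {
  assert(join_buff_size > 0.0);
  return 1.0 + cache_record_length * prefix_rowcount / join_buff_size;
}

// One table scan, plus the evaluation of the rows the filter rejects,
// repeated once per buffer fill.
double scan_and_filter_cost_sketch(double buffer_count,
                                   double single_scan_read_cost,
                                   double table_records,
                                   double rows_after_filtering) {
  assert(table_records >= rows_after_filtering);
  return buffer_count *
         (single_scan_read_cost +
          kAssumedRowEvaluateCost * (table_records - rows_after_filtering));
}
```

For example, with 8-byte cached records, 262144 incoming rows and a 262144-byte join buffer, buffer_count_sketch returns 9.0, so the scan cost of the small table is multiplied by 9 and not by 262144. If this reading is right, it would explain why ALL comes out cheap when the joined table has only a few rows.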

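What I found
• The engine-side cost of ref access is estimated as:
  ◦ rows handed over from the JOIN source table × (cost of fetching the relevant data page from memory + cost of fetching the relevant data page from disk)
• The server-side cost of ref access is estimated as:
  ◦ number of row combinations × cost of evaluating the condition
• The sum of the two is then compared with the cost of ALL
• The server-side cost of scan access is calculated in much the same way. The engine-side cost looks complicated.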

Things I want to keep investigating
• How the scan cost is calculated
• What "recheck_reason": "not_first_table" really is
