
A Brief Analysis of How MySQL 8.0 Histograms Work

Source: compiled by Haote | Published: 2024-05-27 09:45:47

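This article introduces the concept of histograms and uses examples to show how they are used. It then briefly analyzes how histograms are created and dropped internally, and illustrates a typical application scenario with an example.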

This article is shared from the Huawei Cloud Community post "[MySQL Technical Column] Introduction to MySQL 8.0 Histograms" by GaussDB Database.

Background

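The query optimizer turns a SQL query into the most efficient execution plan it can find. Because data keeps changing, however, the optimizer may not know enough about the data being queried. It may then fail to produce the optimal plan, which hurts query performance. MySQL 8.0 introduced histograms to address this problem.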

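A histogram records how the values in a column are distributed and supplies this statistic to the optimizer. With a histogram on a column, the optimizer can estimate the selectivity of WHERE-clause filters on that column. It can therefore estimate the number of rows processed during the query more accurately and choose a more efficient plan.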


MySQL 8.0 Histogram Overview

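The quality of the execution plan produced by the query optimizer largely determines how long a query takes. If the optimizer does not know how the data in a table is distributed, it may fail to produce the optimal plan and waste time at execution.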

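Suppose a SQL statement counts the people who traveled in two different time periods of equal length. If it does not know how many people fall in each period, the optimizer assumes travelers are evenly distributed across the two periods. When the two counts actually differ greatly, the optimizer's estimates are badly off, and it may choose a poor execution plan. So how can the optimizer get a clearer picture of the data distribution and produce a good plan?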

One solution is to build a histogram on the column, which approximates the distribution of the data in that column. Used well, histograms bring several benefits:

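(1) Query optimization: statistics about the data distribution help optimize the query plan, choose suitable indexes and tune query statements, improving query performance;

(2) Index design: analyzing the data distribution helps determine which columns are worth indexing to speed up queries;

(3) Data analysis: the distribution gives users insight into the characteristics and trends of their data.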

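MySQL supports two types of histograms: singleton and equi-height. In a singleton histogram, each bucket stores a single value together with the cumulative frequency of that value: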

SCHEMA_NAME: xxx  // database name
TABLE_NAME: xxx   // table name
COLUMN_NAME: xxx  // column name
HISTOGRAM: {
  "buckets": [
    [
      xxx,  // value stored in the bucket
      xxx   // cumulative frequency
    ],
    ......
  ],
  "data-type": "xxx",  // data type
  "null-values": xxx,  // fraction of NULL values
  "collation-id": xxx,
  "last-updated": "xxxx-xx-xx xx:xx:xx.xxxxxx",  // last update time
  "sampling-rate": xxx,  // sampling rate; 1 means all rows were read
  "histogram-type": "singleton",  // bucket type: singleton
  "number-of-buckets-specified": xxx  // number of buckets requested
}
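To make the bucket layout concrete, here is a minimal Python sketch (not MySQL's C++ code). It turns a list of column values into singleton buckets of (value, cumulative frequency). It assumes the column contains no NULLs, and the function name build_singleton is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def build_singleton(values: Iterable[Hashable]) -> List[Tuple[Hashable, float]]:
    """Sketch of a singleton histogram: one bucket per distinct value.

    Each bucket is (value, cumulative_frequency). Assumes no NULLs.
    """
    counts = Counter(values)  # value -> number of occurrences
    total = sum(counts.values())
    buckets: List[Tuple[Hashable, float]] = []
    running = 0
    for value in sorted(counts):  # buckets are ordered by value
        running += counts[value]
        buckets.append((value, running / total))
    return buckets


# The five c13 values inserted into table t1 in the example below
c13 = ["0000-01-01", "0500-01-01", "1000-01-01", "1500-01-01", "1500-01-01"]
print(build_singleton(c13))
```

For these values the sketch yields four buckets with cumulative frequencies 0.2, 0.4, 0.6 and 1.0. These are the same figures MySQL reports for column c13 in the example.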

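An equi-height histogram stores four things for each bucket: its lower and upper bounds, the cumulative frequency, and the number of distinct values in the bucket: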

SCHEMA_NAME: xxx
TABLE_NAME: xxx
COLUMN_NAME: xxx
HISTOGRAM: {
  "buckets": [
    [
      xxx,  // lower bound
      xxx,  // upper bound
      xxx,  // cumulative frequency
      xxx   // number of distinct values in the bucket
    ],
    ......
  ],
  "data-type": "xxx",
  "null-values": xxx,
  "collation-id": xxx,
  "last-updated": "xxxx-xx-xx xx:xx:xx.xxxxxx",
  "sampling-rate": xxx,
  "histogram-type": "equi-height",  // bucket type: equi-height
  "number-of-buckets-specified": xxx
}
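The sketch below shows one simple way to group sorted values into equi-height buckets of (lower bound, upper bound, cumulative frequency, number of distinct values). It is a greedy illustration in Python that assumes no NULLs. MySQL's Equi_height::build_histogram may place bucket boundaries differently, and build_equi_height is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple

# (lower bound, upper bound, cumulative frequency, number of distinct values)
Bucket = Tuple[Hashable, Hashable, float, int]


def build_equi_height(values: Iterable[Hashable], num_buckets: int) -> List[Bucket]:
    """Greedy equi-height sketch: close a bucket once it holds at least
    total / num_buckets rows. A value is never split across buckets.
    Assumes no NULLs."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return []
    target = total / num_buckets
    buckets: List[Bucket] = []
    running = 0    # rows seen so far, used for the cumulative frequency
    in_bucket = 0  # rows in the bucket being filled
    distinct = 0   # distinct values in the bucket being filled
    lower: Optional[Hashable] = None
    upper: Optional[Hashable] = None
    for value in sorted(counts):
        if lower is None:
            lower = value
        upper = value
        running += counts[value]
        in_bucket += counts[value]
        distinct += 1
        if in_bucket >= target:
            buckets.append((lower, upper, running / total, distinct))
            lower, in_bucket, distinct = None, 0, 0
    if lower is not None:  # flush the last, partially filled bucket
        buckets.append((lower, upper, running / total, distinct))
    return buckets


print(build_equi_height([1, 1, 1, 2, 3, 4, 4, 5, 6, 7], 3))
```

Because a value is never split, a very frequent value can make one bucket taller than the target. Closing a bucket only after it reaches total / num_buckets rows also guarantees that no more than num_buckets buckets are produced.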

Using MySQL 8.0 Histograms

Histograms are created and dropped with the ANALYZE TABLE statement. The common syntax is:

To create a histogram:

ANALYZE TABLE tbl_name UPDATE HISTOGRAM ON col_name [, col_name] ... [WITH N BUCKETS]

To drop a histogram:

ANALYZE TABLE tbl_name DROP HISTOGRAM ON col_name [, col_name] ...

A concrete example:

mysql> create table t1(c1 int,c2 int,c3 int,c4 int,c5 int,c6 int,c7 int,c8 int,c9 int,c10 int,c11 int,c12 int,c13 datetime,c14 int,c15 int,c16 int,primary key(c1));

Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values(1,2,3,4,5,6,7,8,9,10,11,12,'0000-01-01',14,15,16),(2,2,3,4,5,6,7,8,9,10,11,12,'0500-01-01',14,15,16),(3,2,3,4,5,6,7,8,9,10,11,12,'1000-01-01',14,15,16),(4,2,3,4,5,6,7,8,9,10,11,12,'1500-01-01',14,15,16),(5,2,3,4,5,6,7,8,9,10,11,12,'1500-01-01',14,15,16);

Query OK, 5 rows affected (0.00 sec)

Records: 5 Duplicates: 0 Warnings: 0

Create the histogram:

mysql> analyze table t1 update histogram on c13;

+---------+-----------+----------+------------------------------------------------+
| Table   | Op        | Msg_type | Msg_text                                       |
+---------+-----------+----------+------------------------------------------------+
| xxx.t1  | histogram | status   | Histogram statistics created for column 'c13'. |
+---------+-----------+----------+------------------------------------------------+

1 row in set (0.01 sec)

View the histogram:

mysql> select json_pretty(histogram) as result from information_schema.column_statistics where table_name = 't1' and column_name = 'c13'\G
*************************** 1. row ***************************
result: {
  "buckets": [
    [
      "0000-01-01 00:00:00.000000",  // column value
      0.2                            // cumulative frequency (likewise below)
    ],
    [
      "0500-01-01 00:00:00.000000",
      0.4
    ],
    [
      "1000-01-01 00:00:00.000000",
      0.6
    ],
    [
      "1500-01-01 00:00:00.000000",
      1.0
    ]
  ],
  "data-type": "datetime",  // data type of the column
  "null-values": 0.0,  // fraction of NULL values
  "collation-id": 8,  // collation ID of the histogram data
  "last-updated": "2023-09-30 16:05:28.533732",  // time of the last histogram update
  "sampling-rate": 1.0,  // sampling rate used to build the histogram
  "histogram-type": "singleton",  // histogram type: singleton
  "number-of-buckets-specified": 100  // number of buckets requested (100 when WITH N BUCKETS is omitted)
}
1 row in set (0.00 sec)
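As a rough illustration of how these cumulative frequencies translate into selectivity estimates, the following Python sketch reads the bucket data from the output above. The comments are removed so that it parses as JSON. The sketch then derives the fraction of rows matching an equality predicate and a <= predicate. The helper names are hypothetical, and the real optimizer applies additional rules, for example for values that do not appear in the histogram.

```python
import json

# Bucket data from the JSON_PRETTY output above, with the comments removed
HISTOGRAM_JSON = """
{
  "buckets": [
    ["0000-01-01 00:00:00.000000", 0.2],
    ["0500-01-01 00:00:00.000000", 0.4],
    ["1000-01-01 00:00:00.000000", 0.6],
    ["1500-01-01 00:00:00.000000", 1.0]
  ],
  "null-values": 0.0,
  "histogram-type": "singleton"
}
"""


def equality_selectivity(histogram: dict, value: str) -> float:
    """Fraction of rows where column = value, from a singleton histogram.

    A bucket's own frequency is its cumulative frequency minus the previous
    bucket's. Values missing from the histogram return 0.0 in this sketch.
    """
    previous = 0.0
    for bucket_value, cumulative in histogram["buckets"]:
        if bucket_value == value:
            return cumulative - previous
        previous = cumulative
    return 0.0


def less_equal_selectivity(histogram: dict, value: str) -> float:
    """Fraction of rows where column <= value. Buckets are sorted, and this
    fixed-width datetime text sorts the same way as the datetimes it encodes."""
    result = 0.0
    for bucket_value, cumulative in histogram["buckets"]:
        if bucket_value > value:
            break
        result = cumulative
    return result


hist = json.loads(HISTOGRAM_JSON)
print(equality_selectivity(hist, "1500-01-01 00:00:00.000000"))
print(less_equal_selectivity(hist, "1000-01-01 23:59:59.000000"))
```

The equality check for 1500-01-01 returns 0.4 (2 of the 5 rows). The <= check for 1000-01-01 23:59:59 returns 0.6.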

Drop the histogram:

mysql> analyze table t1 drop histogram on c13;

+---------+-----------+----------+------------------------------------------------+
| Table   | Op        | Msg_type | Msg_text                                       |
+---------+-----------+----------+------------------------------------------------+
| xxx.t1  | histogram | status   | Histogram statistics removed for column 'c13'. |
+---------+-----------+----------+------------------------------------------------+

1 row in set (0.00 sec)

How MySQL 8.0 Histograms Work

The overall framework of the histogram implementation is summarized in the figure below:

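The histogram code lives mainly under sql/histograms:

- Files prefixed with equi_height implement equi-height histograms.
- Files prefixed with singleton implement singleton histograms.
- Files prefixed with value_map implement the structure that holds the collected value counts.
- histogram.h/histogram.cc provide the histogram interfaces called by the rest of the server.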

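Sql_cmd_analyze_table::handle_histogram_command is the overall entry point for histogram operations. Currently, a histogram operation can target only one table at a time. The main call stack for creating a histogram is shown below; update_histogram is the entry point for creation.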

mysql_execute_command
->Sql_cmd_analyze_table::execute
  ->Sql_cmd_analyze_table::handle_histogram_command
    ->Sql_cmd_analyze_table::update_histogram
      ->histograms::update_histogram
        ->prepare_value_maps
        ->fill_value_maps
        ->build_histogram
        ->store_histogram
          ->dd::cache::Dictionary_client::update
            ->dd::cache::Storage_adapter::store
              ->dd::Column_statistics_impl::store_attributes
                ->histograms::Singleton::histogram_to_json

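The creation flow works as follows:

1. prepare_value_maps creates a value_map of the appropriate type for each histogram column.
2. The sampling rate is derived from the value of histogram_generation_max_mem_size (the maximum memory allowed for generating histograms) and the size of a single row.
3. fill_value_maps repeatedly reads rows and fills the value_map of the corresponding type. The key is an actual column value and the value is the number of times it occurs.
4. build_histogram builds the histogram. If the number of buckets (num_buckets) is greater than or equal to the number of distinct values (value_map.size()), a singleton histogram is created automatically; otherwise an equi-height histogram is created. The two construction routines are Singleton::build_histogram and Equi_height::build_histogram, respectively.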
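The decision logic in this step can be sketched in a few lines of Python. The value-map counting follows the description above. The type rule uses "greater than or equal", matching MySQL's documented behavior: a singleton histogram is produced when the number of distinct values does not exceed the number of buckets. The sampling-rate formula is only an assumption, not MySQL's exact computation: it scales reading down proportionally when the estimated data size exceeds the memory budget. All function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def fill_value_map(values: Iterable[Hashable]) -> Counter:
    """value -> number of occurrences, the role the value_map plays."""
    return Counter(values)


def choose_histogram_type(value_map: Counter, num_buckets: int) -> str:
    """Enough buckets for every distinct value -> singleton, else equi-height."""
    return "singleton" if num_buckets >= len(value_map) else "equi-height"


def estimate_sampling_rate(max_mem_bytes: int, row_size_bytes: int, num_rows: int) -> float:
    """ASSUMED formula, not MySQL's exact one: read everything when the
    estimated data size fits in the budget, otherwise sample proportionally."""
    if num_rows == 0 or row_size_bytes == 0:
        return 1.0
    return min(1.0, max_mem_bytes / (row_size_bytes * num_rows))


value_map = fill_value_map(["a", "b", "b", "c"])
print(choose_histogram_type(value_map, 100))  # 100 buckets >= 3 distinct values
print(choose_histogram_type(value_map, 2))    # 2 buckets < 3 distinct values
print(estimate_sampling_rate(20_000_000, 100, 1_000_000))
```

Under this assumed formula, a budget of 20000000 bytes and 1,000,000 rows of 100 bytes each would give a sampling rate of 0.2.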

Once the histogram is built, store_histogram saves the result as JSON in a system table, which is exposed to users through INFORMATION_SCHEMA.COLUMN_STATISTICS. histogram_to_json converts the histogram into a Json_object. For example, last-updated is stored as a Json_datetime, histogram-type as a Json_string, and sampling-rate as a Json_double. Each JSON field is then added in turn with json_object->add_clone.
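The following Python sketch mimics the shape of what histogram_to_json produces: a single JSON object whose fields match those shown in INFORMATION_SCHEMA.COLUMN_STATISTICS. A plain dict stands in for MySQL's Json_object/add_clone machinery, and the function signature is hypothetical.

```python
import json
from datetime import datetime
from typing import Any, List


def histogram_to_json(buckets: List[List[Any]], histogram_type: str, data_type: str,
                      null_values: float, collation_id: int, sampling_rate: float,
                      buckets_specified: int, last_updated: datetime) -> str:
    """Assemble the fields shown in INFORMATION_SCHEMA.COLUMN_STATISTICS.

    MySQL builds a Json_object from typed values (Json_datetime, Json_string,
    Json_double, ...) added with add_clone; a plain dict stands in for it here.
    """
    obj = {
        "buckets": buckets,
        "data-type": data_type,
        "null-values": null_values,
        "collation-id": collation_id,
        # rendered as text with microseconds, as in the COLUMN_STATISTICS output
        "last-updated": last_updated.strftime("%Y-%m-%d %H:%M:%S.%f"),
        "sampling-rate": sampling_rate,
        "histogram-type": histogram_type,
        "number-of-buckets-specified": buckets_specified,
    }
    return json.dumps(obj, indent=2)


text = histogram_to_json(
    buckets=[["1500-01-01 00:00:00.000000", 1.0]],
    histogram_type="singleton", data_type="datetime", null_values=0.0,
    collation_id=8, sampling_rate=1.0, buckets_specified=100,
    last_updated=datetime(2023, 9, 30, 16, 5, 28, 533732),
)
print(text)
```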

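The main call stack for dropping a histogram is shown below. Before dropping, drop_histograms first tries to fetch the histogram to check whether it actually exists. If it does not, the logic ends early; if it does, the histogram is deleted.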

mysql_execute_command
->Sql_cmd_analyze_table::execute
  ->Sql_cmd_analyze_table::handle_histogram_command
    ->Sql_cmd_analyze_table::drop_histogram
      ->histograms::drop_histograms
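The existence check before deletion can be sketched as follows, applied per column. A Python dict keyed by (schema, table, column) stands in for the data dictionary. The function name and the result strings are illustrative assumptions, not MySQL's actual messages.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the data dictionary: (schema, table, column) -> histogram JSON
Dictionary = Dict[Tuple[str, str, str], str]


def drop_histograms(dictionary: Dictionary, schema: str, table: str,
                    columns: List[str]) -> Dict[str, str]:
    """Look each histogram up first; stop early for that column if it is
    missing, otherwise delete it. Result strings are illustrative only."""
    results: Dict[str, str] = {}
    for column in columns:
        key = (schema, table, column)
        if key not in dictionary:  # existence check before deleting
            results[column] = "not found"
            continue
        del dictionary[key]
        results[column] = "removed"
    return results


# "test" is a hypothetical schema name
dd: Dictionary = {("test", "t1", "c13"): '{"histogram-type": "singleton"}'}
results = drop_histograms(dd, "test", "t1", ["c13", "c14"])
print(results)
```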

Optimization Scenarios for MySQL 8.0 Histograms

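As described earlier, the optimizer uses histogram information to estimate the selectivity of each predicate in the WHERE clause and so choose the best execution plan. For example, consider the following table with skewed data: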

mysql> select sys_id,order_status,count(*) from my_table_1 group by sys_id,order_status order by 1,2,3;

+--------+--------------+----------+
| sys_id | order_status | count(*) |
+--------+--------------+----------+
|      3 |            1 |        1 |
|      3 |            2 |   200766 |
|      3 |            3 |     3353 |
|      3 |            4 |     1325 |
|      5 |            1 |       13 |
|      5 |            2 |  2478373 |
|      5 |            3 |    43243 |
|      5 |            4 |    13529 |
|      6 |            2 |   171388 |
|      6 |            3 |      254 |
|      6 |            4 |      716 |
+--------+--------------+----------+
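The following Python sketch uses the counts above to compare the true selectivity of sys_id = 5 AND order_status = 1 with two estimates:

- one that assumes every distinct value is equally common;
- one built from per-column frequencies, which is the information a single-column histogram captures.

Combining the two columns by multiplication assumes the predicates are independent, which is a simplification. The create_time predicate is ignored.

```python
from collections import Counter
from typing import Dict, Tuple

# (sys_id, order_status) -> row count, copied from the GROUP BY output above
COUNTS: Dict[Tuple[int, int], int] = {
    (3, 1): 1, (3, 2): 200766, (3, 3): 3353, (3, 4): 1325,
    (5, 1): 13, (5, 2): 2478373, (5, 3): 43243, (5, 4): 13529,
    (6, 2): 171388, (6, 3): 254, (6, 4): 716,
}
TOTAL = sum(COUNTS.values())


def column_frequencies(position: int) -> Dict[int, float]:
    """Per-value frequencies of one column: what a single-column histogram records."""
    per_value: Counter = Counter()
    for key, rows in COUNTS.items():
        per_value[key[position]] += rows
    return {value: rows / TOTAL for value, rows in per_value.items()}


def uniform_frequency(position: int) -> float:
    """Frequency of any one value if every distinct value were equally common."""
    return 1 / len({key[position] for key in COUNTS})


actual = COUNTS[(5, 1)] / TOTAL
# Both estimates multiply the two columns, i.e. assume independence
uniform = uniform_frequency(0) * uniform_frequency(1)
from_histograms = column_frequencies(0)[5] * column_frequencies(1)[1]
print(f"actual={actual:.1e} uniform={uniform:.1e} per-column={from_histograms:.1e}")
```

The uniform estimate is about 8.3e-02. The per-column estimate (about 4.2e-06) is close to the true fraction (about 4.5e-06).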

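In the following statement, the data skew keeps the optimizer from estimating row counts accurately, so it picks a poor plan: it scans the primary key in reverse and filters every row. The query takes about 1.35 s.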

mysql> explain analyze select t1.id, t1.order_number, t1.create_time, t1.order_status from my_table_1 t1 left join my_table_2 t2 on t1.id = t2.order_id WHERE t1.sys_id = 5 and t1.order_status in (1) and t1.create_time >= '2022-09-10 00:00:00' and t1.create_time <= '2022-09-16 23:59:59' order by t1.id desc LIMIT 20\G

*************************** 1. row ***************************
EXPLAIN: -> Limit: 20 row(s) (cost=4163.10 rows=20) (actual time=1350.825..1350.825 rows=0 loops=1)
    -> Nested loop left join (cost=4163.10 rows=49) (actual time=1350.825..1350.825 rows=0 loops=1)
        -> Filter: ((t1.order_status = 1) and (t1.sys_id = 5) and (t1.create_time >= TIMESTAMP'2022-09-10 00:00:00') and (t1.create_time <= TIMESTAMP'2022-09-16 23:59:59')) (cost=215.79 rows=49) (actual time=1350.823..1350.823 rows=0 loops=1)
            -> Index scan on t1 using PRIMARY (reverse) (cost=215.79 rows=8828) (actual time=0.088..1209.201 rows=2910194 loops=1)
        -> Index lookup on t2 using idx_order_id (order_id=t1.id) (cost=0.63 rows=1) (never executed)

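Next, histograms were created with ANALYZE TABLE my_table_1 UPDATE HISTOGRAM ON order_status, sys_id, create_time and the statement was run again. The plan switches to a range scan on the idx_sys_id_create_time index, and execution takes 0.11 s. The optimizer evidently used the more accurate distribution information to choose a better plan.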

*************************** 1. row ***************************
EXPLAIN: -> Limit: 20 row(s) (cost=38385.46 rows=20) (actual time=114.217..114.217 rows=0 loops=1)
    -> Nested loop left join (cost=38385.46 rows=62764) (actual time=114.216..114.216 rows=0 loops=1)
        -> Sort: t1.id DESC, limit input to 20 row(s) per chunk (cost=28200.86 rows=62668) (actual time=114.215..114.215 rows=0 loops=1)
            -> Filter: (t1.order_status = 1) (cost=28200.86 rows=62668) (actual time=114.207..114.207 rows=0 loops=1)
                -> Index range scan on t1 using idx_sys_id_create_time, with index condition: ((t1.sys_id = 5) and (t1.create_time >= TIMESTAMP'2022-09-10 00:00:00') and (t1.create_time <= TIMESTAMP'2022-09-16 23:59:59')) (cost=28200.86 rows=62668) (actual time=0.326..112.912 rows=31142 loops=1)
        -> Index lookup on t2 using idx_order_id (order_id=t1.id) (cost=0.62 rows=1) (never executed)

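When the predicate values in the WHERE clause change, the optimizer again picks a plan suited to the data distribution. In the two queries below, order_status = 2 matches most rows with sys_id = 5, so reading the primary key in reverse finds 20 matching rows quickly. order_status = 4 is far rarer, so the optimizer instead uses a range scan on idx_sys_id_create_time and then sorts the matching rows.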

mysql> explain format=tree select t1.id, t1.order_number, t1.create_time, t1.order_status from my_table_1 t1 left join my_table_2 t2 on t1.id = t2.order_id WHERE t1.sys_id = 5 and t1.order_status in (2) and t1.create_time >= '2020-10-01 00:00:00' and t1.create_time <= '2020-10-09 23:59:59' order by t1.id desc LIMIT 20\G

*************************** 1. row ***************************
EXPLAIN: -> Limit: 20 row(s) (cost=13541.27 rows=20)
    -> Nested loop left join (cost=13541.27 rows=44)
        -> Filter: ((t1.order_status = 2) and (t1.sys_id = 5) and (t1.create_time >= TIMESTAMP'2020-10-01 00:00:00') and (t1.create_time <= TIMESTAMP'2020-10-09 23:59:59')) (cost=15.79 rows=44)
            -> Index scan on t1 using PRIMARY (reverse) (cost=15.79 rows=338)
        -> Index lookup on t2 using idx_order_id (order_id=t1.id) (cost=0.25 rows=1)

1 row in set (0.00 sec)

mysql> explain format=tree select t1.id, t1.order_number, t1.create_time, t1.order_status from my_table_1 t1 left join my_table_2 t2 on t1.id = t2.order_id WHERE t1.sys_id = 5 and t1.order_status in (4) and t1.create_time >= '2020-10-01 00:00:00' and t1.create_time <= '2020-10-09 23:59:59' order by t1.id desc LIMIT 20\G

*************************** 1. row ***************************
EXPLAIN: -> Limit: 20 row(s) (cost=30559.31 rows=20)
    -> Nested loop left join (cost=30559.31 rows=55852)
        -> Sort: t1.id DESC, limit input to 20 row(s) per chunk (cost=24966.26 rows=55480)
            -> Filter: (t1.order_status = 4) (cost=24966.26 rows=55480)
                -> Index range scan on t1 using idx_sys_id_create_time, with index condition: ((t1.sys_id = 5) and (t1.create_time >= TIMESTAMP'2020-10-01 00:00:00') and (t1.create_time <= TIMESTAMP'2020-10-09 23:59:59')) (cost=24966.26 rows=55480)
        -> Index lookup on t2 using idx_order_id (order_id=t1.id) (cost=0.25 rows=1)

1 row in set (0.00 sec)
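A back-of-the-envelope model helps explain why the plans differ. The Python sketch below estimates how many rows a reverse primary-key scan must read before LIMIT 20 is satisfied. It assumes matching rows are spread evenly along the primary key and ignores the create_time predicate; both are simplifying assumptions, and the function name is hypothetical.

```python
import math
from typing import Dict, Tuple

# (sys_id, order_status) -> row count, copied from the GROUP BY output above
COUNTS: Dict[Tuple[int, int], int] = {
    (3, 1): 1, (3, 2): 200766, (3, 3): 3353, (3, 4): 1325,
    (5, 1): 13, (5, 2): 2478373, (5, 3): 43243, (5, 4): 13529,
    (6, 2): 171388, (6, 3): 254, (6, 4): 716,
}
TOTAL = sum(COUNTS.values())


def rows_read_by_reverse_pk_scan(sys_id: int, status: int, limit: int = 20) -> int:
    """Rough number of rows a reverse primary-key scan reads before LIMIT is met.

    Assumes matching rows are spread evenly along the primary key and ignores
    the create_time predicate.
    """
    matches = COUNTS.get((sys_id, status), 0)
    if matches < limit:
        return TOTAL  # the LIMIT can never be filled, so the whole index is read
    return math.ceil(limit * TOTAL / matches)


for status in (1, 2, 4):
    print(status, rows_read_by_reverse_pk_scan(5, status))
```

Under these assumptions:

- order_status = 2 needs only about 24 rows.
- order_status = 4 needs about 4,307 rows.
- order_status = 1 has just 13 matching rows and can never fill the LIMIT. The whole index is read, consistent with the 2910194 rows scanned in the first plan above.

The date-range predicate only makes the rare cases worse, because it further reduces the number of matching rows.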

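Histograms therefore deliver one of the benefits described earlier: by supplying statistics about the data distribution, they help the optimizer choose better plans and improve query performance.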


Haote publishes this article only to share information; publication does not mean Haote endorses its views or vouches for its descriptions.
