MySql 字符集不同导致 left join 慢查询的问题解决！

admin • 2024年6月20日 17:41 • SQL技巧

MySql 字符集不同导致 left join 慢查询的问题解决！

当两个表的字符集不一样,在使用字符型字段进行表连接查询时,就需要特别注意下查询耗时是否符合预期,本文主要介绍了MySql 字符集不同导致 left join 慢查询的问题解决,感兴趣的可以了解一下。

在 MySql 建表时候一般会指定字符集，大多数情况下为了更好的兼容性无脑选了 utf8mb4。但是有时会因为选错，或历史遗留问题，导致使用了 utf8 字符集。当两个表的字符集不一样，在使用字符型字段进行表连接查询时，就需要特别注意下查询耗时是否符合预期。

有次使用 left join 写一个 SQL，发现用时明显超过预期，经过一顿折腾才发现是两个表字符集不一样，特此记录一下。

问题分析

mysql> SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ;
+-----------+
| COUNT( *) |
+-----------+
|     13447 |
+-----------+
1 row in set (0.89 sec)

例如上面的 SQL，左表 1W 条数据，右表 400 多条数据，在 host_sn 字段上都有索引，查询竟然用了近 900ms，怎么会这么慢？

mysql> explain SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key         | key_len | ref  | rows  | filtered | Extra                                                           |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | index | NULL          | idx_host_sn | 122     | NULL | 10791 |   100.00 | Using index                                                     |
|  1 | SIMPLE      | p     | NULL       | index | NULL          | idx_host_sn | 152     | NULL |   457 |   100.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

查看下执行计划，的确是使用了索引，但是细看 Extra 列发现较正常的连表查询多了“Using join buffer (Block Nested Loop)”这一信息，这个具体是什么意思我们后面再说。
然后我们再看下详细的执行计划，使用 explain formart=json。

{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "988640.52"
},
"nested_loop": [
{
"table": {
"table_name": "t",
"access_type": "index",
"key": "idx_host_sn",
"used_key_parts": [
"host_sn"
],
"key_length": "122",
"rows_examined_per_scan": 10791,
"rows_produced_per_join": 10791,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "161.00",
"eval_cost": "2158.20",
"prefix_cost": "2319.20",
"data_read_per_join": "2M"
},
"used_columns": [
"host_sn"
]
}
},
{
"table": {
"table_name": "p",
"access_type": "index",
"key": "idx_host_sn",
"used_key_parts": [
"host_sn"
],
"key_length": "152",
"rows_examined_per_scan": 457,
"rows_produced_per_join": 4931487,
"filtered": "100.00",
"using_index": true,
"using_join_buffer": "Block Nested Loop",
"cost_info": {
"read_cost": "23.92",
"eval_cost": "986297.40",
"prefix_cost": "988640.52",
"data_read_per_join": "865M"
},
"used_columns": [
"host_sn"
],
"attached_condition": "<if>(is_not_null_compl(p), (`db0`.`t`.`host_sn` = convert(`db0`.`p`.`host_sn` using utf8mb4)), true)"
}
}
]
}
}

特别需要关注的是这一对 KV

"attached_condition": "<if>(is_not_null_compl(p), (`collection_bullet_0000`.`t`.`host_sn` = convert(`collection_bullet_0000`.`p`.`host_sn` using utf8mb4)), true)"

看字面意思就是当 p 表不为空的时候，执行表连接需要先将 p 表的 host_sn 字段转变为 utf8mb4 字符集。我们应该都知道在表连接中使用了函数的话，是无法使用索引的。
所以再回到上面我看到的“Using join buffer (Block Nested Loop)”问题，来解释下这是一个什么过程。

Nested-Loop Join

MySql 官网对 Nested-Loop Join 有做过解释，其实做开发的同学看到名字就大体知道是啥，不就是循环嵌套嘛。

MySql 中分为 Nested-Loop Join 算法跟 Block Nested-Loop Join 算法。

例如，有如下三个表，t1、t2、t3 使用了这三种 join type。

Table Join Type
t1 range
t2 ref
t3 ALL

当使用 Nested-Loop Join 算法时，其 join 过程如下所示，其实就是简单的三层循环。

for each row in t1 matching range {
for each row in t2 matching reference key {
for each row in t3 {
if row satisfies join conditions, send to client
}
}
}

Block Nested-Loop Join（BNL）算法是对 Nested-Loop Join 算法的一种优化。BNL 算法缓冲外部循环中读取的行来减少内部循环中读取表的次数。例如，将 10 行数据读取到缓冲器中，并且将缓冲器传递到下一个循环内部，内部循环中读取的每一行与缓冲器中的所有 10 行进行比较。这将使读取内部表的次数减少一个数量级。

for each row in t1 matching range {
for each row in t2 matching reference key {
store used columns from t1, t2 in join buffer
if buffer is full {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client
}
}
empty join buffer
}
}
}
if buffer is not empty {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client
}
}
}

算法实现如上，只有当 “join buffer” 满的时候才会触发 t3 表的读取，如果 “join buffer” 的 size = 10 那么就可以减少 10 倍的 t3 表被读取次数，从内存中读取数据的效率显然要比从磁盘读取的效率高的多。从而提升 join 的效率。

但其实再好的优化毕竟也是嵌套循环，做开发的同学应该都知道 O(N²) 的时间复杂度是无法接受的。这也是我们这个查询这么慢的根因。

解决办法

解决办法其实很简单，修改右表的字符集就可以解决。

在变更数据集之前我们先用 show table status 查看下当前表的状态。

mysql> show table status like 'app_config_control_sn';
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| Name                  | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation       | Checksum | Create_options | Comment |
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| app_config_control_sn | InnoDB |      10 | Dynamic    |  457 |            143 |       65536 |               0 |        32768 |         0 |           1041 | 2023-04-17 03:25:45 | 2023-04-17 03:27:24 | NULL       | utf8_general_ci |     NULL |                | SN      |
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.00 sec)

接着使用如下命令变更表的字符集。

mysql> ALTER TABLE app_config_control_sn CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 457 rows affected (0.09 sec)
Records: 457  Duplicates: 0  Warnings: 0

再次使用 show table status 命令查看下表的状态。

mysql> show table status like 'app_config_control_sn';
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| Name                  | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation          | Checksum | Create_options | Comment |
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| app_config_control_sn | InnoDB |      10 | Dynamic    |  457 |            143 |       65536 |               0 |        32768 |         0 |           1041 | 2023-04-17 03:50:11 | 2023-04-17 03:50:11 | NULL       | utf8mb4_general_ci |     NULL |                | SN      |
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
1 row in set (0.01 sec)

可以看到表的字符集已经发生了变化，那我们再次执行开始的 SQL 及 explain 语句，确认下问题是否已经解决。

mysql> SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ;
+-----------+
| COUNT( *) |
+-----------+
|     13447 |
+-----------+
1 row in set (0.03 sec)
mysql> explain SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ;
+----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key         | key_len | ref           | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+
|  1 | SIMPLE      | t     | NULL       | index | NULL          | idx_host_sn | 122     | NULL          | 10791 |   100.00 | Using index              |
|  1 | SIMPLE      | p     | NULL       | ref   | idx_host_sn   | idx_host_sn | 202     | db0.t.host_sn |     2 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

可以看到耗时已经只需要 30ms 左右，这个就比较符合预期，而在执行计划中也不再会有“Using join buffer (Block Nested Loop)”信息。

其他

mysql> SELECT COUNT( *) from app_bind_rel t join app_config_control_sn p on t.host_sn = p.host_sn ;
+-----------+
| COUNT( *) |
+-----------+
|       730 |
+-----------+
1 row in set (0.01 sec)

在没有变更字符集之前，当我们将 left join 修改为 join 的时候会发现耗时减少了 100 倍，只用了 10 ms，这是为什么呢？

mysql> explain SELECT COUNT( *) from app_bind_rel t join app_config_control_sn p on t.host_sn = p.host_sn ;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key         | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | p     | NULL       | index | NULL          | idx_host_sn | 152     | NULL |  457 |   100.00 | Using index              |
|  1 | SIMPLE      | t     | NULL       | ref   | idx_host_sn   | idx_host_sn | 122     | func |    1 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

查看执行计划，发现使用 join 的时候不会有 “Using join buffer (Block Nested Loop)”。再细看执行计划，发现驱动表已经由 t 表变为了 p 表。

{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "643.80"
},
"nested_loop": [
{
"table": {
"table_name": "p",
"access_type": "index",
"key": "idx_host_sn",
"used_key_parts": [
"host_sn"
],
"key_length": "152",
"rows_examined_per_scan": 457,
"rows_produced_per_join": 457,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "4.00",
"eval_cost": "91.40",
"prefix_cost": "95.40",
"data_read_per_join": "82K"
},
"used_columns": [
"host_sn"
]
}
},
{
"table": {
"table_name": "t",
"access_type": "ref",
"possible_keys": [
"idx_host_sn"
],
"key": "idx_host_sn",
"used_key_parts": [
"host_sn"
],
"key_length": "122",
"ref": [
"func"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 457,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "457.00",
"eval_cost": "91.40",
"prefix_cost": "643.80",
"data_read_per_join": "117K"
},
"used_columns": [
"host_sn"
],
"attached_condition": "(`db0`.`t`.`host_sn` = convert(`db0`.`p`.`host_sn` using utf8mb4))"
}
}
]
}
}

查看详细的执行计划，可以看到

"attached_condition": "(`collection_bullet_0000`.`t`.`host_sn` = convert(`collection_bullet_0000`.`p`.`host_sn` using utf8mb4))"

这对 KV 依然是存在的，但是 "using_join_buffer": "Block Nested Loop" 已经不存在了。这个其实主要是因为当 p 表变为驱动表的时候，会先将自己的 host_sn 字段转为 utf8mb4 字符集，再与 t 表进行关联。t 表由于本来就是 utf8mb4 字符集且存在索引，就可以正常走数据库索引了，所以查询耗时也就大大降低。而使用 left join 时候，t 表作为驱动表是无法优化改变的。

可见在表连接中即使使用了函数也不一定就没法走索引，关键还是要看用法及明确处理过程。
记得刚学习数据库的时候，老师还特别强调驱动表一定要写在左边，而随着数据库技术的不断迭代发展，数据库已经能更智能的自动帮我们优化处理过程，之前很多的数据库规则也不需要了。

学习资料见知识星球。

以上就是今天要分享的技巧，你学会了吗？若有什么问题，欢迎在下方留言。

快来试试吧，小琥 my21ke007。获取 1000个免费 Excel模板福利！

更多技巧， www.excelbook.cn

欢迎加入 零售创新 知识星球，知识星球主要以数据分析、报告分享、数据工具讨论为主；

你将获得：

1、价值上万元的专业的PPT报告模板。

2、专业案例分析和解读笔记。

3、实用的Excel、Word、PPT技巧。

4、VIP讨论群，共享资源。

5、优惠的会员商品。

6、一次付费只需129元，即可下载本站文章涉及的文件和软件。

文章版权声明 1、本网站名称：Excelbook
2、本站永久网址：http://www.excelbook.cn
3、本网站的文章部分内容可能来源于网络，仅供大家学习与参考，如有侵权，请联系站长王小琥进行删除处理。
4、本站一切资源不代表本站立场，并不代表本站赞同其观点和对其真实性负责。
5、本站一律禁止以任何方式发布或转载任何违法的相关信息，访客发现请向站长举报。
6、本站资源大多存储在云盘，如发现链接失效，请联系我们我们会第一时间更新。

THE END

二维码

Python中GPU计算的库pycuda的使用！

< <上一篇

小众软件，FixWin 11 – 适用于Windows 10/11的修复工具！

下一篇>>