我是一个老派的MySQL用户,总是更喜欢JOIN而不是子查询。但是现在每个人都用子查询,我讨厌它;我不知道为什么。

我缺乏理论知识来判断是否有任何不同。子查询是否与JOIN一样好,因此没有什么可担心的?


当前回答

根据我的观察,就像两种情况,如果一个表的记录少于10万条,那么连接将工作得很快。

但是如果一个表有超过100,000条记录,那么子查询是最好的结果。

我有一个表,其中有500,000条记录,我在查询下面创建了它,它的结果时间是

SELECT * 
FROM crv.workorder_details wd 
inner join  crv.workorder wr on wr.workorder_id = wd.workorder_id;

结果:13.3秒

select * 
from crv.workorder_details 
where workorder_id in (select workorder_id from crv.workorder)

结果:1.65秒

其他回答

只有当第二个连接表的数据明显多于主表时,才会出现这种差异。我有过这样的经历……

我们有一个用户表,有10万个条目,他们的会员数据(友谊)大约有30万个条目。这是一个join语句,目的是获取好友及其数据,但有很大的延迟。但是当成员表中只有少量数据时,它工作得很好。一旦我们将其更改为使用子查询,它就可以正常工作。

但与此同时,连接查询正在处理比主表拥有更少条目的其他表。

所以我认为连接和子查询语句工作得很好,这取决于数据和情况。

我只是在考虑同样的问题,但我在FROM部分使用子查询。 我需要连接和查询大表,“从”表有2800万条记录,但结果只有128个这样小的结果大数据!我在它上面使用MAX()函数。

首先,我使用LEFT JOIN,因为我认为这是正确的方式,mysql可以优化等。 第二次只是为了测试,我重写了针对JOIN的子选择。

LEFT JOIN运行时:1.12s SUB-SELECT运行时间:0.06秒

子选择比连接快18倍!只是在chokito广告。subselect看起来很糟糕,但结果…

MySQL版本:5.5.28-0ubuntu0.12.04.2-log

在我的印象中,JOIN总是比MySQL中的子查询更好,但EXPLAIN是更好的判断方式。下面是一个子查询比join更好的例子。

这是我的查询与3个子查询:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=43) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=55) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

解释说明:

+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
| id | select_type        | table    | type   | possible_keys                                       | key          | key_len | ref                                             | rows | Extra                    |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
|  1 | PRIMARY            | vrl      | index  | PRIMARY                                             | moved_date   | 8       | NULL                                            |  200 | Using where              |
|  1 | PRIMARY            | l        | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  1 | PRIMARY            | vrlih    | eq_ref | PRIMARY                                             | PRIMARY      | 9       | ranker.vrl.list_id,ranker.vrl.ontology_id,const |    1 | Using where              |
|  1 | PRIMARY            | lbs      | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  4 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+

使用join的相同查询是:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
LEFT JOIN list_tag lt1 ON lt1.list_id = vrl.list_id AND lt1.tag_id = 43 
LEFT JOIN list_tag lt2 ON lt2.list_id = vrl.list_id AND lt2.tag_id = 55 
INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
AND lt1.list_id IS NULL AND lt2.tag_id IS NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

输出为:

+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                       | key          | key_len | ref                                         | rows | Extra                                        |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | lt3   | ref    | list_tag_key,list_id,tag_id                         | tag_id       | 5       | const                                       | 2386 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.lt3.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | vrlih | ref    | PRIMARY                                             | PRIMARY      | 4       | ranker.lt3.list_id                          |  103 | Using where                                  |
|  1 | SIMPLE      | vrl   | ref    | PRIMARY                                             | PRIMARY      | 8       | ranker.lt3.list_id,ranker.vrlih.ontology_id |   65 | Using where                                  |
|  1 | SIMPLE      | lt1   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index; Not exists         |
|  1 | SIMPLE      | lbs   | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | lt2   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index                     |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+

rows列的比较表明了差异,使用join的查询使用的是using temporary;使用filesort。

当然,当我运行这两个查询时,第一个查询在0.02秒内完成,第二个查询甚至在1分钟后都没有完成,所以EXPLAIN正确地解释了这些查询。

如果我在list_tag表上没有INNER JOIN,即如果我删除

AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL  

从第一个查询和相应的:

INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403

从第二个查询开始,那么EXPLAIN为两个查询返回相同的行数,并且这两个查询的运行速度相同。

这取决于几个因素,包括正在运行的特定查询、数据库中的数据量。子查询首先运行内部查询,然后从结果集中再次过滤出实际结果。而在join中运行并产生结果。

最好的策略是同时测试连接解决方案和子查询解决方案,以获得优化的解决方案。

MSDN文档SQL Server说

Many Transact-SQL statements that include subqueries can be alternatively formulated as joins. Other questions can be posed only with subqueries. In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance. Otherwise, the nested query must be processed for each result of the outer query to ensure elimination of duplicates. In such cases, a join approach would yield better results.

所以如果你需要

select * from t1 where exists select * from t2 where t2.parent=t1.id

尝试使用join代替。在其他情况下,这没有什么区别。

我说:为子查询创建函数可以消除混乱的问题,并允许您为子查询实现额外的逻辑。因此,我建议尽可能为子查询创建函数。

代码中的混乱是一个大问题,几十年来业界一直在努力避免它。