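I created a SQL command that uses INNER JOIN on 9 tables, and it takes a very long time to run (more than five minutes). A friend suggested that I change INNER JOIN to LEFT JOIN because LEFT JOIN performs better, which contradicts what I know. After I made the change, the query became significantly faster.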

I would like to know why LEFT JOIN is faster than INNER JOIN.

My SQL command looks like this: SELECT * FROM a INNER JOIN b ON … INNER JOIN c ON … INNER JOIN d, and so on.

Update: here is a brief outline of my schema.

FROM sidisaleshdrmly a -- NOT HAVE PK AND FK
    INNER JOIN sidisalesdetmly b -- THIS TABLE ALSO HAVE NO PK AND FK
        ON a.CompanyCd = b.CompanyCd 
           AND a.SPRNo = b.SPRNo 
           AND a.SuffixNo = b.SuffixNo 
           AND a.dnno = b.dnno
    INNER JOIN exFSlipDet h -- PK = CompanyCd, FSlipNo, FSlipSuffix, FSlipLine
        ON a.CompanyCd = h.CompanyCd
           AND a.sprno = h.AcctSPRNo
    INNER JOIN exFSlipHdr c -- PK = CompanyCd, FSlipNo, FSlipSuffix
        ON c.CompanyCd = h.CompanyCd
           AND c.FSlipNo = h.FSlipNo 
           AND c.FSlipSuffix = h.FSlipSuffix 
    INNER JOIN coMappingExpParty d -- NO PK AND FK
        ON c.CompanyCd = d.CompanyCd
           AND c.CountryCd = d.CountryCd 
    INNER JOIN coProduct e -- PK = CompanyCd, ProductSalesCd
        ON b.CompanyCd = e.CompanyCd
           AND b.ProductSalesCd = e.ProductSalesCd 
    LEFT JOIN coUOM i -- PK = UOMId
        ON h.UOMId = i.UOMId 
    INNER JOIN coProductOldInformation j -- PK = CompanyCd, BFStatus, SpecCd
        ON a.CompanyCd = j.CompanyCd
            AND b.BFStatus = j.BFStatus
            AND b.ProductSalesCd = j.ProductSalesCd
    INNER JOIN coProductGroup1 g1 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup1Cd
        ON e.ProductGroup1Cd  = g1.ProductGroup1Cd
    INNER JOIN coProductGroup2 g2 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup2Cd
        ON e.ProductGroup1Cd  = g2.ProductGroup1Cd

Current answer

I have done a number of comparisons between left outer and inner joins and have not been able to find a consistent difference. There are many variables. I am working on a reporting database with thousands of tables, many with a large number of fields and many changes over time (vendor versions and local workflow). It is not possible to create every combination of covering indexes needed to serve such a wide variety of queries and handle historical data. I have seen inner join queries kill server performance because two large tables (millions to tens of millions of rows) are inner joined, both pulling a large number of fields, and no covering index exists.

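The biggest issue, though, does not seem to come up in the discussion above. Maybe your database is well designed, with triggers and well-designed transaction processing to ensure good data. Mine frequently has NULL values where they are not expected. Yes, the table definitions could enforce NOT NULL, but that is not an option in my environment.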

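So the question is: do you design your query only for speed, which is a higher priority for transaction processing that runs the same code thousands of times a minute? Or do you go for the accuracy that a left outer join provides? Remember that an inner join must find a match on both sides, so an unexpected NULL will not only remove data from the two tables but possibly entire rows of information. And it happens quietly, with no error message.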

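You can very quickly get 90% of the data you need and never discover that the inner joins have silently removed information. Sometimes inner joins can be faster, but I don't believe anyone should make that assumption unless they have reviewed the execution plan. Speed is important, but accuracy is more important.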
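
To make the silent row loss concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` and `customers` tables and their contents are hypothetical, invented only for this illustration; they are not part of the schema in the question.

```python
import sqlite3

# Hypothetical two-table schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    -- The third order has an unexpected NULL in its join column.
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, NULL);
""")

inner_rows = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders o
    INNER JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

left_rows = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

print(inner_rows)  # order 102 is missing, with no error or warning
print(left_rows)   # order 102 is kept, with name = None
```

The inner join returns two rows and the left join returns three. Nothing in the inner join's output hints that a row was dropped, which is the risk described above.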

Other answers

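Your performance problems are more likely to be caused by the number of joins you are doing and by whether the columns you are joining on have indexes.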

In the worst case, you could easily be doing 9 whole-table scans for each join.
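
One way to check whether a join falls back to table scans is to look at the query plan before and after adding an index. The sketch below uses SQLite through Python's sqlite3 module as a stand-in; the question's database is SQL Server, whose plan output looks different. The `hdr` and `det` tables and the index name `idx_det_key` are hypothetical and only loosely modelled on the header/detail tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Turn off SQLite's automatic transient indexes so the unindexed plan is visible.
conn.execute("PRAGMA automatic_index = OFF")
conn.executescript("""
    CREATE TABLE hdr (company_cd TEXT, spr_no TEXT);
    CREATE TABLE det (company_cd TEXT, spr_no TEXT, qty INTEGER);
""")

QUERY = """
    SELECT h.spr_no, d.qty
    FROM hdr h
    INNER JOIN det d
        ON h.company_cd = d.company_cd AND h.spr_no = d.spr_no
"""

def plan(connection, sql):
    """Return the human-readable steps of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # column 3 holds the detail text

plan_without_index = plan(conn, QUERY)

# Index the join columns on the detail table, then ask for the plan again.
conn.execute("CREATE INDEX idx_det_key ON det (company_cd, spr_no)")
plan_with_index = plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

Without the index, every step of the plan is a scan; with it, the lookup into `det` goes through `idx_det_key`. The exact wording of the plan text varies between SQLite versions, so treat the printed output as indicative only.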

I found something interesting in SQL Server when checking whether inner joins are faster than left joins.

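If you do not include any columns from the left-joined table in the SELECT list, the left join will be faster than the same query written with an inner join.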

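If you do include columns from the left-joined table in the SELECT list, the inner join version of the same query is as fast as, or faster than, the left join.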
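
One possible explanation, offered here as an assumption rather than something the answer states, is join elimination: when no column of the left-joined table is used and the join key is unique, an optimizer may skip that table entirely, while an inner join still has to check that a match exists. The sketch below uses SQLite (which documents an "omit OUTER JOIN" optimization) through Python's sqlite3 module purely as a stand-in. Whether SQL Server does the same for a given query has to be confirmed in its own execution plan. The `slip` and `uom` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE slip (slip_no INTEGER PRIMARY KEY, uom_id INTEGER);
    CREATE TABLE uom (uom_id INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO uom VALUES (1, 'kg');
    -- Slip 11 points at a unit of measure that does not exist.
    INSERT INTO slip VALUES (10, 1), (11, 2);
""")

def plan(sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Neither query selects any column from uom.
LEFT_SQL = """
    SELECT s.slip_no FROM slip s
    LEFT JOIN uom u ON s.uom_id = u.uom_id
    ORDER BY s.slip_no
"""
INNER_SQL = """
    SELECT s.slip_no FROM slip s
    INNER JOIN uom u ON s.uom_id = u.uom_id
    ORDER BY s.slip_no
"""

left_result = conn.execute(LEFT_SQL).fetchall()
inner_result = conn.execute(INNER_SQL).fetchall()

print(plan(LEFT_SQL))   # on recent SQLite builds, uom may not appear at all
print(plan(INNER_SQL))  # uom must be probed to decide which rows survive
print(left_result, inner_result)
```

Note that the two queries are not equivalent: the left join keeps slip 11 and the inner join drops it. That difference is exactly why the inner join cannot skip the lookup.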

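There should not be any difference if everything works the way it should, but we all know that not everything works the way it should, especially when it comes to the query optimizer, query plan caching, and statistics.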

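First, I would suggest rebuilding the indexes and statistics, then clearing the query plan cache, just to make sure they are not skewing things. However, I have run into problems even after doing that.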

I have experienced some cases where a left join was faster than an inner join.

The underlying reason is this: suppose you have two tables and you join on a column that is indexed in both. The inner join produces the same result whether you loop over the entries in the index on table one and match against the index on table two, or do the reverse: loop over the entries in the index on table two and match against the index on table one. The problem arises when the statistics are misleading. The query optimizer uses the index statistics to find the table with the fewest matching entries (based on your other criteria). Say you have two tables with 1 million rows each; in table one 10 rows match, and in table two 100,000 rows match. The best approach would be an index scan on table one followed by 10 lookups in table two. The reverse would be an index scan that loops over 100,000 rows and tries to match 100,000 times, of which only 10 succeed. So if the statistics are not correct, the optimizer might choose the wrong table and index to loop over.
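
The arithmetic in that example can be sketched in a few lines of Python. This is a toy model of a nested-loop join, not how any real engine is implemented: the sets stand in for the rows of each table that pass the other criteria, and `count_probes` is a hypothetical helper written for this illustration.

```python
def count_probes(driving_keys, probed_index):
    """Nested-loop join: one index probe per driving row."""
    probes = 0
    matches = 0
    for key in driving_keys:
        probes += 1
        if key in probed_index:
            matches += 1
    return probes, matches

# Rows that satisfy the other criteria: 10 in table one, 100,000 in table two.
table_one_keys = set(range(10))
table_two_keys = set(range(100_000))

drive_from_one = count_probes(table_one_keys, table_two_keys)
drive_from_two = count_probes(table_two_keys, table_one_keys)

print(drive_from_one)  # (10, 10): 10 probes, all succeed
print(drive_from_two)  # (100000, 10): 100,000 probes, only 10 succeed
```

Both orders produce the same 10 matches, but one does 10,000 times as many probes. Misleading statistics can push the optimizer toward the expensive order.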

If the optimizer chooses to process the left join in the order it is written, it will perform better than the inner join.

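However, the optimizer may also optimize a left join suboptimally as a left semi join. To make it choose the plan you want, you can use the FORCE ORDER hint.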

By comparing them, I found that they have exactly the same execution plan. There are three scenarios:

1. If and when they return the same results, they have the same speed. However, keep in mind that they are not the same query, and that LEFT JOIN may return more rows (when some ON conditions are not met), which is why it is usually slower.
2. When the main table (the first non-const one in the execution plan) has a restrictive condition (WHERE id = ?) and the corresponding ON condition is on a NULL value, the "right" table is not joined. This is when LEFT JOIN is faster.
3. As discussed in point 1, INNER JOIN is usually more restrictive, returns fewer rows, and is therefore faster.

Both use (the same) indexes.