我有一个表,我想获得每组的最新条目。下面是表格:

DocumentStatusLogs表

|ID| DocumentID | Status | DateCreated |
| 2| 1          | S1     | 7/29/2011   |
| 3| 1          | S2     | 7/30/2011   |
| 6| 1          | S1     | 8/02/2011   |
| 1| 2          | S1     | 7/28/2011   |
| 4| 2          | S2     | 7/30/2011   |
| 5| 2          | S3     | 8/01/2011   |
| 6| 3          | S1     | 8/02/2011   |

该表将按documententid分组,并按DateCreated降序排序。对于每个documententid,我希望获得最新的状态。

我的首选输出:

| DocumentID | Status | DateCreated |
| 1          | S1     | 8/02/2011   |
| 2          | S3     | 8/01/2011   |
| 3          | S1     | 8/02/2011   |

Is there any aggregate function to get only the top from each group? See pseudo-code GetOnlyTheTop below: SELECT DocumentID, GetOnlyTheTop(Status), GetOnlyTheTop(DateCreated) FROM DocumentStatusLogs GROUP BY DocumentID ORDER BY DateCreated DESC If such function doesn't exist, is there any way I can achieve the output I want? Or at the first place, could this be caused by unnormalized database? I'm thinking, since what I'm looking for is just one row, should that status also be located in the parent table?

更多信息请参见父表:

当前文档表

| DocumentID | Title  | Content  | DateCreated |
| 1          | TitleA | ...      | ...         |
| 2          | TitleB | ...      | ...         |
| 3          | TitleC | ...      | ...         |

父表应该是这样的,以便我可以轻松地访问它的状态吗?

| DocumentID | Title  | Content  | DateCreated | CurrentStatus |
| 1          | TitleA | ...      | ...         | s1            |
| 2          | TitleB | ...      | ...         | s3            |
| 3          | TitleC | ...      | ...         | s1            |

更新 我刚刚学会了如何使用“apply”,它可以更容易地解决这类问题。


当前回答

如果你担心性能问题,你也可以用MAX()这样做:

SELECT *
FROM DocumentStatusLogs D
WHERE DateCreated = (SELECT MAX(DateCreated) FROM DocumentStatusLogs WHERE ID = D.ID)

ROW_NUMBER()要求对SELECT语句中的所有行进行排序,而MAX则不需要。应该会大大加快你的查询速度。

其他回答

SELECT documentid, 
       status, 
       datecreated 
FROM   documentstatuslogs dlogs 
WHERE  status = (SELECT status 
                 FROM   documentstatuslogs 
                 WHERE  documentid = dlogs.documentid 
                 ORDER  BY datecreated DESC 
                 LIMIT  1) 

I've done some timings over the various recommendations here, and the results really depend on the size of the table involved, but the most consistent solution is using the CROSS APPLY These tests were run against SQL Server 2008-R2, using a table with 6,500 records, and another (identical schema) with 137 million records. The columns being queried are part of the primary key on the table, and the table width is very small (about 30 bytes). The times are reported by SQL Server from the actual execution plan.

Query                                  Time for 6500 (ms)    Time for 137M(ms)

CROSS APPLY                                    17.9                17.9
SELECT WHERE col = (SELECT MAX(COL)…)           6.6               854.4
DENSE_RANK() OVER PARTITION                     6.6               907.1

我认为真正令人惊讶的是,无论涉及的行数是多少,CROSS APPLY的时间都是如此一致。

在你想避免使用row_count()的场景中,你也可以使用左连接:

select ds.DocumentID, ds.Status, ds.DateCreated 
from DocumentStatusLogs ds
left join DocumentStatusLogs filter 
    ON ds.DocumentID = filter.DocumentID
    -- Match any row that has another row that was created after it.
    AND ds.DateCreated < filter.DateCreated
-- then filter out any rows that matched 
where filter.DocumentID is null 

对于示例模式,你也可以使用"not in subquery",它通常编译到与左连接相同的输出:

select ds.DocumentID, ds.Status, ds.DateCreated 
from DocumentStatusLogs ds
WHERE ds.ID NOT IN (
    SELECT filter.ID 
    FROM DocumentStatusLogs filter
    WHERE ds.DocumentID = filter.DocumentID
        AND ds.DateCreated < filter.DateCreated)

注意,如果表没有至少一个单列唯一键/约束/索引(在本例中是主键“Id”),那么子查询模式将无法工作。

这两个查询往往比row_count()查询(由query Analyzer衡量)更“昂贵”。但是,您可能会遇到它们更快地返回结果或启用其他优化的情况。

我的代码从每组中选择top 1

select a.* from #DocumentStatusLogs a where 
 datecreated in( select top 1 datecreated from #DocumentStatusLogs b
where 
a.documentid = b.documentid
order by datecreated desc
)

一些数据库引擎*开始支持允许过滤窗口函数结果的qualifier子句(接受的答案使用该子句)。

所以公认的答案可以变成

SELECT *, ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
QUALIFY rn = 1

查看这篇文章以获得更深入的解释:https://jrandrews.net/the-joy-of-qualify

您可以使用此工具查看哪个数据库支持此子句:https://www.jooq.org/translate/ 当目标方言不支持qualifier子句时,可以选择转换它。

*Teradata, BigQuery, H2, Snowflake…