我有一个表,我想获得每组的最新条目。下面是表格:

DocumentStatusLogs表

|ID| DocumentID | Status | DateCreated |
| 2| 1          | S1     | 7/29/2011   |
| 3| 1          | S2     | 7/30/2011   |
| 6| 1          | S1     | 8/02/2011   |
| 1| 2          | S1     | 7/28/2011   |
| 4| 2          | S2     | 7/30/2011   |
| 5| 2          | S3     | 8/01/2011   |
| 6| 3          | S1     | 8/02/2011   |

该表将按documententid分组,并按DateCreated降序排序。对于每个documententid,我希望获得最新的状态。

我的首选输出:

| DocumentID | Status | DateCreated |
| 1          | S1     | 8/02/2011   |
| 2          | S3     | 8/01/2011   |
| 3          | S1     | 8/02/2011   |

Is there any aggregate function to get only the top from each group? See pseudo-code GetOnlyTheTop below: SELECT DocumentID, GetOnlyTheTop(Status), GetOnlyTheTop(DateCreated) FROM DocumentStatusLogs GROUP BY DocumentID ORDER BY DateCreated DESC If such function doesn't exist, is there any way I can achieve the output I want? Or at the first place, could this be caused by unnormalized database? I'm thinking, since what I'm looking for is just one row, should that status also be located in the parent table?

更多信息请参见父表:

当前文档表

| DocumentID | Title  | Content  | DateCreated |
| 1          | TitleA | ...      | ...         |
| 2          | TitleB | ...      | ...         |
| 3          | TitleC | ...      | ...         |

父表应该是这样的,以便我可以轻松地访问它的状态吗?

| DocumentID | Title  | Content  | DateCreated | CurrentStatus |
| 1          | TitleA | ...      | ...         | s1            |
| 2          | TitleB | ...      | ...         | s3            |
| 3          | TitleC | ...      | ...         | s1            |

更新 我刚刚学会了如何使用“apply”,它可以更容易地解决这类问题。


当前回答

这是这个话题中最容易找到的问题之一,所以我想给出一个现代的答案(既供我参考,也帮助其他人)。通过使用first_value和over,你可以简化上面的查询:

Select distinct DocumentID
  , first_value(status) over (partition by DocumentID order by DateCreated Desc) as Status
  , first_value(DateCreated) over (partition by DocumentID order by DateCreated Desc) as DateCreated
From DocumentStatusLogs

这应该工作在Sql Server 2008及更高。First_value可以被认为是在使用over子句时实现Select Top 1的一种方法。Over允许在选择列表中分组,而不是编写嵌套的子查询(像许多现有的答案那样),这以一种更可读的方式进行。希望这能有所帮助。

其他回答

试试这个:

SELECT [DocumentID]
    ,[tmpRez].value('/x[2]', 'varchar(20)') AS [Status]
    ,[tmpRez].value('/x[3]', 'datetime') AS [DateCreated]
FROM (
    SELECT [DocumentID]
        ,cast('<x>' + max(cast([ID] AS VARCHAR(10)) + '</x><x>' + [Status] + '</x><x>' + cast([DateCreated] AS VARCHAR(20))) + '</x>' AS XML) AS [tmpRez]
    FROM DocumentStatusLogs
    GROUP BY DocumentID
    ) AS [tmpQry]

这里有3种不同的方法来解决这个问题,以及为每个查询建立索引的最佳选择(请自己尝试索引,并查看逻辑读取、消耗时间和执行计划。我根据自己的经验提供了关于此类查询的建议,但没有针对这个特定问题执行)。

方法1:使用ROW_NUMBER()。如果rowstore索引不能提高性能,对于具有聚合和分组的查询以及始终按不同列排序的表,可以尝试使用非聚集/聚集的columnstore索引,columnstore索引通常是最佳选择。

;WITH CTE AS
    (
       SELECT   *,
                RN = ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
       FROM     DocumentStatusLogs
    )
    SELECT  ID      
        ,DocumentID 
        ,Status     
        ,DateCreated
    FROM    CTE
    WHERE   RN = 1;

方法2:使用FIRST_VALUE。如果rowstore索引不能提高性能,对于具有聚合和分组的查询以及始终按不同列排序的表,可以尝试使用非聚集/聚集的columnstore索引,columnstore索引通常是最佳选择。

SELECT  DISTINCT
    ID      = FIRST_VALUE(ID) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
    ,DocumentID
    ,Status     = FIRST_VALUE(Status) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
    ,DateCreated    = FIRST_VALUE(DateCreated) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
FROM    DocumentStatusLogs;

方法3:使用CROSS APPLY。在DocumentStatusLogs表上创建涵盖查询中使用的列的行存储索引应该足以覆盖查询,而不需要columnstore索引。

SELECT  DISTINCT
    ID      = CA.ID
    ,DocumentID = D.DocumentID
    ,Status     = CA.Status 
    ,DateCreated    = CA.DateCreated
FROM    DocumentStatusLogs D
    CROSS APPLY (
            SELECT  TOP 1 I.*
            FROM    DocumentStatusLogs I
            WHERE   I.DocumentID = D.DocumentID
            ORDER   BY I.DateCreated DESC
            ) CA;
SELECT documentid, 
       status, 
       datecreated 
FROM   documentstatuslogs dlogs 
WHERE  status = (SELECT status 
                 FROM   documentstatuslogs 
                 WHERE  documentid = dlogs.documentid 
                 ORDER  BY datecreated DESC 
                 LIMIT  1) 

CROSS APPLY是我在解决方案中使用的方法,因为它对我和客户的需求都有效。从我所读到的,应该提供最好的整体性能,如果他们的数据库大幅增长。

如果你担心性能问题,你也可以用MAX()这样做:

SELECT *
FROM DocumentStatusLogs D
WHERE DateCreated = (SELECT MAX(DateCreated) FROM DocumentStatusLogs WHERE ID = D.ID)

ROW_NUMBER()要求对SELECT语句中的所有行进行排序,而MAX则不需要。应该会大大加快你的查询速度。