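According to MSDN, Median is not available as an aggregate function in Transact-SQL. However, I would like to find out whether it is possible to create this functionality (using the CREATE AGGREGATE statement, a user-defined function, or some other method).
What would be the best way (if possible) to do this, i.e. to allow a median value (assuming a numeric data type) to be calculated in an aggregate query?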
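Answer
I stumbled across this page while looking for a set-based solution for the median. After looking at some of the solutions here, I came up with the following. I hope it is useful.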
DECLARE @test TABLE(
i int identity(1,1),
id int,
score float
)
INSERT INTO @test (id,score) VALUES (1,10)
INSERT INTO @test (id,score) VALUES (1,11)
INSERT INTO @test (id,score) VALUES (1,15)
INSERT INTO @test (id,score) VALUES (1,19)
INSERT INTO @test (id,score) VALUES (1,20)
INSERT INTO @test (id,score) VALUES (2,20)
INSERT INTO @test (id,score) VALUES (2,21)
INSERT INTO @test (id,score) VALUES (2,25)
INSERT INTO @test (id,score) VALUES (2,29)
INSERT INTO @test (id,score) VALUES (2,30)
INSERT INTO @test (id,score) VALUES (3,20)
INSERT INTO @test (id,score) VALUES (3,21)
INSERT INTO @test (id,score) VALUES (3,25)
INSERT INTO @test (id,score) VALUES (3,29)
DECLARE @counts TABLE(
id int,
cnt int
)
INSERT INTO @counts (
id,
cnt
)
SELECT
id,
COUNT(*)
FROM
@test
GROUP BY
id
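-- Median per id. Assumes each id's rows were inserted in ascending score order
-- with contiguous identity values, so t.i - drv.start is the row's rank in its group.
-- The middle row is at rank (cnt+1)/2; for an even cnt the second middle row is at
-- rank (cnt+2)/2 (the %2 factor zeroes that term for an odd cnt). Integer division is intended.
SELECT
drv.id,
drv.start,
AVG(t.score) AS median_score
FROM
(
SELECT
MIN(t.i)-1 AS start,
t.id
FROM
@test t
GROUP BY
t.id
) drv
INNER JOIN @test t ON drv.id = t.id
INNER JOIN @counts c ON t.id = c.id
WHERE
t.i = ((c.cnt+1)/2)+drv.start
OR (
t.i = (((c.cnt+1)%2) * ((c.cnt+2)/2))+drv.start
AND ((c.cnt+1)%2) * ((c.cnt+2)/2) <> 0
)
GROUP BY
drv.id,
drv.start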
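To make the WHERE-clause arithmetic easier to check, here is a small Python model of it. It is Python rather than T-SQL only so it runs standalone, and `median_positions` / `group_median` are hypothetical helper names. Like the query, it assumes each group's scores are already in ascending order.

```python
# Model of the offsets kept by the WHERE clause, per id group.
# Offsets are 1-based (t.i - drv.start); // mirrors T-SQL integer division.


def median_positions(cnt: int) -> list:
    """Offsets the WHERE clause keeps for a group of cnt rows."""
    first = (cnt + 1) // 2                       # ((c.cnt+1)/2)
    second = ((cnt + 1) % 2) * ((cnt + 2) // 2)  # 0 when cnt is odd
    positions = [first]
    if second != 0:                              # the "<> 0" guard
        positions.append(second)
    return positions


def group_median(sorted_scores: list) -> float:
    """AVG(t.score) over the kept rows; assumes scores are pre-sorted."""
    picked = [sorted_scores[p - 1] for p in median_positions(len(sorted_scores))]
    return sum(picked) / len(picked)


# Same data as the @test table above, grouped by id.
groups = {1: [10, 11, 15, 19, 20], 2: [20, 21, 25, 29, 30], 3: [20, 21, 25, 29]}
for gid, scores in groups.items():
    print(gid, median_positions(len(scores)), group_median(scores))
```

This prints `1 [3] 15.0`, `2 [3] 25.0` and `3 [2, 3] 23.0`: one middle row for the odd-sized groups and the two middle rows averaged for id 3.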
Other answers
In a UDF, write:
Select Top 1 T.medianSortColumn from [Table] T
Where (Select Count(*) from [Table]
Where medianSortColumn <= T.medianSortColumn)
>= ((Select Count(*) From [Table]) + 1) / 2
Order By T.medianSortColumn
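The idea behind this count-based query is to return the first value in sorted order once at least half of the rows fall at or below it. Here it is sketched as a self-contained Python model, not T-SQL. `count_based_median` is a hypothetical name, and skipping `None` stands in for ignoring NULLs, which is an assumption. With a threshold of (n + 1) / 2, an even count gives the lower of the two middle values rather than their average.

```python
def count_based_median(values):
    """Lower median via a correlated count; None values are skipped (assumption)."""
    data = [v for v in values if v is not None]
    if not data:
        return None                            # TOP 1 over an empty set: no row
    threshold = (len(data) + 1) // 2           # integer division, as in T-SQL
    for candidate in sorted(data):             # ORDER BY medianSortColumn
        at_or_below = sum(1 for v in data if v <= candidate)  # correlated COUNT(*)
        if at_or_below >= threshold:
            return candidate                   # TOP 1
    return None  # not reached: the largest value always meets the threshold


print(count_based_median([1, 2, 2, 2, 5]))  # duplicates around the middle
```

Counting values at or below the candidate, rather than strictly below it, keeps duplicates of the middle value from being skipped. The call above returns 2.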
This is the simplest way to find the median of an attribute:
Select round(S.salary,4) median from employee S
where (select count(salary) from employee
where salary < S.salary ) = (select count(salary) from employee
where salary > S.salary)
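A quick Python model of the equal-counts condition shows where it stops working. It uses a hypothetical `balanced_median`, with plain lists standing in for the `salary` column. A row only matches when some value has exactly as many rows below it as above it. That never happens for an even number of distinct values, and it can also fail when duplicates sit at the middle.

```python
def balanced_median(values):
    """Values with as many entries strictly below them as strictly above them."""
    result = []
    for s in values:                                    # outer row S
        below = sum(1 for x in values if x < s)         # count(salary) ... < S.salary
        above = sum(1 for x in values if x > s)         # count(salary) ... > S.salary
        if below == above:
            result.append(s)
    return result


print(balanced_median([3000, 1000, 2000]))        # [2000]: odd count, distinct values
print(balanced_median([1000, 2000, 3000, 4000]))  # []: even count, no row returned
print(balanced_median([1, 1, 2]))                 # []: duplicate at the middle
```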
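Although Justin Grant's solution looks solid, I found that when a given partition key contains many duplicate values, the ascending row numbers (ROWA) assigned to the duplicates end up out of sequence relative to the descending ones (ROWD). As a result, the two numberings no longer line up.
Here is a fragment of my results: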
KEY VALUE ROWA ROWD
13 2 22 182
13 1 6 183
13 1 7 184
13 1 8 185
13 1 9 186
13 1 10 187
13 1 11 188
13 1 12 189
13 0 1 190
13 0 2 191
13 0 3 192
13 0 4 193
13 0 5 194
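I used Justin's code as the basis for this solution. Using several derived tables is not very efficient, but it does fix the row-ordering problem I was having. Any improvements are welcome, as I am not very experienced with T-SQL.
SELECT PKEY, CAST(AVG(VALUE) AS decimal(5,2)) AS MEDIANVALUE
FROM
(
-- FLAG the middle row (odd count) or the two middle rows (even count)
SELECT PKEY, VALUE, ROWA, ROWD,
'FLAG' = (CASE WHEN ROWA IN (ROWD, ROWD-1, ROWD+1) THEN 1 ELSE 0 END)
FROM
(
-- ROWD is numbered from the already-unique ROWA, so it mirrors ROWA exactly
-- even when VALUE contains duplicates
SELECT
PKEY,
CAST(VALUE AS decimal(5,2)) AS VALUE,
ROWA,
ROW_NUMBER() OVER (PARTITION BY PKEY ORDER BY ROWA DESC) AS ROWD
FROM
(
SELECT
PKEY,
VALUE,
ROW_NUMBER() OVER (PARTITION BY PKEY ORDER BY VALUE ASC, PKEY ASC) AS ROWA
FROM [MTEST]
) T1
) T2
) T3
WHERE FLAG = 1
GROUP BY PKEY
ORDER BY PKEY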
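The core of this fix is that ROWD is derived from ROWA instead of from VALUE, so ROWD = n + 1 - ROWA holds even with ties. The sketch below models that in Python. It is not T-SQL, and `flag_median` / `medians_by_key` are hypothetical names. The key 13 data follows the value counts in the fragment above, and key 7 is made-up data for an even-sized group.

```python
from collections import defaultdict


def flag_median(values):
    """Median of one PKEY group using the ROWA/ROWD mirror trick."""
    ordered = sorted(values)                     # ROWA: ORDER BY VALUE ASC
    n = len(ordered)
    if n == 0:
        return None
    picked = []
    for rowa, value in enumerate(ordered, start=1):
        rowd = n + 1 - rowa                      # ROWD: numbered by ROWA DESC
        if rowa in (rowd, rowd - 1, rowd + 1):   # FLAG = 1
            picked.append(value)
    return sum(picked) / len(picked)             # AVG(VALUE)


def medians_by_key(rows):
    """GROUP BY PKEY over (pkey, value) pairs."""
    groups = defaultdict(list)
    for pkey, value in rows:
        groups[pkey].append(value)
    return {pkey: flag_median(vals) for pkey, vals in sorted(groups.items())}


# Key 13 as in the fragment (one 2, seven 1s, five 0s); key 7 is hypothetical.
rows = [(13, 2)] + [(13, 1)] * 7 + [(13, 0)] * 5 + [(7, 4), (7, 1), (7, 3), (7, 2)]
print(medians_by_key(rows))  # {7: 2.5, 13: 1.0}
```

For an odd n only ROWA = ROWD can hold. For an even n, the two middle rows satisfy ROWA = ROWD - 1 and ROWA = ROWD + 1 respectively.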
Here is my solution:
with tempa as
(
select value,
row_number() over (order by value) as Rn, /* Assign a row number */
count(value) over () as Cnt /* Total count of the non-null values */
from numbers
where value is not null /* Exclude NULL values */
),
tempb as
(
/* Since we don't know whether the number of rows is odd or even, consider both
scenarios. In T-SQL, % is the modulo operator and int / int is integer division. */
select (Cnt + 1) / 2 as Ref from tempa where Cnt % 2 = 1
union all
select Cnt / 2 as Ref from tempa where Cnt % 2 = 0
union all
select Cnt / 2 + 1 as Ref from tempa where Cnt % 2 = 0
)
select avg(value * 1.0) as Median_Value /* * 1.0 avoids an integer AVG when value is an int */
from tempa where Rn in
(select Ref from tempb);
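The Ref selection in `tempb` comes down to choosing one middle row number for an odd count and two for an even count. Here is a minimal Python model of the whole CTE. It is not T-SQL: `ref_positions` and `cte_median` are hypothetical names, and `None` stands in for NULL.

```python
def ref_positions(cnt):
    """Row numbers selected by tempb for cnt non-null values."""
    if cnt % 2 == 1:
        return {(cnt + 1) // 2}          # odd: the single middle row
    return {cnt // 2, cnt // 2 + 1}      # even: the two middle rows


def cte_median(values):
    """AVG(value) over the rows whose Rn is in tempb."""
    data = sorted(v for v in values if v is not None)  # WHERE value IS NOT NULL
    cnt = len(data)                                    # COUNT(value) OVER ()
    if cnt == 0:
        return None                                    # AVG over no rows is NULL
    refs = ref_positions(cnt)
    picked = [v for rn, v in enumerate(data, start=1) if rn in refs]
    return sum(picked) / len(picked)


print(cte_median([4, 1, 3, 2]))  # averages rows 2 and 3
```

Because `tempb` yields one Ref row per row of `tempa`, it contains duplicates. A set here, like `IN (...)` in the query, makes those duplicates harmless.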
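The following query returns the median of the values in a single column. It cannot be used as an aggregate function or alongside one, but you can still use it as a subquery with a WHERE clause in an inner SELECT.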
SQL Server 2005+:
SELECT TOP 1 value from
(
SELECT TOP 50 PERCENT value
FROM table_name
ORDER BY value
)for_median
ORDER BY value DESC
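This query returns the largest value in the lower half of the sorted column. The sketch below models it in Python (hypothetical `top_percent_median`). It assumes SQL Server rounds a fractional `TOP ... PERCENT` row count up to the next integer, which should be checked against the TOP documentation for your version. It also assumes the column has no NULLs, since SQL Server sorts NULLs first in ascending order. Under those assumptions, an odd count yields the true middle value and an even count yields the lower of the two middle values rather than their average.

```python
import math


def top_percent_median(values):
    """TOP 50 PERCENT ascending, then TOP 1 descending; assumes no NULLs."""
    ordered = sorted(values)                      # ORDER BY value
    if not ordered:
        return None
    half = math.ceil(len(ordered) * 50 / 100)     # assumed round-up of TOP ... PERCENT
    return ordered[half - 1]                      # TOP 1 ... ORDER BY value DESC


print(top_percent_median([5, 1, 3, 2, 4]))  # 3: the middle of five values
print(top_percent_median([4, 1, 3, 2]))     # 2: the lower middle, not 2.5
```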