用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

MariaDB / MySQL:

SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
  FROM data d, (SELECT @rownum:=0) r
  WHERE d.val is NOT NULL
  -- put some where clause here
  ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

Steve Cohen指出,在第一次传递之后,@rownum将包含总行数。这可用于确定中值,因此不需要第二次传递或连接。

此外,AVG(dd.val)和dd.row_number IN(…)用于在有偶数条记录时正确地产生中位数。推理:

SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3

最后,MariaDB 10.3.3+包含一个MEDIAN函数

其他回答

我刚刚在网上的评论中找到了另一个答案:

对于几乎所有SQL中的中位数: SELECT x.val from data x, data y GROUP BY x.val 总和(符号(1-SIGN (y.val-x.val))) = (COUNT (*) + 1) / 2

确保列有良好的索引,并且索引用于筛选和排序。与解释计划核对。

select count(*) from table --find the number of rows

计算“中值”行号。可能使用:median_row = floor(count / 2)。

然后把它从列表中挑出来:

select val from table order by val asc limit median_row,1

这将返回您想要的值的一行。

来自: http://mdb-blog.blogspot.com/2015/06/mysql-find-median-nth-element-without.html

我建议另一种方法,不用连接, 但是处理字符串

我没有用大数据表来检验, 但是中小型的桌子就可以了。

这里的好处是,它也可以通过GROUPING工作,所以它可以返回几个项目的中值。

下面是测试表的测试代码:

DROP TABLE test.test_median
CREATE TABLE test.test_median AS
SELECT 'book' AS grp, 4 AS val UNION ALL
SELECT 'book', 7 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 9 UNION ALL
SELECT 'book', 8 UNION ALL
SELECT 'book', 3 UNION ALL

SELECT 'note', 11 UNION ALL

SELECT 'bike', 22 UNION ALL
SELECT 'bike', 26 

求每组中位数的代码:

SELECT grp,
         SUBSTRING_INDEX( SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val), ',', COUNT(*)/2 ), ',', -1) as the_median,
         GROUP_CONCAT(val ORDER BY val) as all_vals_for_debug
FROM test.test_median
GROUP BY grp

输出:

grp | the_median| all_vals_for_debug
bike| 22        | 22,26
book| 4         | 2,2,3,4,7,8,9
note| 11        | 11

如果MySQL有ROW_NUMBER,那么MEDIAN是(受SQL Server查询的启发):

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS Cnt,
    ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
;

如果您有偶数个条目,则使用IN。

如果你想找到每个组的中位数,那么只需要在你的OVER子句中PARTITION BY组。

Rob

create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);

select (MIN(count)+MAX(count))/2 from 
(select case when (select count(*) from 
med A where A.id<B.id)=(select count(*)/2 from med) OR 
(select count(*) from med A where A.id>B.id)=(select count(*)/2 
from med) then cast(B.id as float)end as count from med B) C;

 ?column? 
----------
  3.5
(1 row)

OR

select cast(avg(id) as float) from 
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id 
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;
set @r = 0;

select  
    case when mod(c,2)=0 then round(sum(lat_N),4)
    else round(sum(lat_N)/2,4) 
    end as Med  
from 
    (select lat_N, @r := @r+1, @r as id from station order by lat_N) A
    cross join
    (select (count(1)+1)/2 as c from station) B
where id >= floor(c) and id <=ceil(c)