用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

下面的SQL代码将帮助您使用用户定义的变量来计算MySQL中的中位数。

create table employees(salary int); insert into employees values(8); insert into employees values(23); insert into employees values(45); insert into employees values(123); insert into employees values(93); insert into employees values(2342); insert into employees values(2238); select * from employees; Select salary from employees order by salary; set @rowid=0; set @cnt=(select count(*) from employees); set @middle_no=ceil(@cnt/2); set @odd_even=null; select AVG(salary) from (select salary,@rowid:=@rowid+1 as rid, (CASE WHEN(mod(@cnt,2)=0) THEN @odd_even:=1 ELSE @odd_even:=0 END) as odd_even_status from employees order by salary) as tbl where tbl.rid=@middle_no or tbl.rid=(@middle_no+@odd_even);

如果你正在寻找详细的解释,请参考这个博客。

其他回答

create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);

select (MIN(count)+MAX(count))/2 from 
(select case when (select count(*) from 
med A where A.id<B.id)=(select count(*)/2 from med) OR 
(select count(*) from med A where A.id>B.id)=(select count(*)/2 
from med) then cast(B.id as float)end as count from med B) C;

 ?column? 
----------
  3.5
(1 row)

OR

select cast(avg(id) as float) from 
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id 
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;

这种方法似乎包括偶数和奇数计数,没有子查询。

SELECT AVG(t1.x)
FROM table t1, table t2
GROUP BY t1.x
HAVING SUM(SIGN(t1.x - t2.x)) = 0

我使用下表的解决方案在MySQL:

CREATE TABLE transactions (
  transaction_id int , user_id int , merchant_name varchar(255), transaction_date date , amount int
);

INSERT INTO transactions (transaction_id, user_id, merchant_name, transaction_date, amount)  
VALUES (1, 1 ,'abc', '2015-08-17', 100),(2, 2, 'ced', '2015-2-17', 100),(3, 1, 'def', '2015-2-16', 121),
(4, 1 ,'ced', '2015-3-17', 110),(5, 1, 'ced', '2015-3-17', 150),(6, 2 ,'abc', '2015-4-17', 130), 
(7, 3 ,'ced', '2015-12-17', 10),(8, 3 ,'abc', '2015-8-17', 100),(9, 2 ,'abc', '2015-12-17', 140),(10, 1,'abc', '2015-9-17', 100),
(11, 1 ,'abc', '2015-08-17', 121),(12, 2 ,'ced', '2015-12-23', 130),(13, 1 ,'def', '2015-12-23', 13),(3, 4, 'abc', '2015-2-16', 120),(3, 4, 'def', '2015-2-16', 121),(3, 4, 'ced', '2015-2-16', 121);

计算“金额”列的中位数:

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS TotatRecords,
    ROW_NUMBER() OVER (ORDER BY amount) AS RowNum
FROM transactions
)
SELECT Avg(amount)
FROM Numbered
WHERE RowNum IN ( FLOOR((TotatRecords+1)/2), FLOOR((TotatRecords+2)/2) )
;

TotalRecords = 16 and Median = 120.5000

此查询将适用于两种情况,即偶数和奇数记录。

我刚刚在网上的评论中找到了另一个答案:

对于几乎所有SQL中的中位数: SELECT x.val from data x, data y GROUP BY x.val 总和(符号(1-SIGN (y.val-x.val))) = (COUNT (*) + 1) / 2

确保列有良好的索引,并且索引用于筛选和排序。与解释计划核对。

select count(*) from table --find the number of rows

计算“中值”行号。可能使用:median_row = floor(count / 2)。

然后把它从列表中挑出来:

select val from table order by val asc limit median_row,1

这将返回您想要的值的一行。

基于@bob的回答,这将查询泛化为能够返回多个中位数,并按某些标准分组。

想想,例如,一个车场二手车的中位数销售价格,按年-月分组。

SELECT 
    period, 
    AVG(middle_values) AS 'median' 
FROM (
    SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
    FROM (
        SELECT 
            @last_period:=@period AS 'last_period',
            @period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
            IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`, 
            x.sale_price
          FROM listings AS x, (SELECT @row:=0) AS r
          WHERE 1
            -- where criteria goes here
          ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
        ) AS t1
    LEFT JOIN (  
          SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
          FROM listings x
          WHERE 1
            -- same where criteria goes here
          GROUP BY DATE_FORMAT(sale_date, '%Y%m')
        ) AS t2
        ON t1.period = t2.period
    ) AS t3
WHERE 
    row_num >= (count/2) 
    AND row_num <= ((count/2) + 1)
GROUP BY t3.period
ORDER BY t3.period;