用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

对于一个表站和列lat_n,下面是MySQL代码来获得中位数:

set @rows := (select count(1) from station);
set @v1 := 0;
set @sql1 := concat('select lat_n into @v1 from station order by lat_n asc limit 1 offset ', ceil(@rows/2) - 1);
prepare statement1 from @sql1;
execute statement1;
set @v2 := 0;
set @sql2 := concat('select lat_n into @v2 from station order by lat_n asc limit 1 offset ', ceil((@rows + 1)/2) - 1);
prepare statement2 from @sql2;
execute statement2;
select (@v1 + @v2)/2;

其他回答

另一个对Velcrow答案的重复,但使用了一个中间表,并利用了用于行编号的变量来获得计数,而不是执行额外的查询来计算它。还开始计数,以便第一行是第0行,以便简单地使用Floor和Ceil选择中位数行。

SELECT Avg(tmp.val) as median_val
    FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
              FROM data as inTab,  (SELECT @rows := -1) as init
              -- Replace with better where clause or delete
              WHERE 2 > 1
              ORDER BY inTab.val) as tmp
    WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));

这种方法似乎包括偶数和奇数计数,没有子查询。

SELECT AVG(t1.x)
FROM table t1, table t2
GROUP BY t1.x
HAVING SUM(SIGN(t1.x - t2.x)) = 0

您可以使用窗口函数row_number()来回答查询以找到介质

select val 
from (select val, row_number() over (order by val) as rownumber, x.cnt 
from data, (select count(*) as cnt from data) x) abc
where rownumber=ceil(cnt/2);

如果这是MySQL,现在有窗口函数,你可以这样做(假设你想四舍五入到最接近的整数-否则只需将round替换为CEIL或FLOOR或其他什么)。下面的解决方案适用于表,无论表的行数是偶数还是奇数:


WITH CTE AS (
    SELECT val,
            ROW_NUMBER() OVER (ORDER BY val ASC) AS rn,
            COUNT(*) OVER () AS total_count
    FROM data
)
SELECT ROUND(AVG(val)) AS median
FROM CTE
WHERE
    rn BETWEEN
    total_count / 2.0 AND
    total_count / 2.0 + 1;

I think some of the more recent answers on this thread were already getting at this approach, but it also seemed like people were overthinking it, so consider this an improved version. Regardless of SQL flavor, there is no reason anyone should be writing a huge paragraph of code with multiple subqueries just to get the median in 2021. However, please note that the above query only works if you're asked to find the median for a continuous series. Of course, regardless of row number, sometimes people do make a distinction between what is referred to as the Discrete Median and what is referred to as the Interpolated Median for a continuous series.

如果你被要求为一个离散级数找到中位数,而表的行数是偶数,那么上面的解决方案就不适合你,你应该恢复使用其他解决方案之一,比如TheJacobTaylor的。

下面的第二个解决方案是对TheJacobTaylor的稍微修改的版本,其中我显式地声明了CROSS JOIN。这个方法也适用于行数为奇数的表,不管你是被要求求连续序列的中位数还是离散序列的中位数,但我特别会在被要求求离散序列的中位数时使用这个方法。否则,使用第一种解决方案。这样,您就永远不必考虑数据是包含“偶数”还是“奇数”个数的数据点。


SELECT x.val AS median
FROM data x
CROSS JOIN data y
GROUP BY x.val
HAVING SUM(SIGN(1 - SIGN(y.val - x.val))) = (COUNT(*) + 1) / 2;

最后,你可以在PostgreSQL中使用内置函数轻松做到这一点。这里有一个很好的解释,以及关于离散中位数和插值中位数的有效总结。

https://leafo.net/guides/postgresql-calculating-percentile.html#calculating-the-median

因为我只需要一个中位数和百分位数的解决方案,我根据这个线程中的发现做了一个简单而相当灵活的函数。我知道,如果我发现“现成的”功能很容易包含在我的项目中,我自己会很高兴,所以我决定快速分享:

function mysql_percentile($table, $column, $where, $percentile = 0.5) {

    $sql = "
            SELECT `t1`.`".$column."` as `percentile` FROM (
            SELECT @rownum:=@rownum+1 as `row_number`, `d`.`".$column."`
              FROM `".$table."` `d`,  (SELECT @rownum:=0) `r`
              ".$where."
              ORDER BY `d`.`".$column."`
            ) as `t1`, 
            (
              SELECT count(*) as `total_rows`
              FROM `".$table."` `d`
              ".$where."
            ) as `t2`
            WHERE 1
            AND `t1`.`row_number`=floor(`total_rows` * ".$percentile.")+1;
        ";

    $result = sql($sql, 1);

    if (!empty($result)) {
        return $result['percentile'];       
    } else {
        return 0;
    }

}

使用非常简单,例子来自我目前的项目:

...
$table = DBPRE."zip_".$slug;
$column = 'seconds';
$where = "WHERE `reached` = '1' AND `time` >= '".$start_time."'";

    $reaching['median'] = mysql_percentile($table, $column, $where, 0.5);
    $reaching['percentile25'] = mysql_percentile($table, $column, $where, 0.25);
    $reaching['percentile75'] = mysql_percentile($table, $column, $where, 0.75);
...