How can I best write a query that selects 10 rows at random from a total of 600k?


Current answer

SELECT
  * 
FROM
  table_with_600k_rows
WHERE
  RAND( ) 
ORDER BY
  id DESC 
LIMIT 30;

id is the primary key and the rows are sorted by id; running EXPLAIN on table_with_600k_rows shows that the query does not scan the whole table.
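As a rough check of what the WHERE RAND( ) filter in this query does, here is a small Python model. It assumes only that MySQL treats any nonzero number in a WHERE clause as true and that RAND() returns a value in [0, 1). The function name and the 10,000-row table are made up for illustration; this is a sketch, not a benchmark.

```python
import random


def where_rand_order_by_id_desc(ids, limit, rng):
    """Model of: SELECT ... WHERE RAND() ORDER BY id DESC LIMIT limit."""
    # WHERE RAND(): a row is kept whenever its random value is nonzero (truthy).
    kept = [row_id for row_id in ids if rng.random() != 0.0]
    # ORDER BY id DESC LIMIT n
    return sorted(kept, reverse=True)[:limit]


rng = random.Random(0)
ids = range(1, 10_001)  # hypothetical table with ids 1..10000
picked = where_rand_order_by_id_desc(ids, 30, rng)
print(picked[:5])
```

Since a random value is almost never exactly zero, the filter keeps practically every row, so in this model the result is just the 30 highest ids rather than a random sample. That would also explain why EXPLAIN shows no full scan.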

Other answers

I think this is a simple yet faster method. I tested it on a live server, and it was faster than several of the answers above.

 SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30; 

// took 0.0014 seconds on a table with 130 rows

SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30

// took 0.0042 seconds on a table with 130 rows

 SELECT name
FROM random AS r1 JOIN
   (SELECT CEIL(RAND() *
                 (SELECT MAX(id)
                    FROM random)) AS id)
    AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30

// took 0.0040 seconds on a table with 130 rows

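If you have an auto-generated id, one good approach I have found is the modulo operator '%'. For example, if you need 10,000 random records out of 70,000, that simplifies to needing 1 out of every 7 rows. This can be expressed in the following query: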

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0;

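If the total number of available rows divided by the target number of rows is not an integer, FLOOR rounds the divisor down and you will get more rows than you asked for, so you should add a LIMIT clause to trim the result set, like this: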

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0
LIMIT 10000;

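This does require a full scan, but it is faster than ORDER BY RAND and, in my opinion, easier to understand than the other options mentioned in this thread. Also, if the system that writes to the database creates rows in batches, you may not get results as random as you expect.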
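To make the "extra rows" case in the modulo approach concrete, here is a small Python sketch of the arithmetic. It assumes ids run from 1 to the row count with no gaps, which a real auto-increment column does not guarantee; the function name is made up for illustration.

```python
def rows_matched(total_rows, target_rows):
    """Count ids in 1..total_rows that satisfy id % FLOOR(total / target) = 0."""
    step = total_rows // target_rows  # FLOOR(count / target)
    return sum(1 for row_id in range(1, total_rows + 1) if row_id % step == 0)


exact = rows_matched(70_000, 10_000)    # 70000 / 10000 = 7 exactly
rounded = rows_matched(75_000, 10_000)  # 75000 / 10000 = 7.5, floored to 7
print(exact, rounded)  # 10000 10714
```

With 75,000 rows the divisor is still 7, so every 7th id matches and the query returns 10,714 rows, which is why the LIMIT is needed.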

You can easily use a random offset with a limit:

PREPARE stm from 'select * from table limit 10 offset ?';
SET @total = (select count(*) from table);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;

You can also apply a where clause, like this:

PREPARE stm from 'select * from table where available=true limit 10 offset ?';
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;

Tested on a table with 600,000 rows (700MB), query execution took about 0.016 seconds on a hard disk drive.

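EDIT: The offset might take a value close to the end of the table, which would make the select statement return fewer rows (or maybe only 1 row). To avoid this, we can check the offset again after setting it, like so: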

SET @rows_count = 10;
PREPARE stm from "select * from table where available=true limit ? offset ?";
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
SET @_offset = (SELECT IF(@total-@_offset<@rows_count,@_offset-@rows_count,@_offset));
SET @_offset = (SELECT IF(@_offset<0,0,@_offset));
EXECUTE stm using @rows_count,@_offset;
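The two IF adjustments on @_offset can be traced with a small Python sketch. The function name is made up for illustration; it simply mirrors the two SET statements in order.

```python
def clamp_offset(total, offset, rows_count):
    """Mirror the two SET @_offset = (SELECT IF(...)) statements."""
    # IF(@total - @_offset < @rows_count, @_offset - @rows_count, @_offset)
    if total - offset < rows_count:
        offset -= rows_count
    # IF(@_offset < 0, 0, @_offset)
    if offset < 0:
        offset = 0
    return offset


print(clamp_offset(100, 50, 10))  # 50: enough rows remain, offset unchanged
print(clamp_offset(100, 95, 10))  # 85: only 5 rows remained, so step back by 10
print(clamp_offset(5, 3, 10))     # 0: stepping back went negative, clamp to 0
```

In the last case the filtered table has fewer rows than @rows_count, so the query still returns fewer than 10 rows; the clamp only keeps the offset valid.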

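I improved on @Riedsio's answer. This is the most efficient query I can find for a large, uniformly distributed table with gaps (tested on getting 1000 random rows from a table with > 2.6B rows).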

(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)

Let me explain what is going on.

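(SELECT MAX(id) FROM table)
I calculate the max id and save it. For very large tables, calculating MAX(id) each time you need a row has a slight overhead.

SELECT FLOOR(rand() * @max) + 1 as rand)
This gets a random id.

SELECT id FROM table INNER JOIN(…
This fills in the gaps. Basically, if you randomly select a number that falls in a gap, it just picks the next id. Assuming the gaps are uniformly distributed, this should not be a problem.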
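To see how "pick the next id" behaves when the gaps are not uniform, here is a small Python model of a single branch of the query. It assumes the rows come back in ascending id order (the query has no ORDER BY, so this depends on the index being used) and keeps the strict id > rand comparison as written. The five-id table and the function name are made up for illustration.

```python
from bisect import bisect_right
from collections import Counter


def pick(sorted_ids, rand):
    """Model of: SELECT id ... ON id > rand LIMIT 1 (first id above rand)."""
    pos = bisect_right(sorted_ids, rand)
    return sorted_ids[pos] if pos < len(sorted_ids) else None


ids = [1, 2, 3, 10, 11]  # hypothetical table with a gap between 3 and 10
max_id = ids[-1]
# FLOOR(RAND() * @max) + 1 yields each value 1..max_id with equal probability,
# so enumerating them gives the exact distribution of results.
hits = Counter(pick(ids, r) for r in range(1, max_id + 1))
print(hits)
```

In this model id 10 is returned for 7 of the 11 possible random values, because it absorbs the whole gap before it. Two edge cases also show up with the strict comparison: the lowest id is never returned, and a random value equal to the max id returns no row. Using >= instead would be one way to avoid both, though that is a change to the query as posted.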

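Doing the union lets you fit everything into one query, so you avoid running multiple queries. It also saves the overhead of calculating MAX(id). Depending on your application, this might matter a lot or very little.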

Note that this gets only the ids, and gets them in random order. If you want to do something more advanced, I recommend you do this:

SELECT t.id, t.name -- etc, etc
FROM table t
INNER JOIN (
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
) x ON x.id = t.id
ORDER BY t.id