我如何才能最好地编写一个查询,从总共600k中随机选择10行?


当前回答

我想这是最好的办法了。

SELECT id, id * RAND( ) AS random_no, first_name, last_name
FROM user
ORDER BY random_no

其他回答

一个伟大的职位处理几个情况,从简单,到差距,到不均匀与差距。

http://jan.kneschke.de/projects/mysql/order-by-rand/

对于大多数一般情况,你可以这样做:

SELECT name
  FROM random AS r1 JOIN
       (SELECT CEIL(RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

这假设id的分布是相等的,并且id列表中可能存在间隙。有关更高级的示例,请参阅本文

SELECT
  * 
FROM
  table_with_600k_rows
WHERE
  RAND( ) 
ORDER BY
  id DESC 
LIMIT 30;

Id是主键,按Id排序, 解释table_with_600k_rows,发现该行不扫描整个表

如果有一个自动生成的id,我发现一个很好的方法是使用模运算符'%'。例如,如果您需要70,000条随机记录中的10,000条,您可以简化为每7行中需要1行。这可以在这个查询中简化:

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0;

如果目标行除以total available的结果不是一个整数,那么你将得到比你要求的更多的行,所以你应该添加一个LIMIT子句来帮助你像这样修剪结果集:

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0
LIMIT 10000;

这确实需要一个完整的扫描,但它比ORDER BY RAND更快,在我看来,比本文中提到的其他选项更容易理解。另外,如果写入数据库的系统批量创建了一组行,你可能不会得到你所期望的随机结果。

如果你只有一个读请求

将@redsio的答案与一个临时表结合起来(600K并不是很多):

DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;

然后用一个@redsios的版本回答:

SELECT dt.*
FROM
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM tmp_randorder)) AS id)
        AS rnd
 INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
 INNER JOIN datatable AS dt on dt.id = rndo.data_id
 ORDER BY abs(rndo.id - rnd.id)
 LIMIT 1;

如果表比较大,可以先筛选第一部分:

INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;

如果你有很多读请求

Version: You could keep the table tmp_randorder persistent, call it datatable_idlist. Recreate that table in certain intervals (day, hour), since it also will get holes. If your table gets really big, you could also refill holes select l.data_id as whole from datatable_idlist l left join datatable dt on dt.id = l.data_id where dt.id is null; Version: Give your Dataset a random_sortorder column either directly in datatable or in a persistent extra table datatable_sortorder. Index that column. Generate a Random-Value in your Application (I'll call it $rand). select l.* from datatable l order by abs(random_sortorder - $rand) desc limit 1;

这个解决方案用最高和最低的random_sortorder来区分“边缘行”,所以在间隔中重新排列它们(一天一次)。

如何从表中随机选择行:

从这里开始: 在MySQL中随机选择行

对“表扫描”的快速改进是使用索引来获取随机id。

SELECT *
FROM random, (
        SELECT id AS sid
        FROM random
        ORDER BY RAND( )
        LIMIT 10
    ) tmp
WHERE random.id = tmp.sid;