编写一个程序，从一个包含10亿个数字的数组中找出100个最大的数字

最近我参加了一个面试，面试官要求我“编写一个程序，从一个包含10亿个数字的数组中找出100个最大的数字”。

我只能给出一个蛮力解决方案，即以O(nlogn)时间复杂度对数组进行排序，并取最后100个数字。

Arrays.sort(array);

面试官正在寻找一个更好的时间复杂度，我尝试了几个其他的解决方案，但都没有回答他。有没有更好的时间复杂度解决方案?

当前回答

从十亿个数字中找到前100个最好使用包含100个元素的最小堆。

首先用遇到的前100个数字对最小堆进行质数。Min-heap将前100个数字中最小的存储在根(顶部)。

现在，当你继续计算其他数字时，只将它们与根数(100中最小的数)进行比较。

如果遇到的新数字大于最小堆的根，则将根替换为该数字，否则忽略它。

作为在最小堆中插入新数字的一部分，堆中最小的数字将移到顶部(根)。

一旦我们遍历了所有的数字，我们将得到最小堆中最大的100个数字。

2017-11-24 10:55:00

其他回答

我用Python写了一个简单的解决方案，以防有人感兴趣。它使用bisect模块和一个临时返回列表，它保持排序。这类似于优先级队列实现。

import bisect

def kLargest(A, k):
    '''returns list of k largest integers in A'''
    ret = []
    for i, a in enumerate(A):
        # For first k elements, simply construct sorted temp list
        # It is treated similarly to a priority queue
        if i < k:
            bisect.insort(ret, a) # properly inserts a into sorted list ret
        # Iterate over rest of array
        # Replace and update return array when more optimal element is found
        else:
            if a > ret[0]:
                del ret[0] # pop min element off queue
                bisect.insort(ret, a) # properly inserts a into sorted list ret
    return ret

使用100,000,000个元素和最坏情况输入是一个排序列表:

>>> from so import kLargest
>>> kLargest(range(100000000), 100)
[99999900, 99999901, 99999902, 99999903, 99999904, 99999905, 99999906, 99999907,
 99999908, 99999909, 99999910, 99999911, 99999912, 99999913, 99999914, 99999915,
 99999916, 99999917, 99999918, 99999919, 99999920, 99999921, 99999922, 99999923,
 99999924, 99999925, 99999926, 99999927, 99999928, 99999929, 99999930, 99999931,
 99999932, 99999933, 99999934, 99999935, 99999936, 99999937, 99999938, 99999939,
 99999940, 99999941, 99999942, 99999943, 99999944, 99999945, 99999946, 99999947,
 99999948, 99999949, 99999950, 99999951, 99999952, 99999953, 99999954, 99999955,
 99999956, 99999957, 99999958, 99999959, 99999960, 99999961, 99999962, 99999963,
 99999964, 99999965, 99999966, 99999967, 99999968, 99999969, 99999970, 99999971,
 99999972, 99999973, 99999974, 99999975, 99999976, 99999977, 99999978, 99999979,
 99999980, 99999981, 99999982, 99999983, 99999984, 99999985, 99999986, 99999987,
 99999988, 99999989, 99999990, 99999991, 99999992, 99999993, 99999994, 99999995,
 99999996, 99999997, 99999998, 99999999]

我花了40秒计算1亿个元素，所以我不敢计算10亿个元素。为了公平起见，我给它提供了最坏情况的输入(具有讽刺意味的是，一个已经排序的数组)。

2013-10-07 20:28:14

求n个元素中最大的m个元素，其中n >>> m

最简单的解决方案，每个人都应该很明显，就是简单地做m次冒泡排序算法。

然后打印出数组的最后n个元素。

它不需要外部数据结构，并且使用了一种大家都知道的算法。

运行时间估计为O(m*n)。到目前为止最好的答案是O(nlog (m))，所以这个解决方案对于小m来说并不显着昂贵。

我并不是说这不能改进，但这是迄今为止最简单的解决方案。

2013-10-08 14:47:44

Time ~ O(100 * N)
Space ~ O(100 + N)

创建一个包含100个空槽的空列表对于输入列表中的每个数字: 如果数字小于第一个，跳过否则用这个数字代替它然后，将数字通过相邻的交换;直到它比下一个小返回列表

注意:如果log(input-list.size) + c < 100，那么最佳的方法是对输入列表进行排序，然后拆分前100项。

2013-10-09 06:19:07

I would find out who had the time to put a billion numbers into an array and fire him. Must work for government. At least if you had a linked list you could insert a number into the middle without moving half a billion to make room. Even better a Btree allows for a binary search. Each comparison eliminates half of your total. A hash algorithm would allow you to populate the data structure like a checkerboard but not so good for sparse data. As it is your best bet is to have a solution array of 100 integers and keep track of the lowest number in your solution array so you can replace it when you come across a higher number in the original array. You would have to look at every element in the original array assuming it is not sorted to begin with.

2013-10-09 15:11:46

这是谷歌或其他行业巨头提出的问题。也许下面的代码就是面试官想要的正确答案。时间成本和空间成本取决于输入数组中的最大数量。对于32位int数组输入，最大空间成本是4 * 125M字节，时间成本是5 *十亿。

public class TopNumber {
    public static void main(String[] args) {
        final int input[] = {2389,8922,3382,6982,5231,8934
                            ,4322,7922,6892,5224,4829,3829
                            ,6892,6872,4682,6723,8923,3492};
        //One int(4 bytes) hold 32 = 2^5 value,
        //About 4 * 125M Bytes
        //int sort[] = new int[1 << (32 - 5)];
        //Allocate small array for local test
        int sort[] = new int[1000];
        //Set all bit to 0
        for(int index = 0; index < sort.length; index++){
            sort[index] = 0;
        }
        for(int number : input){
            sort[number >>> 5] |= (1 << (number % 32));
        }
        int topNum = 0;
        outer:
        for(int index = sort.length - 1; index >= 0; index--){
            if(0 != sort[index]){
                for(int bit = 31; bit >= 0; bit--){
                    if(0 != (sort[index] & (1 << bit))){
                        System.out.println((index << 5) + bit);
                        topNum++;
                        if(topNum >= 3){
                            break outer;
                        }
                    }
                }
            }
        }
    }
}

2013-10-13 09:35:03

编写一个程序，从一个包含10亿个数字的数组中找出100个最大的数字

推荐文章

最新文章

标签