Fastest way to sort 10 numbers? (numbers are 32-bit)

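I'm working on a problem that involves sorting 10 numbers (int32) very quickly. My application needs to sort 10 numbers millions of times, as fast as possible. I'm sampling a data set of billions of elements, and each time I need to pick 10 numbers out of it (simplified) and sort them (and draw conclusions from the sorted 10-element list).

Currently I'm using insertion sort, but I imagine I could implement a very fast custom sorting algorithm for my specific problem of 10 numbers that would beat insertion sort.

How should I approach this problem?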

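Current answer

Even though a sorting network has good odds of being fast on small arrays, sometimes you can't beat insertion sort if it is properly optimized. For example, batch insertion of 2 elements at a time: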

{
    final int a=in[0]<in[1]?in[0]:in[1];
    final int b=in[0]<in[1]?in[1]:in[0];
    in[0]=a;
    in[1]=b;
}
for(int x=2;x<10;x+=2)
{
    final int a=in[x]<in[x+1]?in[x]:in[x+1];
    final int b=in[x]<in[x+1]?in[x+1]:in[x];
    int y= x-1;

    while(y>=0&&in[y]>b)
    {
        in[y+2]= in[y];
        --y;
    }
    in[y+2]=b;
    while(y>=0&&in[y]>a)
    {
        in[y+1]= in[y];
        --y;
    }
    in[y+1]=a;
}

2015-08-25 04:42:00

Other answers

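When you are dealing with a fixed size like this, take a look at sorting networks. These algorithms have a fixed runtime and are independent of their input. For your use case, you avoid the overhead that some general-purpose sorting algorithms carry.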
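
As a minimal sketch of the idea, here is a well-known 5-comparator network for 4 inputs in C++ (kept to 4 inputs for brevity; it is not a 10-input network). The names `compare_exchange` and `sort4_network` are hypothetical helpers made up for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Compare-exchange: after the call, a <= b.
inline void compare_exchange(std::int32_t& a, std::int32_t& b) {
    if (a > b) std::swap(a, b);
}

// Hypothetical helper: 5-comparator sorting network for 4 inputs.
// The same fixed sequence of comparisons runs for every input.
inline void sort4_network(std::array<std::int32_t, 4>& v) {
    compare_exchange(v[0], v[1]);
    compare_exchange(v[2], v[3]);
    compare_exchange(v[0], v[2]);  // v[0] now holds the minimum
    compare_exchange(v[1], v[3]);  // v[3] now holds the maximum
    compare_exchange(v[1], v[2]);  // order the two middle values
}
```

There are no loops and no data-dependent control flow beyond the swaps themselves, which is where the fixed runtime comes from.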

Bitonic sort is one implementation of such a network. It works best with len(n) <= 32 on a CPU. For bigger inputs you could think about moving to a GPU.
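
A rough C++ sketch of a bitonic sorting network follows. The textbook form needs a power-of-two length, so this sketch assumes the 10 values are padded to 16 with the largest int32 value; that padding strategy and the names `bitonic_sort` and `bitonic_sort10` are assumptions of this example, not something the answer specifies.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

// Bitonic sorting network for N elements; N must be a power of two.
template <std::size_t N>
void bitonic_sort(std::array<std::int32_t, N>& a) {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    for (std::size_t k = 2; k <= N; k *= 2) {         // size of the blocks being merged
        for (std::size_t j = k / 2; j > 0; j /= 2) {  // distance between compared elements
            for (std::size_t i = 0; i < N; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    // Blocks alternate between ascending and descending order.
                    const bool ascending = (i & k) == 0;
                    if (ascending ? a[i] > a[partner] : a[i] < a[partner])
                        std::swap(a[i], a[partner]);
                }
            }
        }
    }
}

// Hypothetical wrapper: sort 10 values by padding to 16 with INT32_MAX.
inline void bitonic_sort10(std::array<std::int32_t, 10>& v) {
    std::array<std::int32_t, 16> padded;
    padded.fill(std::numeric_limits<std::int32_t>::max());
    std::copy(v.begin(), v.end(), padded.begin());
    bitonic_sort(padded);
    std::copy(padded.begin(), padded.begin() + 10, v.begin());
}
```

The padding means this does more comparisons than a network built specifically for 10 inputs, so treat it as an illustration of the technique rather than a tuned solution.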

By the way, a good page for comparing sorting algorithms is this one (though it is missing bitonic sort):

Sorting Algorithm Animations

2015-08-23 22:26:20

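The question doesn't say that this is some kind of web-based application. The one thing that caught my eye was:

I'm sampling a data set of billions of elements, and each time I need to pick 10 numbers out of it (simplified) and sort them (and draw conclusions from the sorted 10-element list).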

As a software and hardware engineer this absolutely screams FPGA to me. I don't know what kind of conclusions you need to draw from the sorted set of numbers or where the data comes from, but I know it would be almost trivial to process somewhere between one hundred million and a billion of these "sort-and-analyze" operations per second. I've done FPGA-assisted DNA sequencing work in the past. It is nearly impossible to beat the massive processing power of FPGAs when the problem is well suited for that type of a solution.

In some ways the only limiting factor is how quickly you can shovel data into an FPGA and how quickly you can get it out.

As a point of reference, I designed a high performance real-time image processor that received 32 bit RGB image data at a rate of about 300 million pixels per second. The data streamed through FIR filters, matrix multipliers, lookup tables, spatial edge detection blocks and a number of other operations before coming out the other end. All of this on a relatively small Xilinx Virtex2 FPGA with internal clocking spanning from about 33 MHz to, if I remember correctly, 400 MHz. Oh, yes, it also had a DDR2 controller implementation and ran two banks of DDR2 memory.

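An FPGA could output a sort of ten 32-bit numbers on every clock transition while operating at hundreds of MHz. There would be a short delay at the start of the operation as the data fills the processing pipeline(s). After that you should be able to get one result per clock, or more if the processing can be parallelized by replicating the sort-and-analyze pipeline. The solution, in principle, is almost trivial.

The point is: if the application isn't PC-bound and the data stream and processing are "compatible" with an FPGA solution (either stand-alone or as a co-processor card in the machine), there is no way you are going to beat the attainable level of performance with software written in any language, regardless of the algorithm.

I just ran a quick search and found a paper that might be useful to you. It looks like it dates back to 2012. You can do a lot better in performance today (and could even back then). Here it is:

Sorting Networks on FPGAs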

2015-08-28 08:34:29

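An insertion sort requires on average 29.6 comparisons to sort 10 inputs, with a best case of 9 and a worst case of 45 (when the input is in reverse order).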

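A {9,6,1} shellsort requires on average 25.5 comparisons to sort 10 inputs. The best case is 14 comparisons, the worst is 34, and sorting a reversed input requires 22.

So using shellsort instead of insertion sort reduces the average case by 14%. Although the best case increases by 56%, the worst case is reduced by 24%, which is significant in applications where keeping worst-case performance in check is important. The reversed-input case is reduced by 51%.

Since you seem to be familiar with insertion sort, you can implement the algorithm as a sorting network for the {9,6} gaps and then tack on the insertion sort ({1}) after that: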

i[0] with i[9]    // {9}

i[0] with i[6]    // {6}
i[1] with i[7]    // {6}
i[2] with i[8]    // {6}
i[3] with i[9]    // {6}

i[0 ... 9]        // insertion sort

2016-06-21 16:38:40

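(Following up on @HelloWorld's suggestion to look into sorting networks.)

It seems that a 29-comparison/swap network is the fastest way to do a 10-input sort. For this example I used the network discovered by Waksman in 1969, written in JavaScript; it should translate directly into C, as it is just a list of if statements, comparisons and swaps.

Here is a graphical representation of the network, divided into independent phases.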

To take advantage of parallel processing, the 5-4-3-4-4-4-3-2 grouping can be changed into a 4-4-4-4-4-4-3-2 grouping.

2015-08-24 01:05:16

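You can fully unroll insertion sort.

To make that easier, recursive templates can be used with no function-call overhead. Since the class is already a template, the element type (int here) can be a template parameter as well. This also makes it trivial to handle array sizes other than 10.

Note that to sort int x[10], the call is insert_sort<int, 9>::sort(x); because the class uses the index of the last item. This could be wrapped, but that would mean more code to read through.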

template <class T, int NUM>
class insert_sort;

template <class T>
class insert_sort<T,0>
// Stop template recursion
// Sorting one item is a no operation 
{
public:
    static void place(T *x) {}
    static void sort(T * x) {}
};

template <class T, int NUM>
class insert_sort
// Use template recursion to do insertion sort.
// NUM is the index of the last item, e.g. for x[10] call <9>
{
public:
    static void place(T *x)
    {
        T t1=x[NUM-1];
        T t2=x[NUM];
        if (t1 > t2)
        {
            x[NUM-1]=t2;
            x[NUM]=t1;
            insert_sort<T,NUM-1>::place(x);
        }
    }
    static void sort(T * x)
    {
        insert_sort<T,NUM-1>::sort(x); // Sort everything before
        place(x);                    // Put this item in
    }
};

In my testing this was faster than the sorting network examples.

2016-05-24 22:53:10
