c++标准库中的std::sort算法(及其兄弟std::partial_sort和std::nth_element)在大多数实现中是更基本排序算法的复杂混合组合,例如选择排序、插入排序、快速排序、归并排序或堆排序。

这里和姐妹网站(如https://codereview.stackexchange.com/)上有许多问题与这些经典排序算法实现的错误、复杂性和其他方面有关。大多数提供的实现都由原始循环组成,使用索引操作和具体类型,并且在正确性和效率方面分析起来通常不是简单的。

问:如何使用现代c++实现上面提到的经典排序算法?

没有原始循环,而是结合了<algorithm>中的标准库算法构建块 迭代器接口和使用模板代替索引操作和具体类型 c++ 14风格,包括完整的标准库,以及诸如auto、模板别名、透明比较器和多态lambdas等语法降噪器。

注:

for further references on implementations of sorting algorithms see Wikipedia, Rosetta Code or http://www.sorting-algorithms.com/ according to Sean Parent's conventions (slide 39), a raw loop is a for-loop longer than composition of two functions with an operator. So f(g(x)); or f(x); g(x); or f(x) + g(x); are not raw loops, and neither are the loops in selection_sort and insertion_sort below. I follow Scott Meyers's terminology to denote the current C++1y already as C++14, and to denote C++98 and C++03 both as C++98, so don't flame me for that. As suggested in the comments by @Mehrdad, I provide four implementations as a Live Example at the end of the answer: C++14, C++11, C++98 and Boost and C++98. The answer itself is presented in terms of C++14 only. Where relevant, I denote the syntactic and library differences where the various language versions differ.


算法构建块

我们从组装标准库的算法构建块开始:

#include <algorithm>    // min_element, iter_swap, 
                        // upper_bound, rotate, 
                        // partition, 
                        // inplace_merge,
                        // make_heap, sort_heap, push_heap, pop_heap,
                        // is_heap, is_sorted
#include <cassert>      // assert 
#include <functional>   // less
#include <iterator>     // distance, begin, end, next

the iterator tools such as non-member std::begin() / std::end() as well as with std::next() are only available as of C++11 and beyond. For C++98, one needs to write these himself. There are substitutes from Boost.Range in boost::begin() / boost::end(), and from Boost.Utility in boost::next(). the std::is_sorted algorithm is only available for C++11 and beyond. For C++98, this can be implemented in terms of std::adjacent_find and a hand-written function object. Boost.Algorithm also provides a boost::algorithm::is_sorted as a substitute. the std::is_heap algorithm is only available for C++11 and beyond.

语法好东西

c++ 14提供了形式为std::less<>的透明比较器,它们对参数起多态作用。这避免了必须提供迭代器的类型。这可以与c++ 11的默认函数模板参数结合使用,为以<作为比较的排序算法和具有用户定义比较函数对象的排序算法创建单个重载。

template<class It, class Compare = std::less<>>
void xxx_sort(It first, It last, Compare cmp = Compare{});

在c++ 11中,可以定义一个可重用的模板别名来提取迭代器的值类型,这给排序算法的签名增加了一些小混乱:

template<class It>
using value_type_t = typename std::iterator_traits<It>::value_type;

template<class It, class Compare = std::less<value_type_t<It>>>
void xxx_sort(It first, It last, Compare cmp = Compare{});

在c++ 98中,需要编写两个重载,并使用详细的typename xxx<yyy>::type语法

template<class It, class Compare>
void xxx_sort(It first, It last, Compare cmp); // general implementation

template<class It>
void xxx_sort(It first, It last)
{
    xxx_sort(first, last, std::less<typename std::iterator_traits<It>::value_type>());
}

Another syntactical nicety is that C++14 facilitates wrapping user-defined comparators through polymorphic lambdas (with auto parameters that are deduced like function template arguments). C++11 only has monomorphic lambdas, that require the use of the above template alias value_type_t. In C++98, one either needs to write a standalone function object or resort to the verbose std::bind1st / std::bind2nd / std::not1 type of syntax. Boost.Bind improves this with boost::bind and _1 / _2 placeholder syntax. C++11 and beyond also have std::find_if_not, whereas C++98 needs std::find_if with a std::not1 around a function object.

c++风格

目前还没有普遍接受的c++ 14风格。不管是好是坏,我一直在密切关注Scott Meyers的《Effective Modern c++》草案和Herb Sutter的《GotW》修订版。我推荐以下风格:

Herb Sutter's "Almost Always Auto" and Scott Meyers's "Prefer auto to specific type declarations" recommendation, for which the brevity is unsurpassed, although its clarity is sometimes disputed. Scott Meyers's "Distinguish () and {} when creating objects" and consistently choose braced-initialization {} instead of the good old parenthesized initialization () (in order to side-step all most-vexing-parse issues in generic code). Scott Meyers's "Prefer alias declarations to typedefs". For templates this is a must anyway, and using it everywhere instead of typedef saves time and adds consistency. I use a for (auto it = first; it != last; ++it) pattern in some places, in order to allow for loop invariant checking for already sorted sub-ranges. In production code, the use of while (first != last) and a ++first somewhere inside the loop might be slightly better.

选择排序

选择排序不以任何方式适应数据,因此它的运行时间总是O(N²)。但是,选择排序具有使交换次数最小化的特性。在交换项代价较高的应用中,选择排序可能是最佳算法。

要使用标准库实现它,重复使用std::min_element查找剩余的最小元素,并使用iter_swap将其交换到适当的位置:

template<class FwdIt, class Compare = std::less<>>
void selection_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        auto const selection = std::min_element(it, last, cmp);
        std::iter_swap(selection, it); 
        assert(std::is_sorted(first, std::next(it), cmp));
    }
}

注意,selection_sort将已经处理过的范围[first, it)排序为它的循环不变量。与std::sort的随机访问迭代器相比,最小的需求是前向迭代器。

细节省略:

选择排序可以通过早期测试进行优化,如果(std::distance(first, last) <= 1)返回;(或者对于正向/双向迭代器:if (first == last || std::next(first) == last)返回;)。 对于双向迭代器,上面的测试可以与间隔[first, std::prev(last))上的循环结合,因为最后一个元素保证是最小的剩余元素,并且不需要交换。

插入排序

虽然它是最坏情况时间为O(N²)的基本排序算法之一,但无论是在数据接近排序时(因为它是自适应的),还是在问题规模较小时(因为它开销低),插入排序都是首选算法。由于这些原因,也因为它是稳定的,插入排序通常被用作更高开销的分治排序算法(如归并排序或快速排序)的递归基本情况(当问题规模较小时)。

要用标准库实现insertion_sort,重复使用std::upper_bound来查找当前元素需要移动的位置,并使用std::rotate将输入范围内的剩余元素向上移动:

template<class FwdIt, class Compare = std::less<>>
void insertion_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        auto const insertion = std::upper_bound(first, it, *it, cmp);
        std::rotate(insertion, it, std::next(it)); 
        assert(std::is_sorted(first, std::next(it), cmp));
    }
}

注意insertion_sort将已经处理过的范围[first, it)排序为它的循环不变量。插入排序也适用于前向迭代器。

细节省略:

insertion sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;) and a loop over the interval [std::next(first), last), because the first element is guaranteed to be in place and doesn't require a rotate. for bidirectional iterators, the binary search to find the insertion point can be replaced with a reverse linear search using the Standard Library's std::find_if_not algorithm.

以下片段的四个现场示例(c++ 14, c++ 11, c++ 98和Boost, c++ 98):

using RevIt = std::reverse_iterator<BiDirIt>;
auto const insertion = std::find_if_not(RevIt(it), RevIt(first), 
    [=](auto const& elem){ return cmp(*it, elem); }
).base();

对于随机输入,这给出了O(N²)个比较,但对于几乎排序的输入,这改进为O(N)个比较。二分查找总是使用O(N log N)个比较。 对于较小的输入范围,线性搜索的更好的内存位置(缓存,预取)也可能主导二进制搜索(当然,应该测试这一点)。

快速排序

当仔细实现时,快速排序是健壮的,具有O(N log N)的预期复杂性,但有O(N²)的最坏情况复杂性,可以由对手选择的输入数据触发。当不需要稳定排序时,快速排序是一种优秀的通用排序。

Even for the simplest versions, quick sort is quite a bit more complicated to implement using the Standard Library than the other classic sorting algorithms. The approach below uses a few iterator utilities to locate the middle element of the input range [first, last) as the pivot, then use two calls to std::partition (which are O(N)) to three-way partition the input range into segments of elements that are smaller than, equal to, and larger than the selected pivot, respectively. Finally the two outer segments with elements smaller than and larger than the pivot are recursively sorted:

template<class FwdIt, class Compare = std::less<>>
void quick_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const pivot = *std::next(first, N / 2);
    auto const middle1 = std::partition(first, last, [=](auto const& elem){ 
        return cmp(elem, pivot); 
    });
    auto const middle2 = std::partition(middle1, last, [=](auto const& elem){ 
        return !cmp(pivot, elem);
    });
    quick_sort(first, middle1, cmp); // assert(std::is_sorted(first, middle1, cmp));
    quick_sort(middle2, last, cmp);  // assert(std::is_sorted(middle2, last, cmp));
}

然而,快速排序是相当棘手的正确和有效的,因为上面的每个步骤都必须仔细检查和优化生产级代码。特别地,对于O(N log N)复杂度,枢轴必须导致输入数据的平衡分区,这对于O(1)枢轴通常不能保证,但如果将枢轴设置为输入范围的O(N)中位数,则可以保证。

细节省略:

the above implementation is particularly vulnerable to special inputs, e.g. it has O(N^2) complexity for the "organ pipe" input 1, 2, 3, ..., N/2, ... 3, 2, 1 (because the middle is always larger than all other elements). median-of-3 pivot selection from randomly chosen elements from the input range guards against almost sorted inputs for which the complexity would otherwise deteriorate to O(N^2). 3-way partitioning (separating elements smaller than, equal to and larger than the pivot) as shown by the two calls to std::partition is not the most efficient O(N) algorithm to achieve this result. for random access iterators, a guaranteed O(N log N) complexity can be achieved through median pivot selection using std::nth_element(first, middle, last), followed by recursive calls to quick_sort(first, middle, cmp) and quick_sort(middle, last, cmp). this guarantee comes at a cost, however, because the constant factor of the O(N) complexity of std::nth_element can be more expensive than that of the O(1) complexity of a median-of-3 pivot followed by an O(N) call to std::partition (which is a cache-friendly single forward pass over the data).

去排序

如果不考虑使用O(N)额外的空间,那么归并排序是一个很好的选择:它是唯一稳定的O(N log N)排序算法。

使用标准算法实现它很简单:使用一些迭代器实用程序来定位输入范围的中间[第一个,最后一个),并用std::inplace_merge组合两个递归排序的段:

template<class BiDirIt, class Compare = std::less<>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;                   
    auto const middle = std::next(first, N / 2);
    merge_sort(first, middle, cmp); // assert(std::is_sorted(first, middle, cmp));
    merge_sort(middle, last, cmp);  // assert(std::is_sorted(middle, last, cmp));
    std::inplace_merge(first, middle, last, cmp); // assert(std::is_sorted(first, last, cmp));
}

归并排序需要双向迭代器,瓶颈是std::inplace_merge。注意,在对链表排序时,归并排序只需要O(log N)额外的空间(用于递归)。后一种算法是通过标准库中的std::list<T>::sort实现的。

堆排序

堆排序实现简单,执行O(N log N)就地排序,但不稳定。

第一个循环,O(N)"heapify"阶段,将数组按堆顺序排列。第二个循环,O(N log N)“排序”阶段,重复提取最大值并恢复堆顺序。标准库使这一点非常简单:

template<class RandomIt, class Compare = std::less<>>
void heap_sort(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    lib::make_heap(first, last, cmp); // assert(std::is_heap(first, last, cmp));
    lib::sort_heap(first, last, cmp); // assert(std::is_sorted(first, last, cmp));
}

如果你认为使用std::make_heap和std::sort_heap是“欺骗”,你可以再深入一层,分别用std::push_heap和std::pop_heap来编写这些函数:

namespace lib {

// NOTE: is O(N log N), not O(N) as std::make_heap
template<class RandomIt, class Compare = std::less<>>
void make_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last;) {
        std::push_heap(first, ++it, cmp); 
        assert(std::is_heap(first, it, cmp));           
    }
}

template<class RandomIt, class Compare = std::less<>>
void sort_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = last; it != first;) {
        std::pop_heap(first, it--, cmp);
        assert(std::is_heap(first, it, cmp));           
    } 
}

}   // namespace lib

标准库将push_heap和pop_heap的复杂度指定为O(log N)。但是请注意,在[first, last)范围内的外层循环导致make_heap的复杂度为O(N log N),而std::make_heap的复杂度仅为O(N)。对于heap_sort的总体复杂度O(N log N)来说,这无关紧要。

详细信息省略:O(N) make_heap的实现

测试

这里有四个现场示例(c++ 14, c++ 11, c++ 98和Boost, c++ 98)在各种输入上测试所有五种算法(并不意味着详尽或严格)。只需注意LOC的巨大差异:c++ 11/ c++ 14需要大约130个LOC, c++ 98和Boost需要190个LOC(+50%),而c++ 98需要270多个LOC(+100%)。

另一个小而优雅的例子最初是在代码审查中发现的。我觉得值得分享。

计数排序

计数排序是一种简单的整数排序算法,如果要排序的整数的值相距不是太远,它通常可以非常快。例如,如果需要对100万个已知在0到100之间的整数的集合进行排序,这可能是理想的。

要实现对有符号整数和无符号整数都适用的非常简单的计数排序,需要找到集合中最小和最大的元素进行排序;它们的差异将告诉要分配的计数数组的大小。然后,对集合进行第二次遍历,以计算每个元素出现的次数。最后,我们将每个整数的所需数目写回原始集合。

template<typename ForwardIterator>
void counting_sort(ForwardIterator first, ForwardIterator last)
{
    if (first == last || std::next(first) == last) return;

    auto minmax = std::minmax_element(first, last);  // avoid if possible.
    auto min = *minmax.first;
    auto max = *minmax.second;
    if (min == max) return;

    using difference_type = typename std::iterator_traits<ForwardIterator>::difference_type;
    std::vector<difference_type> counts(max - min + 1, 0);

    for (auto it = first ; it != last ; ++it) {
        ++counts[*it - min];
    }

    for (auto count: counts) {
        first = std::fill_n(first, count, min++);
    }
}

虽然它只在已知要排序的整数的范围很小(通常不大于要排序的集合的大小)时有用,但使计数排序更通用会使它在最佳情况下变慢。如果已知范围不是很小,则可以使用另一种算法,例如基数排序、ska_sort或spreadsort。

细节省略:

We could have passed the bounds of the range of values accepted by the algorithm as parameters to totally get rid of the first std::minmax_element pass through the collection. This will make the algorithm even faster when a usefully-small range limit is known by other means. (It doesn't have to be exact; passing a constant 0 to 100 is still much better than an extra pass over a million elements to find out that the true bounds are 1 to 95. Even 0 to 1000 would be worth it; the extra elements are written once with zero and read once). Growing counts on the fly is another way to avoid a separate first pass. Doubling the counts size each time it has to grow gives amortized O(1) time per sorted element (see hash table insertion cost analysis for the proof that exponential grown is the key). Growing at the end for a new max is easy with std::vector::resize to add new zeroed elements. Changing min on the fly and inserting new zeroed elements at the front can be done with std::copy_backward after growing the vector. Then std::fill to zero the new elements. The counts increment loop is a histogram. If the data is likely to be highly repetitive, and the number of bins is small, it can be worth unrolling over multiple arrays to reduce the serializing data dependency bottleneck of store/reload to the same bin. This means more counts to zero at the start, and more to loop over at the end, but should be worth it on most CPUs for our example of millions of 0 to 100 numbers, especially if the input might already be (partially) sorted and have long runs of the same number. In the algorithm above, we use a min == max check to return early when every element has the same value (in which case the collection is sorted). It is actually possible to instead fully check whether the collection is already sorted while finding the extreme values of a collection with no additional time wasted (if the first pass is still memory bottlenecked with the extra work of updating min and max). However such an algorithm does not exist in the standard library and writing one would be more tedious than writing the rest of counting sort itself. It is left as an exercise for the reader. Since the algorithm only works with integer values, static assertions could be used to prevent users from making obvious type mistakes. In some contexts, a substitution failure with std::enable_if_t might be preferred. While modern C++ is cool, future C++ could be even cooler: structured bindings and some parts of the Ranges TS would make the algorithm even cleaner.

Visual Studio versions prior to 2015 used a bottom up merge sort for linked list, but switched to a less efficient (due to scanning lists to split them via std::next) top down merge sort for linked list when it was changed to handle no default allocator using an iterator based algorithm, which also provided exception safety if the compare (user defined) threw an exception (with the older version, a compare exception would lose the part of the list in an internal array). When this was first questioned, I assumed that there was some issue with implementing an iterator based bottom up merge sort for linked list, but later determined there was no issue. The only "clever" part of my implementation is using std::list::end() (a constant) to indicate an empty sub-list (since an iterator can't be null). Link to my answer showing stand-alone and replacement code for std::list::sort():

https://stackoverflow.com/a/40629882/3282056