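Why should hash functions use a prime number modulus?

A long time ago, I bought a data structures book off the bargain table for $1.25. In it, the explanation of hash functions said that the result should ultimately be taken mod a prime number because of "the nature of math".

What do you expect from a $1.25 book?

Anyway, I've had years to think about the nature of math, and I still can't figure it out.

Is the distribution of numbers truly more even when there is a prime number of buckets?

Or is this an old programmer's tale that everyone accepts because everybody else accepts it?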

Current answer

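It depends on the choice of hash function.

Many hash functions combine the various elements of the data by multiplying them by some factors, modulo the power of two that corresponds to the machine's word size (that modulus comes for free: you just let the calculation overflow).

You don't want any common factor between the multiplier for a data element and the size of the hash table, because then changing that data element might not spread the data over the whole table. If you choose a prime for the size of the table, such a common factor is highly unlikely.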
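As a rough illustration of the common-factor problem, here is a minimal sketch. The function name `buckets_hit` and the toy hash `(x * multiplier) % table_size` are assumptions made for this example, not something taken from a particular library.

```python
from math import gcd

def buckets_hit(multiplier, table_size):
    """Return the set of buckets reached by (x * multiplier) % table_size."""
    return {(x * multiplier) % table_size for x in range(table_size)}

# The multiplier 6 shares the factor 2 with the table size 16,
# so only 16 // gcd(6, 16) buckets can ever be reached.
shared = buckets_hit(6, 16)

# 17 is prime, so it has no factor in common with 6.
prime = buckets_hit(6, 17)

print(len(shared), 16 // gcd(6, 16))
print(len(prime))
```

With the shared factor, only the even-numbered buckets are used; with the prime table size, every bucket is reachable.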

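On the other hand, those factors are usually odd primes, so you should also be safe using a power of two for your hash table size (for example, Eclipse uses 31 when it generates a Java hashCode() method).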
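A small sketch of why an odd multiplier gets along with a power-of-two table. The function `java_style_hash` only imitates the shape of Java's String.hashCode() (it masks to 32 unsigned bits rather than reproducing Java's signed int), and the keys and table size are made up for the example.

```python
def java_style_hash(s, multiplier=31):
    """Polynomial string hash in the style of Java's String.hashCode(),
    reduced modulo 2**32 to mimic 32-bit overflow."""
    h = 0
    for ch in s:
        h = (h * multiplier + ord(ch)) & 0xFFFFFFFF
    return h

TABLE_SIZE = 16  # a power of two

def buckets_when_first_char_varies(multiplier):
    """Vary only the first character of a 3-character key and
    collect the buckets the keys land in."""
    keys = [chr(ord("a") + i) + "xy" for i in range(16)]
    return {java_style_hash(k, multiplier) % TABLE_SIZE for k in keys}

odd = buckets_when_first_char_varies(31)   # odd multiplier, coprime to 16
even = buckets_when_first_char_varies(32)  # shares every factor with 16

print(len(odd), len(even))
```

With the multiplier 31 the sixteen keys land in sixteen different buckets. With 32 the first character is multiplied by 32 * 32, which is 0 modulo 16, so it has no effect on the bucket and all keys collide.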

2009-07-18 07:32:19

Other answers

Primes are used because they give you a good chance of obtaining a unique value from a typical hash function that uses polynomials modulo P. Say you use such a hash function for strings of length <= N, and you have a collision. That means two different polynomials produce the same value modulo P. The difference of those polynomials is again a polynomial of degree N or less. It has no more than N roots (this is where the nature of math shows itself, since this claim is only true for a polynomial over a field, which requires P to be prime). So if N is much less than P, you are likely not to have a collision. After that, experiment can probably show that 37 is big enough to avoid collisions for a hash table of strings of length 5-10, and small enough to use for calculations.
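The root-count claim can be checked by brute force on a small case. This is only a sketch: `count_roots` is a helper written for this example, and the polynomial x^2 - 1 is just a convenient choice. It has degree 2, so over a field it can have at most 2 roots.

```python
def count_roots(coeffs, modulus):
    """Count x in [0, modulus) with p(x) == 0 (mod modulus).

    coeffs lists the polynomial's coefficients from the highest
    degree down to the constant term.
    """
    def evaluate(x):
        value = 0
        for c in coeffs:  # Horner's rule
            value = (value * x + c) % modulus
        return value

    return sum(1 for x in range(modulus) if evaluate(x) == 0)

p = [1, 0, -1]  # p(x) = x^2 - 1

roots_mod_7 = count_roots(p, 7)  # 7 is prime: integers mod 7 form a field
roots_mod_8 = count_roots(p, 8)  # 8 is not prime: no field, bound can fail

print(roots_mod_7, roots_mod_8)
```

Modulo 7 the roots are 1 and 6, within the bound of 2. Modulo 8 the roots are 1, 3, 5 and 7, so a degree-2 polynomial has four roots and the bound no longer holds.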

2013-11-26 01:04:11

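Copying from my other answer, https://stackoverflow.com/a/43126969/917428. See it for more details and examples.

I believe it just has to do with the fact that computers work in base 2. Think about how the same thing works in base 10:

8 % 10 = 8, 18 % 10 = 8, 87865378 % 10 = 8

It doesn't matter what the number is: as long as it ends in 8, its value modulo 10 is 8.

Picking a big enough number that is not a power of two makes sure the hash function really is a function of all the input bits, rather than of a subset of them.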
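The base-2 version of the same effect, as a minimal sketch. The keys here are made up for the example: they differ only in the bits above the lowest four.

```python
# Keys that differ only in bits above the lowest four: 16, 32, ..., 256.
keys = [i << 4 for i in range(1, 17)]

# k % 16 is the same as k & 0b1111, so it sees only the low 4 bits.
mod_16 = {k % 16 for k in keys}

# k % 17 depends on every bit of k.
mod_17 = {k % 17 for k in keys}

print(len(mod_16), len(mod_17))
```

All sixteen keys fall into bucket 0 when the table size is 16, while with a table size of 17 they land in sixteen different buckets.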

2017-03-30 19:48:30

To provide an alternate viewpoint, there's this site:

http://www.codexon.com/posts/hash-functions-the-modulo-prime-myth