在.NET中,GetHashCode方法在整个.NET基类库的许多地方都使用。正确执行它对于在集合中或确定相等时快速查找项目尤为重要。
对于如何为自定义类实现GetHashCode,是否有标准算法或最佳实践,以便不会降低性能?
在.NET中,GetHashCode方法在整个.NET基类库的许多地方都使用。正确执行它对于在集合中或确定相等时快速查找项目尤为重要。
对于如何为自定义类实现GetHashCode,是否有标准算法或最佳实践,以便不会降低性能?
当前回答
这是一个很好的例子:
/// <summary>
/// Helper class for generating hash codes suitable
/// for use in hashing algorithms and data structures like a hash table.
/// </summary>
public static class HashCodeHelper
{
private static int GetHashCodeInternal(int key1, int key2)
{
unchecked
{
var num = 0x7e53a269;
num = (-1521134295 * num) + key1;
num += (num << 10);
num ^= (num >> 6);
num = ((-1521134295 * num) + key2);
num += (num << 10);
num ^= (num >> 6);
return num;
}
}
/// <summary>
/// Returns a hash code for the specified objects
/// </summary>
/// <param name="arr">An array of objects used for generating the
/// hash code.</param>
/// <returns>
/// A hash code, suitable for use in hashing algorithms and data
/// structures like a hash table.
/// </returns>
public static int GetHashCode(params object[] arr)
{
int hash = 0;
foreach (var item in arr)
hash = GetHashCodeInternal(hash, item.GetHashCode());
return hash;
}
/// <summary>
/// Returns a hash code for the specified objects
/// </summary>
/// <param name="obj1">The first object.</param>
/// <param name="obj2">The second object.</param>
/// <param name="obj3">The third object.</param>
/// <param name="obj4">The fourth object.</param>
/// <returns>
/// A hash code, suitable for use in hashing algorithms and
/// data structures like a hash table.
/// </returns>
public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
T4 obj4)
{
return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
}
/// <summary>
/// Returns a hash code for the specified objects
/// </summary>
/// <param name="obj1">The first object.</param>
/// <param name="obj2">The second object.</param>
/// <param name="obj3">The third object.</param>
/// <returns>
/// A hash code, suitable for use in hashing algorithms and data
/// structures like a hash table.
/// </returns>
public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
{
return GetHashCode(obj1, GetHashCode(obj2, obj3));
}
/// <summary>
/// Returns a hash code for the specified objects
/// </summary>
/// <param name="obj1">The first object.</param>
/// <param name="obj2">The second object.</param>
/// <returns>
/// A hash code, suitable for use in hashing algorithms and data
/// structures like a hash table.
/// </returns>
public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
{
return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
}
}
下面是如何使用它:
private struct Key
{
private Type _type;
private string _field;
public Type Type { get { return _type; } }
public string Field { get { return _field; } }
public Key(Type type, string field)
{
_type = type;
_field = field;
}
public override int GetHashCode()
{
return HashCodeHelper.GetHashCode(_field, _type);
}
public override bool Equals(object obj)
{
if (!(obj is Key))
return false;
var tf = (Key)obj;
return tf._field.Equals(_field) && tf._type.Equals(_type);
}
}
其他回答
我在Helper库中有一个Hashing类,用于此目的。
/// <summary>
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
const int b = 378551;
int a = 63689;
int hash = 0;
// If it overflows then just wrap around
unchecked
{
for (int i = 0; i < input.Length; i++)
{
if (input[i] != null)
{
hash = hash * a + input[i].GetHashCode();
a = a * b;
}
}
}
return hash;
}
然后,只需将其用作:
public override int GetHashCode()
{
return Hashing.RSHash(_field1, _field2, _field3);
}
我没有评估它的表现,所以欢迎任何反馈。
我的大部分工作都是通过数据库连接完成的,这意味着我的类都具有来自数据库的唯一标识符。我总是使用数据库中的ID来生成哈希代码。
// Unique ID from database
private int _id;
...
{
return _id.GetHashCode();
}
这是我使用JonSkeet实现的助手类。
public static class HashCode
{
public const int Start = 17;
public static int Hash<T>(this int hash, T obj)
{
var h = EqualityComparer<T>.Default.GetHashCode(obj);
return unchecked((hash * 31) + h);
}
}
用法:
public override int GetHashCode()
{
return HashCode.Start
.Hash(_field1)
.Hash(_field2)
.Hash(_field3);
}
如果要避免为System.Int32编写扩展方法:
public readonly struct HashCode
{
private readonly int _value;
public HashCode(int value) => _value = value;
public static HashCode Start { get; } = new HashCode(17);
public static implicit operator int(HashCode hash) => hash._value;
public HashCode Hash<T>(T obj)
{
var h = EqualityComparer<T>.Default.GetHashCode(obj);
return unchecked(new HashCode((_value * 31) + h));
}
public override int GetHashCode() => _value;
}
它仍然避免了任何堆分配,使用方式完全相同:
public override int GetHashCode()
{
// This time `HashCode.Start` is not an `Int32`, it's a `HashCode` instance.
// And the result is implicitly converted to `Int32`.
return HashCode.Start
.Hash(_field1)
.Hash(_field2)
.Hash(_field3);
}
编辑(2018年5月):EqualityComparer<T>。默认getter现在是JIT内在的-Stephen Toub在这篇博文中提到了pull请求。
微软引领了几种哈希方法。。。
//for classes that contain a single int value
return this.value;
//for classes that contain multiple int value
return x ^ y;
//for classes that contain single number bigger than int
return ((int)value ^ (int)(value >> 32));
//for classes that contain class instance fields which inherit from object
return obj1.GetHashCode();
//for classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode();
我可以猜测,对于多个大整数,您可以使用这个:
int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;
对于多类型也是如此:首先使用GetHashCode()将所有类型转换为int然后int值将被xor'ed,结果是您的哈希值。
对于那些使用哈希作为ID(我的意思是一个唯一的值)的人来说,哈希自然被限制在数字个数,我认为哈希算法是5个字节,至少是MD5。
您可以将多个值转换为哈希值,其中一些值是相同的,因此不要将其用作标识符。(也许有一天我会使用你的组件)
可以尝试采用C++Boost库的方法。类似于:
class HashUtil
{
public static int HashCombine(int seed, int other)
{
unchecked
{
return other + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
}
}
然后:
class MyClass
{
private string _field1;
private int _field2;
private AnotherClass _field3;
private YetAnotherClass _field4;
public override int GetHashCode()
{
int result = HashUtil.HashCombine(_field1.GetHashCode(), _field2);
result = HashUtil.HashCombine(result, _field3.GetHashCode());
return HashUtil.HashCombine(result, _field4.GetHashCode());
}
}