在.NET中,GetHashCode方法在整个.NET基类库的许多地方都使用。正确执行它对于在集合中或确定相等时快速查找项目尤为重要。

对于如何为自定义类实现GetHashCode,是否有标准算法或最佳实践,以便不会降低性能?


当前回答

这是一个很好的例子:

/// <summary>
/// Helper class for generating hash codes suitable 
/// for use in hashing algorithms and data structures like a hash table. 
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
           var num = 0x7e53a269;
           num = (-1521134295 * num) + key1;
           num += (num << 10);
           num ^= (num >> 6);

           num = ((-1521134295 * num) + key2);
           num += (num << 10);
           num ^= (num >> 6);

           return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the 
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}

下面是如何使用它:

private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}

其他回答

我在Helper库中有一个Hashing类,用于此目的。

/// <summary> 
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}

然后,只需将其用作:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

我没有评估它的表现,所以欢迎任何反馈。

我的大部分工作都是通过数据库连接完成的,这意味着我的类都具有来自数据库的唯一标识符。我总是使用数据库中的ID来生成哈希代码。

// Unique ID from database
private int _id;

...    
{
  return _id.GetHashCode();
}

这是我使用JonSkeet实现的助手类。

public static class HashCode
{
    public const int Start = 17;

    public static int Hash<T>(this int hash, T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked((hash * 31) + h);
    }
}

用法:

public override int GetHashCode()
{
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)
        .Hash(_field3);
}

如果要避免为System.Int32编写扩展方法:

public readonly struct HashCode
{
    private readonly int _value;

    public HashCode(int value) => _value = value;

    public static HashCode Start { get; } = new HashCode(17);

    public static implicit operator int(HashCode hash) => hash._value;

    public HashCode Hash<T>(T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked(new HashCode((_value * 31) + h));
    }

    public override int GetHashCode() => _value;
}

它仍然避免了任何堆分配,使用方式完全相同:

public override int GetHashCode()
{
    // This time `HashCode.Start` is not an `Int32`, it's a `HashCode` instance.
    // And the result is implicitly converted to `Int32`.
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)     
        .Hash(_field3);
}

编辑(2018年5月):EqualityComparer<T>。默认getter现在是JIT内在的-Stephen Toub在这篇博文中提到了pull请求。

微软引领了几种哈希方法。。。

//for classes that contain a single int value
return this.value;

//for classes that contain multiple int value
return x ^ y;

//for classes that contain single number bigger than int    
return ((int)value ^ (int)(value >> 32)); 

//for classes that contain class instance fields which inherit from object
return obj1.GetHashCode();

//for classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode(); 

我可以猜测,对于多个大整数,您可以使用这个:

int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;

对于多类型也是如此:首先使用GetHashCode()将所有类型转换为int然后int值将被xor'ed,结果是您的哈希值。

对于那些使用哈希作为ID(我的意思是一个唯一的值)的人来说,哈希自然被限制在数字个数,我认为哈希算法是5个字节,至少是MD5。

您可以将多个值转换为哈希值,其中一些值是相同的,因此不要将其用作标识符。(也许有一天我会使用你的组件)

可以尝试采用C++Boost库的方法。类似于:

class HashUtil
{
  public static int HashCombine(int seed, int other)
  {
    unchecked
    {
      return other + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
  }
}

然后:

class MyClass
{
  private string _field1;
  private int _field2;
  private AnotherClass _field3;
  private YetAnotherClass _field4;

  public override int GetHashCode()
  {
    int result = HashUtil.HashCombine(_field1.GetHashCode(), _field2);
    result = HashUtil.HashCombine(result, _field3.GetHashCode());
    return HashUtil.HashCombine(result, _field4.GetHashCode());
  }
}