The Right Way to do Equality in C#
One of the pitfalls of doing development in C#, Java, C++, or really any predominantly object-oriented (OOP) language is how “equality” is defined.
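In C#, for instance, every object comes with the following built-in ways of checking equality:
- object.Equals
- the == operator
- ReferenceEquals, for explicitly checking reference equality

For a class that doesn't override or overload them, all three compare references: two distinct objects are only equal if they are literally the same instance.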
My personal opinion: in any managed language, checking for referential equality is a pretty bad default - and if you’re working with immutable objects then equality by reference doesn’t really work.
In C/C++, where pointer arithmetic and knowing the precise location of something in memory matter, it’s a different story. Equality by reference is the correct default in that case.
What’s the right thing to do?
Equality by value - i.e. determining if two objects are equal by comparing their content.
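By default, C# doesn’t do this for classes. Here is a minimal sketch using a hypothetical `Point` class that overrides nothing. Two instances with identical content still compare as unequal, because `Equals` and `==` fall back to comparing references.

```csharp
using System;

// Hypothetical class with no equality overrides: it inherits reference equality.
public class Point {
    public int X { get; set; }
    public int Y { get; set; }
}

public static class Program {
    public static void Main() {
        var a = new Point { X = 1, Y = 2 };
        var b = new Point { X = 1, Y = 2 };
        var c = a;

        Console.WriteLine(a.Equals(b));           // False: same content, different objects
        Console.WriteLine(a == b);                // False: == also compares references here
        Console.WriteLine(ReferenceEquals(a, c)); // True: both variables point at one object
    }
}
```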
In Akka.NET all message classes are supposed to be immutable. That means a reference to a message becomes useless as soon as someone “modifies” it, because the modification produces a new copy. As a result, the Akka.NET development team has had a lot of practice implementing equality by value on many of the built-in message classes.
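Here is a minimal sketch of “modifying” an immutable message, assuming a hypothetical `Greet` class (not an actual Akka.NET type). Its `WithWho` method returns a new instance instead of changing the existing one. Code holding the old reference still sees the old value. Only a by-value comparison could tell you that two separately created messages carry the same content.

```csharp
using System;

// Hypothetical immutable message; not part of Akka.NET.
public sealed class Greet {
    public Greet(string who) { Who = who; }

    // Get-only property: it can't change after construction.
    public string Who { get; }

    // "Modifying" the message produces a copy with the new value.
    public Greet WithWho(string who) => new Greet(who);
}

public static class Program {
    public static void Main() {
        var original = new Greet("Alice");
        var changed = original.WithWho("Bob");

        Console.WriteLine(original.Who);                       // Alice: the original is untouched
        Console.WriteLine(changed.Who);                        // Bob
        Console.WriteLine(ReferenceEquals(original, changed)); // False: a brand-new object

        // A message built separately with the same content is still a different reference,
        // so without equality by value it doesn't compare as equal.
        var again = new Greet("Alice");
        Console.WriteLine(original.Equals(again));             // False
    }
}
```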
Here’s what that technique looks like:
- Implement IEquatable<T> for your class (where T is the class);
- Override object.Equals(object o); and
- [Special cases] Override object.GetHashCode() using a high-entropy function, with caveats.
1. Add a little IEquatable<T> love for your class
The IEquatable<T> interface defines a single method:

```csharp
bool Equals(T other);
```
Nothing complicated here.
The important distinction between this Equals method and the built-in object.Equals(object o) that comes with every .NET object is this: when you’re comparing two objects whose compile-time type is T, the IEquatable<T> method is the one that gets called, because it’s the most specific overload.
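To see which overload wins, here is a small sketch with a hypothetical `Probe` class that records which `Equals` ran last. Overload resolution happens at compile time, based on the static type of the argument. Generic collections such as `List<T>`, `Dictionary<TKey, TValue>` and `HashSet<T>` go through `EqualityComparer<T>.Default`, which uses `IEquatable<T>` when `T` implements it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that records which Equals overload ran last.
public sealed class Probe : IEquatable<Probe> {
    public Probe(int value) { Value = value; }
    public int Value { get; }
    public string LastCalled { get; private set; } = "";

    public bool Equals(Probe other) {
        LastCalled = "IEquatable<Probe>.Equals";
        return other != null && Value == other.Value;
    }

    public override bool Equals(object obj) {
        LastCalled = "object.Equals";
        return obj is Probe p && Value == p.Value;
    }

    public override int GetHashCode() => Value;
}

public static class Program {
    public static void Main() {
        var a = new Probe(1);
        var b = new Probe(1);

        a.Equals(b);                     // static type Probe: the IEquatable<Probe> overload wins
        Console.WriteLine(a.LastCalled); // IEquatable<Probe>.Equals

        a.Equals((object)b);             // static type object: only the override matches
        Console.WriteLine(a.LastCalled); // object.Equals

        // The default comparer used by generic collections prefers IEquatable<T>.
        EqualityComparer<Probe>.Default.Equals(a, b);
        Console.WriteLine(a.LastCalled); // IEquatable<Probe>.Equals
    }
}
```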
Here’s an example using an actual POCO class:
```csharp
using System;

public class Foo : IEquatable<Foo> {
    public int MyNum { get; set; }
    public string MyStr { get; set; }
    public DateTime Time { get; set; }

    #region Equality
    public bool Equals(Foo other) {
        throw new NotImplementedException();
    }
    #endregion
}
```
Time to implement the Equals(T other) method. This is tedious, but straightforward: we want to determine that the values of each individual property are equal. Every property you want to include in the comparison must be equal for two object instances to be equal, so here’s what that looks like for this implementation:
```csharp
public bool Equals(Foo other) {
    if (other == null) return false;
    return MyNum == other.MyNum &&
           Time == other.Time &&
           string.Equals(MyStr, other.MyStr);
}
```
In this case we’re comparing all of the properties, because they’re simple. I’m paranoid about running into a NullReferenceException, so I throw in a test to see if other is null immediately. string is a reference type and can be null, but the static string.Equals(string, string) method returns true or false without throwing an exception, even if one or both of the strings are null.
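A quick sketch of why the static `string.Equals` is the safe choice here. It handles `null` on either side, whereas calling the instance `Equals` on a null string throws.

```csharp
using System;

public static class Program {
    public static void Main() {
        string missing = null;

        // The static string.Equals(string, string) handles nulls on either side.
        Console.WriteLine(string.Equals(missing, "abc")); // False
        Console.WriteLine(string.Equals(missing, null));  // True
        Console.WriteLine(string.Equals("abc", "abc"));   // True

        // Calling the instance method on a null reference throws instead.
        try {
            Console.WriteLine(missing.Equals("abc"));
        } catch (NullReferenceException) {
            Console.WriteLine("missing.Equals threw NullReferenceException");
        }
    }
}
```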
But what if one of my properties was another custom POCO object, say a Bar class? How would I perform this equality by value check for Foo? Well, here’s the bad news: Bar, and any other class used as a property of Foo, also has to implement equality by value.
It’s because of this that implementing equality by value in C# often feels like a yak-shaving exercise, but the reward is worth the pain.
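Here is a minimal sketch of that yak shave. It assumes a hypothetical `Bar` with a single `Name` property; the original doesn’t say what `Bar` contains, and this `Foo` is trimmed down for brevity. `Bar` gets its own value-based `Equals`. `Foo` then compares its `Bar` property with the static `object.Equals(object, object)`, which checks for the same reference, then for nulls, and then calls `Bar`’s overridden `Equals(object)`.

```csharp
using System;

// Hypothetical Bar: it must implement equality by value too, or Foo can't.
public sealed class Bar : IEquatable<Bar> {
    public Bar(string name) { Name = name; }
    public string Name { get; }

    public bool Equals(Bar other) {
        if (other == null) return false;
        return string.Equals(Name, other.Name);
    }

    // Bar is sealed, so an "as" cast is enough here; no GetType() check needed.
    public override bool Equals(object obj) => Equals(obj as Bar);

    public override int GetHashCode() => Name != null ? Name.GetHashCode() : 0;
}

// Trimmed-down Foo that holds a Bar.
public class Foo : IEquatable<Foo> {
    public int MyNum { get; set; }
    public Bar MyBar { get; set; }

    public bool Equals(Foo other) {
        if (other == null) return false;
        // object.Equals(a, b) is null-safe and then defers to Bar's Equals(object).
        return MyNum == other.MyNum && object.Equals(MyBar, other.MyBar);
    }
}

public static class Program {
    public static void Main() {
        var x = new Foo { MyNum = 1, MyBar = new Bar("b") };
        var y = new Foo { MyNum = 1, MyBar = new Bar("b") };
        Console.WriteLine(x.Equals(y)); // True, but only because Bar compares by value
    }
}
```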
2. Tune up object.Equals(object o)
Most of the hard work in this process goes into step 1. Step 2 is pretty boilerplate by contrast:
```csharp
using System;

public class Foo : IEquatable<Foo> {
    public int MyNum { get; set; }
    public string MyStr { get; set; }
    public DateTime Time { get; set; }

    #region Equality
    public bool Equals(Foo other) {
        if (other == null) return false;
        return MyNum == other.MyNum &&
               Time == other.Time &&
               string.Equals(MyStr, other.MyStr);
    }

    public override bool Equals(object obj) {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != GetType()) return false;
        return Equals(obj as Foo);
    }
    #endregion
}
```
We override the object.Equals method and replace it with some boilerplate code that builds upon our work with the IEquatable<Foo>.Equals(Foo other) method:
- Use ReferenceEquals to determine if obj is null, and immediately return false if that’s the case.
- Use ReferenceEquals to check whether obj actually refers to this, and return true if it does.
- Check whether the Type of obj is equal to our current Type, and return false otherwise.
- Cast obj to Foo and hand it off to Equals(Foo other) to do all of the work we did in step 1.
Steps 1-2 on this list are essentially all that the default object.Equals does for a class: it only checks whether two references point to the same object.
Last step!
3. Use some prime numbers and XOR to get a well-distributed GetHashCode
So we have the ability to determine if two Foo instances are equal by value, but we haven’t fixed their GetHashCode functions yet. What does this mean?
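Well, if equality by value was important to you and you wanted to use Foo in a HashSet<Foo>, you could end up adding two Foo instances with identical values to your HashSet<Foo> by accident. That’s because the collection uses each Foo object’s hash code to decide which existing items are even worth comparing with Equals. The default GetHashCode is tied to the object’s identity, so two equal-by-value instances will almost always land in different places and never get compared.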
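Here is a minimal sketch of that failure mode using two hypothetical classes. `HalfDone` overrides `Equals` but not `GetHashCode`; the compiler warns about exactly this (CS0659), and the warning is suppressed here only to demonstrate the problem. `Complete` also overrides `GetHashCode`. The duplicate count for `HalfDone` relies on the default identity-based hash codes differing, which is almost, but not strictly, guaranteed.

```csharp
using System;
using System.Collections.Generic;

// Overrides Equals but NOT GetHashCode, so it keeps the default
// identity-based hash code. Warning suppressed only for this demonstration.
#pragma warning disable CS0659
public sealed class HalfDone : IEquatable<HalfDone> {
    public HalfDone(int value) { Value = value; }
    public int Value { get; }
    public bool Equals(HalfDone other) => other != null && Value == other.Value;
    public override bool Equals(object obj) => Equals(obj as HalfDone);
}
#pragma warning restore CS0659

// Same shape, but with a GetHashCode that agrees with Equals.
public sealed class Complete : IEquatable<Complete> {
    public Complete(int value) { Value = value; }
    public int Value { get; }
    public bool Equals(Complete other) => other != null && Value == other.Value;
    public override bool Equals(object obj) => Equals(obj as Complete);
    public override int GetHashCode() => Value;
}

public static class Program {
    public static void Main() {
        var broken = new HashSet<HalfDone> { new HalfDone(42), new HalfDone(42) };
        // Almost always 2: the hash codes differ, so Equals is never consulted.
        Console.WriteLine(broken.Count);

        var fixedSet = new HashSet<Complete> { new Complete(42), new Complete(42) };
        Console.WriteLine(fixedSet.Count); // 1
    }
}
```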
There are lots of other cases where built-in pieces of the .NET framework use the hash code, such as Dictionary<TKey, TValue> keys, so it’s critical that we override GetHashCode.
```csharp
using System;

public class Foo : IEquatable<Foo> {
    public int MyNum { get; set; }
    public string MyStr { get; set; }
    public DateTime Time { get; set; }

    #region Equality
    public bool Equals(Foo other) {
        if (other == null) return false;
        return MyNum == other.MyNum &&
               Time == other.Time &&
               string.Equals(MyStr, other.MyStr);
    }

    public override bool Equals(object obj) {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != GetType()) return false;
        return Equals(obj as Foo);
    }

    public override int GetHashCode() {
        unchecked {
            var hashCode = 13;
            hashCode = (hashCode * 397) ^ MyNum;
            // Null (or empty) strings contribute 0 instead of throwing.
            var myStrHashCode =
                !string.IsNullOrEmpty(MyStr) ?
                MyStr.GetHashCode() : 0;
            hashCode = (hashCode * 397) ^ myStrHashCode;
            hashCode =
                (hashCode * 397) ^ Time.GetHashCode();
            return hashCode;
        }
    }
    #endregion
}
```
OK, weird unchecked keyword and lots of prime numbers for some reason - what the hell is going on?
We don’t really care what the value of hashCode is. All we care about is that any other object with the exact same values always produces the same hash code, and that objects with different values rarely do.
We’re going to use a computation technique that produces a reproducible hash code for all equal-by-value instances of Foo and keeps the likelihood of a hash collision low.
If you want to read an explanation of this technique written by someone who’s much more talented than I am, check out Jon Skeet’s answer about C# GetHashCode functions on StackOverflow.
First, we use the unchecked keyword to let the compiler know that we don’t care if hashCode overflows in this instance - all we care about is the value, and any overflow simply wraps around.
Second, we pick two different prime numbers: one acts as the seed for the hash and the other acts as our hash multiplier. For each property we need to include in the hash function, we multiply the current hash by the multiplier prime and then combine the result with that member’s own hash code.
In my case I’m using bitwise exclusive or (^), but you could just use addition and achieve similarly robust results.
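Here is a small sketch of the combining step on its own, with both the XOR and the addition variants. It reuses the seed 13 and the multiplier 397 from Foo.GetHashCode. The helper names `CombineXor` and `CombineAdd` are hypothetical. The sketch also shows what `unchecked` protects against. C# arithmetic is unchecked by default unless the project enables overflow checking, so the keyword guarantees wrap-around behavior either way. In a `checked` context the same multiplication throws.

```csharp
using System;

public static class HashDemo {
    // Seed and multiplier mirror the primes used in Foo.GetHashCode.
    private const int Seed = 13;
    private const int Multiplier = 397;

    // XOR variant: multiply the running hash, then XOR in the next member's hash.
    public static int CombineXor(params int[] memberHashes) {
        unchecked {
            int hash = Seed;
            foreach (int h in memberHashes)
                hash = (hash * Multiplier) ^ h;
            return hash;
        }
    }

    // Addition variant: same idea, but add instead of XOR.
    public static int CombineAdd(params int[] memberHashes) {
        unchecked {
            int hash = Seed;
            foreach (int h in memberHashes)
                hash = (hash * Multiplier) + h;
            return hash;
        }
    }

    public static void Main() {
        Console.WriteLine(CombineXor(1));    // 5160 (13 * 397 = 5161, then 5161 ^ 1)
        Console.WriteLine(CombineAdd(1, 2)); // 2049316 ((5161 + 1) * 397 + 2)

        // The same kind of multiplication in a checked context overflows and throws.
        try {
            int big = int.MaxValue;
            int overflow = checked(big * Multiplier);
            Console.WriteLine(overflow);
        } catch (OverflowException) {
            Console.WriteLine("checked arithmetic threw OverflowException");
        }
    }
}
```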
When do I really need to override GetHashCode?
An important caveat - you should only override GetHashCode if your objects are immutable.
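If the properties of your Foo class can change, and changing them changes the GetHashCode result, then hash-based collections (HashSet<Foo>, the keys of a Dictionary<Foo, TValue>, etc…) will behave unpredictably when the hash code of an individual Foo instance changes while it’s stored in the collection. Lookups and removals can silently fail to find it.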
You can still follow steps 1 and 2 of this guide and have solid equality by value semantics, but only override the hashcode when you’re designing a class to be immutable from the get-go.
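Here is a minimal sketch of that failure, assuming a hypothetical `MutableKey` class whose hash code comes from a settable `Id` property. After the property changes, the set still holds the object but can no longer find it, because the lookup uses the new hash code while the set filed the object under the old one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical MUTABLE class whose hash code depends on a settable property.
public sealed class MutableKey : IEquatable<MutableKey> {
    public int Id { get; set; }
    public bool Equals(MutableKey other) => other != null && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as MutableKey);
    public override int GetHashCode() => Id;
}

public static class Program {
    public static void Main() {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        Console.WriteLine(set.Contains(key)); // True

        key.Id = 2; // the hash code changes while the object sits in the set

        Console.WriteLine(set.Contains(key)); // False: the lookup uses the new hash code
        Console.WriteLine(set.Count);         // 1: still stored, just unreachable by lookup
    }
}
```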