Toward a More Egalitarian ObjectSpace

Comments

Egalitarianism (from French égal, meaning “equal”) is a trend of thought that favors equality among living entities.
Wikipedia

So, I may have taken a bit of poetic licence with this post’s title. A completely egalitarian ObjectSpace would be insane. We don’t want all of our objects to be equal, but we do generally want the ability to recognize when two objects are equal. Today, I’d like to share a story about an ActiveRecord refactoring I worked on over the weekend, and how it relates to the topic.

Relation merges and equality conditions

When you merge two ActiveRecord::Relation objects, ActiveRecord makes a few decisions on your behalf. One of them involves what to do if both relations have equality conditions against the same attribute. When you pass a hash of column/value pairs to Relation#where, ActiveRecord’s PredicateBuilder creates ARel Equality nodes (or In nodes, in the case of an array or range value). Here’s an example (altered for legibility/brevity):

relation = Person.where(:name => 'Bob').where_values
# => [
#      #<Arel::Nodes::Equality:0x1075fbfb0
#      @left=#<struct Arel::Attributes::String relation=#<Arel::Table (...)>,
#      @right="Bob", (...)
#    ]

When you merge two relations, ActiveRecord normally just appends the where_values of the second relation to those of the first relation, which results in all of the conditions being joined by AND in the resulting SQL. But what happens if we next write:

relation = relation.merge(Person.where(:name => 'Jim'))

Well, generating a query that selects all people whose name is Bob and whose name is Jim makes no sense, since both conditions can’t be true. As you probably already know, ActiveRecord uses the last equality node. In this case, that means we get all people whose name is Jim.

Here’s the code, straight from ActiveRecord that made this work:

lib/active_record/relation/merger.rb (from 8f58d6e5 in master):

unless relation.where_values.empty?
  # Remove duplicates, last one wins.
  seen = Hash.new { |h,table| h[table] = {} }
  merged_wheres = merged_wheres.reverse.reject { |w|
    nuke = false
    if w.respond_to?(:operator) && w.operator == :==
      name              = w.left.name
      table             = w.left.relation.name
      nuke              = seen[table][name]
      seen[table][name] = true
    end
    nuke
  }.reverse
end

This code reverses the order of the combined where_values, and if any of the objects quack like an Equality node, it notes the ARel Attribute on the left hand side has been seen, so it can discard any later Equality nodes. When the array is again reversed, this has the effect of keeping only the last occurrence of each Equality condition against a specific attribute.

Problem? trollface

Yes, there’s a problem here. It makes sense to only do this for Equality nodes, but what happens if the left hand side of the Equality node isn’t an attribute? What if it’s a function, instead? This happens:

undefined method `relation’ for #<Arel::Nodes::NamedFunction>

This is because once we’ve detected an Equality node, we make an assumption that the left hand of that node is an ARel Attribute. That’s unfortunate, as it’s unnecessarily limiting. As a quick fix, we can tweak things like so, to prevent an exception being raised:

# We might have non-attributes on the left side of equality nodes,
# so we need to make sure they quack like an attribute.
if w.respond_to?(:operator) && w.operator == :== &&
  w.left.respond_to?(:relation)
  name              = w.left.name
  table             = w.left.relation.name
  nuke              = seen[table][name]
  seen[table][name] = true
end

But, as you can probably imagine, that’s only a partial fix. Now, we’ll ignore that Equality node altogether, meaning that we could end up with two conflicting equality conditions against identical functions. Far from ideal.

Refactoring for fun and profit

So, it’d be really great if we could just satisfy the obvious intent of the code in a more straightforward manner: Given an array of where conditions, we want to keep only the last equality condition for each left-hand value. Wouldn’t it be nice if we didn’t need to know anything at all about what that left-hand object was, and could just write:

# Remove equalities with duplicated left-hand. Last one wins.
seen = {}
merged_wheres = merged_wheres.reverse.reject { |w|
  nuke = false
  if w.respond_to?(:operator) && w.operator == :==
    nuke         = seen[w.left]
    seen[w.left] = true
  end
  nuke
}.reverse

Good news: We can! But first, we have to do some work. :(

What constitutes equality?

The answer to that question depends on who’s asking. And, for that matter, how they ask it. Oh, and also, what your object considers an equal.

The default Ruby implementation of object equality is extremely narrow. To demonstrate:

class MyNiftyObject # < Object is implied
  def initialize(value)
    @value = value
  end
end
 
MyNiftyObject.new('hi') == MyNiftyObject.new('hi') # => false

What? This is because the Object class has one concept of equality: two Objects are equal if and only if they refer to the exact same instance in memory. We created two instances of MyNiftyObject, and just because they happen to have the same values doesn’t mean they’re “equal” to Ruby.

Think about it: how else could you really compare two instances of the Object class, which has no attributes? Ruby leaves it up to the subclasses of Object to implement the various equality checks in ways that make sense for them, but it provides some guidelines. Here’s a quick rundown, by decreasing strictness:

  • equal? – Shouldn’t be overridden. Returns true if both objects are literally the same object (the same in-memory instance)
  • eql? – Should return true if both objects have the same value. This is normally synonymous with ==, but there can be subtle differences. Could be more accurately described by saying “objects have equal values, and are also the same class”. For instance, Numerics like Float and Fixnum return false when compared with 1.eql?(1.0)
  • == – Should return true if the objects have the same values. Again, except in certain odd cases, synonymous with eql?
  • ===? – More commonly referred to as “case equality”. Normally the same as == but often overridden to provide useful behavior in case statements (each when value will call its === method to compare with the value being checked). For instance, Class#===(object) behaves like object.is_a?(Class). Note that other methods, like Enumerable#grep may call this as well.

So, with these considerations in mind, your first instinct might reasonably be to try defining the ==(other) method on your objects. HAH! Your first instinct would be WRONG! But, for giggles, let’s try it.

class MyNiftyObject # < Object is implied
  attr_reader :value
 
  def initialize(value)
    @value = value
  end
 
  def ==(other)
    self.class == other.class && # Not just any object with a "value" attr
      self.value == other.value
  end
end
 
MyNiftyObject.new('hi') == MyNiftyObject.new('hi') # => true

“I thought you said my instinct was wrong, Mr. Smarty Q. Pantsington, III,” I hear you saying. Not so fast! Remember, we wanted to implement object equality in a way that would work with our “seen” hash, in ActiveRecord. Let’s try it.

hi1 = MyNiftyObject.new 'hi'
hi2 = MyNiftyObject.new 'hi'
hi1 == hi2 # => true
hi_hash = {}
hi_hash[hi1] = "hello, good sir"
hi_hash[hi2] = "i hope this missive finds you well"
hi_hash.size   # => 2
hi_hash.values # => ["hello, good sir", "i hope this missive finds you well"]

Well, that sucked.

The oddly-yet-aptly-named Object#hash

Why didn’t it work? Because there’s another Object method we need to implement, Object#hash. What’s that do? Let’s consult the docs:

Generates a Fixnum hash value for this object. This function must have the property that a.eql?(b) implies a.hash == b.hash. The hash value is used by class Hash. Any hash value that exceeds the capacity of a Fixnum will be truncated before being used.

The first time I read this in the docs, I’ll admit that I still had no idea what the method needed to do, or how, exactly, the Hash class used this method. In English, here’s how this all fits together:

The hash method is sort of an optimization trick used by Ruby. Certain Ruby classes (Hash and Array, notably) make use of a hash table (see st.c in the Ruby source) which “bins” data based on the Fixnums returned by their hash methods. As more values are stored in the hash, the number of bins can be increased. Since it’s super-efficient to look up entries in this table, it can be used to quickly determine whether there’s a chance the object might exist, and roughly where to find it, before bothering to actually retrieve or set it.

Using this trick, a Hash knows very quickly whether it can just stow your data in the given key, because there’s no chance of needing to overwrite an existing object, or whether it needs to do a bit more legwork. Likewise, an Array knows whether it might need to remove an object when you call uniq. There’s a bit more to it than that, but this post is already too long, so let’s skip to the chase, shall we? The short version is that unless your hash method returns the same value as another object’s hash method, Ruby won’t even bother calling your object’s eql? method, which is why our little experiment failed.

Implementing a custom hash method

How should we write a custom hash method? If you only have one instance variable in your class, it’s easy. Just have your hash method call its hash method, and you’ll be set. If you have more than one, there are a couple of ways to do it. The general idea is that whatever method you choose should return a Fixnum, and not a Bignum, so it’s probably not best to just add all of your instance variables’ hashes together. That being said, Ruby is kind enough to truncate the value for you if you violate this “rule”, or even coerce your returned value to a Fixnum if you return something really crazy (as long as whatever you return implements to_int).

My personal favorite way to handle hashing an object with multiple instance variables is to let Array do the work for me:

def hash
  [@var1, @var2, @var3].hash
end

Back to the problem at hand

Remember our friend ActiveRecord::Relation#merge? If we want to use objects as keys in that “seen” hash, we have some work to do.

At a minimum, we need to implement the hash and eql? methods on any objects we think might end up on the left side of the Equality nodes. But taking it one step further, it’s always a good idea to implement the == method, as well. Thankfully, that’s as easy an aliasing it to eql?.

For a moment, pretending our little sample class above was going to be in the left-hand of an equality node, this is our minimal implementation:

class MyNiftyObject
  attr_reader :value
 
  def initialize(value)
    @value = value
  end
 
  def hash
    @value.hash
  end
 
  def eql?(other)
    self.class == other.class &&
      self.value == other.value
  end
  alias :== :eql?
end

With ARel, it’s a bit trickier, because most nodes have more than one instance variable (in some cases many more), and even Arel::Attributes::Attribute, which inherits from a Struct (which already has a sane default implementation of the methods shown above), contains an Arel::Table, which itself must implement equality tests, and so on… Equality all the way down.

What’s that end up looking like? A whole lot of typing. But in the end, we get what we wanted!

Great. So when should I actually go through this hassle?

It really depends. Do you have complete control over how your objects are used? Then it’s probably up to you. If you’re writing code intended for reuse (and aren’t you always?) then you should consider whether it might be the case that a user of your class (even the future you) might want to compare instances for equality, or use Array#uniq on an array containing your objects, or… My personal rule is this: If you’re wondering whether you should write equality methods for your class, you probably should.

As for the hassle? Well, it’s really not much hassle at all, in most cases. But after typing all that code for ARel, I decided to make it easier, anyway.

Equivalence

Equivalence is a tiny gem that implements the equality code discussed in this post via a macro. Check the README for more info. It’s easy enough for the most common use cases that you have no excuse, now.

Go forth, and write egalitarian code!

comments powered by Disqus