My last post, “Persistence is a Solved Problem,” got some mixed responses. This post is a bit overdue, as Jeff Iacono asked me to elaborate on the previous post almost a month ago:
@jeffreyiacono I’ll see what I can do. It’ll probably end up being a separate post if/when I get the time to write it :)
– Ernie Miller (@erniemiller) July 19, 2012
Anyway, better late than never, right? Here goes. Bear with me: this post is going to be heavy on philosophy and light on code.
The previous post’s title was more than a little hyperbolic. With that out of the way…
Why is “Persistence is a Solved Problem” a useful viewpoint to hold?
Imagine that you’re writing a new, non-trivial application. You haven’t written a line of code yet. You’re just thinking through how the application might be structured, essentially modeling the domain of the application in your head. At this stage, you have tremendous flexibility in how you choose to think about the problem you’re addressing.
Now, your thought process is going to fall somewhere on a continuum between “mental database schema” and “there is no spoon.”
I’ve found that my tendencies naturally lie toward the “mental database schema” end of this continuum. As a result, I voluntarily and artificially constrain my thought process to the concepts easily described in terms of my chosen persistence layer. This, of course, means that I’m already deciding on implementation details (which persistence layer will serve as scaffolding for my thought process) before I’ve written a line of code, which narrows my thinking considerably.
Based on my review of many applications, especially Rails applications, I don’t believe I’m alone in this tendency. Taking the (admittedly extreme) view that persistence is a solved problem is my attempt to defer that decision to some future point, removing friction while I think about how best to model the domain.
Why do we (as Rails developers) gravitate toward the “mental database schema” way of thought?
First, because the RDBMS and its associated concepts have been so thoroughly ingrained in us. They’re like a trusty hammer. Until recently, in fact, if I sat a group of developers down in a room, we might have ended up debating which RDBMS to use for a new application, but certainly not whether to use one.
Second, because of the Active Record pattern (particularly ActiveRecord, as implemented in Rails), which presents a ridiculously low barrier to entry. I love ActiveRecord for lots of reasons, and a large portion of my open source work relates to ActiveRecord. It makes basic CRUD operations almost effortless. But let’s remember that the Active Record pattern existed before Rails was a sparkle in DHH’s eye, and Martin Fowler, who named the pattern, had this to say, in 2002, in Patterns of Enterprise Application Architecture (emphasis mine):
Active Record is a good choice for domain logic that isn’t too complex, such as creates, reads, updates, and deletes. Derivations and validations based on a single record work well in this structure. […] Their primary problem is that they work well only if the Active Record objects correspond directly to the database tables: an isomorphic schema. […] Another argument against Active Record is the fact that it couples the object design to the database design. This makes it more difficult to refactor either design as the project goes forward.
(Incidentally, Martin recommends the Data Mapper pattern as an alternative that addresses these concerns. Thankfully, we have a partial implementation of this pattern in the Ruby world, too. Thanks for the correction on that point, @lsegal.)
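To make Fowler’s distinction a little more concrete, here’s a minimal plain-Ruby sketch of the Data Mapper idea. It assumes a Hash standing in for a database table, and `Article` and `ArticleMapper` are hypothetical names. This is not the API of any particular Ruby library. The domain object knows nothing about storage; only the mapper translates between objects and rows.

```ruby
# A plain domain object: no knowledge of tables, columns, or SQL.
class Article
  attr_reader :id, :title, :body

  def initialize(id:, title:, body:)
    @id = id
    @title = title
    @body = body
  end

  # Domain behavior lives here, independent of how articles are stored.
  def word_count
    body.split.size
  end
end

# Hypothetical mapper: the only place that knows how an Article maps to a
# storage "row". A Hash of row hashes stands in for a real table.
class ArticleMapper
  def initialize(table = {})
    @table = table
  end

  # Translate the object into a row. Note the column is named "body_text",
  # not "body": the schema and the object are free to diverge.
  def insert(article)
    @table[article.id] = {
      "id" => article.id,
      "title" => article.title,
      "body_text" => article.body
    }
    article
  end

  # Translate a row back into a domain object, or return nil if missing.
  def find(id)
    row = @table[id]
    return nil unless row

    Article.new(id: row["id"], title: row["title"], body: row["body_text"])
  end
end

mapper = ArticleMapper.new
mapper.insert(Article.new(id: 1, title: "Hello", body: "persistence is a detail"))
article = mapper.find(1)
puts article.word_count # => 4
```

The mismatch between the `body` attribute and the `body_text` column is deliberate. When either design changes, only the mapper has to follow, which is exactly the refactoring freedom Fowler says Active Record gives up.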
So, the Active Record pattern excels at simple domain logic such as CRUD operations. It should come as no surprise, then, that following this particular well-worn path while modeling a non-trivial application’s domain may leave us looking up to find a dark forest surrounding us and the path all but vanished.
This happens because we allow our persistence model to drive our domain model. The result is code that couples our objects to our database schema, which makes the system harder to maintain.
Achieving freedom through ignorance
I’ve come to believe that this problem, allowing our persistence model to drive our domain model, doesn’t manifest itself only in tightly coupled code. It infects our thought process.
If you’ve built at least a few web applications, consider this: what is the most significant challenge you typically face when reasoning about a new application? With exceptions for applications requiring massive scalability on day one (hint: your application doesn’t), I’m going to guess that “how to read and write all those ones and zeroes” wasn’t your answer. It’s probably not even in your top three, if you really think about it. Sure, when to persist your data, where to persist it, or what to do once it’s persisted (or read) might make the list. But not how.
Maybe this seems like an oversimplification.
Have you ever inherited responsibility for a large application? Was the application a joy to maintain, or a nightmarish mudball of code with files you were frightened to even open, much less modify?
I’ve seen my fair share of both kinds of applications, and I’ve come to the conclusion that the most difficult part of reasoning about a new system is, in fact, figuring out how to model a system that is easy to reason about.
If you succeed in this area, other questions almost seem to answer themselves. Here are a few:
- What should this internal API look like?
- What tests should I write?
- How would I extend this object’s behavior?
Once we’ve decided how we’d ideally like to interact with our objects, there’s one more big question we can begin to make an informed decision about:
- What is the most effective way to implement object persistence, while best enabling our ideal method of reasoning about the problem domain?
At this point, we may still find ourselves without a “perfect” solution to our persistence problem, and we are free to make some concessions in our domain model for the sake of performance or pragmatism. But now we stand at a vantage point from which we can recognize the sacrifices we are making. More importantly, because our application will be easier to reason about, we’re in a stronger position to recover the ground lost to those concessions when a better option presents itself.
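Here’s one rough way to put the domain model first in practice. This plain-Ruby sketch rests on some assumptions: `Account` and `InMemoryAccountRepository` are hypothetical names, and a “repository” here is simply any object that responds to `save` and `find`. The domain object is designed around how we want to use it. The in-memory store lets us build and test it before committing to a persistence layer, and a database-backed repository honoring the same two methods could be swapped in later.

```ruby
# Hypothetical domain object, shaped by how we want to use it,
# with no reference to any persistence layer.
class Account
  class InsufficientFunds < StandardError; end

  attr_reader :id, :balance

  def initialize(id:, balance: 0)
    @id = id
    @balance = balance
  end

  def deposit(amount)
    raise ArgumentError, "amount must be positive" unless amount.positive?
    @balance += amount
    self
  end

  def withdraw(amount)
    raise ArgumentError, "amount must be positive" unless amount.positive?
    raise InsufficientFunds, "balance is #{@balance}" if amount > @balance
    @balance -= amount
    self
  end
end

# Hypothetical repository contract: anything responding to #save and #find.
# This in-memory version stores a snapshot of the data, not the object itself,
# so (like a real database) later changes aren't visible until saved again.
class InMemoryAccountRepository
  def initialize
    @records = {}
  end

  def save(account)
    @records[account.id] = { balance: account.balance }
    account
  end

  # Rebuild a fresh Account from the stored snapshot, or return nil.
  def find(id)
    record = @records[id]
    record && Account.new(id: id, balance: record[:balance])
  end
end

repo = InMemoryAccountRepository.new
repo.save(Account.new(id: 1).deposit(100).withdraw(30))
puts repo.find(1).balance # => 70
```

Nothing in `Account` would need to change when the real persistence decision is finally made. That is the freedom the “solved problem” viewpoint is meant to buy.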
Jeff (and anyone else who’s still reading), I hope this better explains where I was coming from in the previous post.