October 18, 2012

No More Repositories

The Repository Pattern is an implementation pattern used primarily to encapsulate an application's persistence medium, keeping it away from the application's business logic. When the pattern is implemented correctly, a change to the application's persistence store (from MS-SQL to MySQL, or more generally from an RDBMS to XML stored on the file system) should not necessitate a change in the application's business logic.
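To make that promise concrete, here is a minimal sketch in Python (the projects discussed in this post use EF, so treat this purely as an illustration). Every name in it (Customer, CustomerRepository, the two implementations, rename_customer) is hypothetical, and the JSON-backed class merely stands in for a file-based store.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Customer:
    id: int
    name: str


class CustomerRepository(ABC):
    """The only persistence surface the business logic ever sees."""

    @abstractmethod
    def get(self, customer_id: int) -> Optional[Customer]: ...

    @abstractmethod
    def save(self, customer: Customer) -> None: ...


class InMemoryCustomerRepository(CustomerRepository):
    """Keeps customers in a plain dictionary."""

    def __init__(self) -> None:
        self._rows: Dict[int, Customer] = {}

    def get(self, customer_id: int) -> Optional[Customer]:
        return self._rows.get(customer_id)

    def save(self, customer: Customer) -> None:
        self._rows[customer.id] = customer


class JsonCustomerRepository(CustomerRepository):
    """Keeps customers in a JSON document (a stand-in for a file on disk)."""

    def __init__(self) -> None:
        self._document = "{}"

    def get(self, customer_id: int) -> Optional[Customer]:
        row = json.loads(self._document).get(str(customer_id))
        return Customer(**row) if row else None

    def save(self, customer: Customer) -> None:
        rows = json.loads(self._document)
        rows[str(customer.id)] = {"id": customer.id, "name": customer.name}
        self._document = json.dumps(rows)


def rename_customer(repo: CustomerRepository, customer_id: int, new_name: str) -> bool:
    """Business logic: depends only on the abstract repository."""
    customer = repo.get(customer_id)
    if customer is None:
        return False
    repo.save(Customer(customer.id, new_name))
    return True
```

Swapping InMemoryCustomerRepository for JsonCustomerRepository requires no change to rename_customer, which is the whole point of the pattern.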

Let me first say that I am not categorically against the idea of the Repository Pattern. I think it's both important and useful to encapsulate data access, and I've certainly used the Repository Pattern quite a bit in my career. In principle, the Repository Pattern is a perfectly good solution to the problem of application logic depending directly on data access. In practice, however, successfully implementing the Repository Pattern can be difficult.

Let's start with the obvious - data context. Where do you new up a data context for your repository? You could let your repositories manage their own data context instances, but that causes scoping issues and gets ugly fast. Okay, so then new up your context in your application logic and pass it into your repository instances... oh wait, nope... that breaks encapsulation, which is the whole point of using repositories. What to do? If you've had a course in software design you're probably jumping out of your seat screaming "ABSTRACT FACTORY PATTERN!!!" at the screen. As it turns out, that's the right way to go, though you have to realize that we've just jumped from one conceptual layer of abstraction away from our DAL to three. Yikes. Is it worth it? Yes, absolutely, if there's any chance whatsoever that your data store or ORM will change at any point in the future. Will most developers implementing repositories do this? Absolutely not.
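Here is a rough sketch of what the abstract factory approach might look like, written in Python for illustration. DataContext, DataContextFactory, the in-memory implementations, OrderRepository and place_order are all hypothetical names, and the context is a toy stand-in for a real ORM context or session.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class DataContext(ABC):
    """Hypothetical unit-of-work style context; stands in for an ORM context."""

    @abstractmethod
    def add(self, table: str, row: dict) -> None: ...

    @abstractmethod
    def commit(self) -> None: ...


class DataContextFactory(ABC):
    """Abstract factory: callers ask for a context without naming a concrete one."""

    @abstractmethod
    def create(self) -> DataContext: ...


class InMemoryContext(DataContext):
    def __init__(self, store: Dict[str, List[dict]]) -> None:
        self._store = store
        self._pending: List[Tuple[str, dict]] = []

    def add(self, table: str, row: dict) -> None:
        # Nothing reaches the store until commit() is called.
        self._pending.append((table, row))

    def commit(self) -> None:
        for table, row in self._pending:
            self._store.setdefault(table, []).append(row)
        self._pending.clear()


class InMemoryContextFactory(DataContextFactory):
    def __init__(self) -> None:
        self.store: Dict[str, List[dict]] = {}

    def create(self) -> DataContext:
        return InMemoryContext(self.store)


class OrderRepository:
    """The repository receives its context instead of creating one itself."""

    def __init__(self, context: DataContext) -> None:
        self._context = context

    def add(self, order: dict) -> None:
        self._context.add("orders", order)


def place_order(factory: DataContextFactory, order: dict) -> None:
    """Application logic: controls the context's scope, knows only abstractions."""
    context = factory.create()
    OrderRepository(context).add(order)
    context.commit()
```

The application logic still decides how long a context lives, but it never names a concrete context type, so encapsulation survives. Count the layers, though: repository, context interface, factory. That's the three I was complaining about.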

The second issue with using the Repository Pattern is scalability. Most applications start out small and have few areas that require high levels of performance, and in these cases one will hardly ever consider a repository to be an impediment to scaling the application. But if the day should come that you need to create two separate data stores to split up read and write operations, then the Repository Pattern will fall woefully short. The reason for this inadequacy is pretty obvious: repositories as objects do not follow the principle of Command-Query Separation. A single repository exposes both reads and writes against the same data context, so there is no clean seam along which to point them at different stores.

So what's the solution? Well, for my projects that use EF as an ORM, my solution has been to ditch repositories in favor of individual generic Command and Query classes, along with a CommandProcessor class that manages units of work. This approach is called Command Query Responsibility Segregation, or CQRS for short. I like that I now have a very clean separation of commands and queries, which allows me to use different data contexts connected to different databases for queries and commands, and I've additionally reduced my code footprint by using generics.
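For a feel of the shape of this, here is a stripped-down sketch in Python. It is not my actual EF code: Query, InsertCommand and the internals of this CommandProcessor are simplified, hypothetical stand-ins, the "databases" are plain dictionaries, and keeping the read store in sync with the write store is left out entirely.

```python
import copy
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")
Store = Dict[str, List[dict]]  # table name -> rows; a toy stand-in for a database


class Query(Generic[T]):
    """A read-only operation; it only ever touches the read store."""

    def __init__(self, table: str,
                 predicate: Callable[[dict], bool],
                 project: Callable[[dict], T]) -> None:
        self._table = table
        self._predicate = predicate
        self._project = project

    def execute(self, read_store: Store) -> List[T]:
        rows = read_store.get(self._table, [])
        return [self._project(row) for row in rows if self._predicate(row)]


class InsertCommand:
    """A write operation; it only ever touches the write store."""

    def __init__(self, table: str, row: dict) -> None:
        self._table = table
        self._row = row

    def apply(self, write_store: Store) -> None:
        write_store.setdefault(self._table, []).append(dict(self._row))


class CommandProcessor:
    """Runs a batch of commands as one unit of work: all apply, or none do."""

    def __init__(self, write_store: Store) -> None:
        self._write_store = write_store

    def process(self, commands: list) -> None:
        # Work on a copy so a failing command leaves the real store untouched.
        staged = copy.deepcopy(self._write_store)
        for command in commands:
            command.apply(staged)
        self._write_store.clear()
        self._write_store.update(staged)
```

Because queries and commands are separate objects that take their store as an argument, pointing them at two different databases is just a matter of what you pass in.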

I'm still working out the details of managing the creation of data contexts, so I haven't resolved that dependency yet. However, my current plan is to apply the Abstract Factory Pattern in conjunction with some interfaces and adapters to make a single data context interface that my client objects can call regardless of what the underlying query or command pattern is.

At the end of the day, repositories are not inherently bad or evil, and I don't think that everyone should go and get rid of them. Heck, I haven't even completely removed them from my code... yet. The problem, as I see it, is that repositories are difficult to implement with encapsulation in mind, or when performance is a critical consideration. That is why I am moving away from the Repository Pattern, and toward CQRS.
