January 15, 2013

CodeSense Addendum: Separating Data From Representation

If you read CodeSense: Implementing Methods then you'll probably recall this small portion of the article regarding data conversions:
It is the responsibility of the client to perform data conversions, both for input parameters and return values. It's poor form to return an int as a string, accept a string when you are performing operations on a double, etc.
There's really a lot more to say about how critical it is to separate data and the methods that act on data from representations and the methods that act on representations.

Data, Representation, What's the Difference?

When I say data, I mean undecorated, unadorned, concrete data. I'm talking about ints and bools and longs and strings, as well as the various structures and classes that can be built from those and other primitive types. When I say representation, I mean whatever structure might be holding data: JSON, HTML, XML, and delimited text are all examples of representation.

Here's a quick example, just to make sure we're on the same page, of some data represented in JSON and XML:

The data consists of labels and their associated values, while the representations are JSON and XML respectively. Representations use conventions and syntax to structure data in a way that makes it easy for disparate software systems to parse and make use of the data held within.
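
To make the distinction concrete, here is a minimal C# sketch of one piece of data rendered in two representations. The `Person` class, the `RepresentationDemo` helper, and the sample values are assumptions made up for illustration. It also assumes a .NET version that ships `System.Text.Json` and `System.Xml.Linq`.

```csharp
using System;
using System.Text.Json;
using System.Xml.Linq;

// The data: plain labeled values with no syntax attached (hypothetical type).
public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public static class RepresentationDemo
{
    // JSON representation of the data.
    // For Ada Lovelace, 36: {"FirstName":"Ada","LastName":"Lovelace","Age":36}
    public static string ToJson(Person p) => JsonSerializer.Serialize(p);

    // XML representation of the same data.
    // <Person><FirstName>Ada</FirstName><LastName>Lovelace</LastName><Age>36</Age></Person>
    public static string ToXml(Person p) =>
        new XElement("Person",
            new XElement("FirstName", p.FirstName),
            new XElement("LastName", p.LastName),
            new XElement("Age", p.Age)
        ).ToString(SaveOptions.DisableFormatting);
}
```

The `Person` object is the same in both cases; only the syntax wrapped around its values changes.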

So What's the Problem?

The problem with data and its representation isn't with the data or the representation... it's with the methods that act upon the data and the representation. Take this method stub written in C# for example:
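
A stub of the kind described might look like the sketch below. The names (`Person`, `PersonRepository`, the `Saved` list standing in for a real data store) and the JSON property names are hypothetical, and `System.Text.Json` is assumed only as a convenient parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public class PersonRepository
{
    // Stand-in for a real data store (hypothetical).
    public List<Person> Saved { get; } = new List<Person>();

    // Takes a JSON string, yet its job is to save a Person.
    public void Save(string personJson)
    {
        // 1. Parse data (an action on the representation).
        using JsonDocument doc = JsonDocument.Parse(personJson);
        JsonElement root = doc.RootElement;

        // 2. Populate Person object.
        var person = new Person
        {
            FirstName = root.GetProperty("FirstName").GetString() ?? "",
            LastName = root.GetProperty("LastName").GetString() ?? "",
            Age = root.GetProperty("Age").GetInt32()
        };

        // 3. Persist Person object (an action on the data).
        Saved.Add(person);
    }
}
```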


Do you see the problem with the method above?

The problem is that the method conflates an action on the data (saving) with an action on the representation (parsing JSON). Because of this conflation of concerns, the method no longer abides by the Single Responsibility Principle. As a result, when we need to save a Person object from another data format, we'll have to create another method with a virtually identical algorithm (1. Parse data; 2. Populate Person object; 3. Persist Person object). Thus, the process of saving Person data is tightly coupled to the representation of that data.

"But," you protest, "what if I know that our application will only ever need to work with JSON structured data? Why should I care if my Save method is doing the parsing and saving then? Didn't you mention the YAGNI principle in that CodeSense post about methods?"

It is true that you don't want to write methods unnecessarily to implement not-yet-specified functional requirements. However, that does not mean that you cannot or should not design your API in such a way that it will be easier to implement future functionality when it is required. So, when it comes to data and representation, it is better to have two methods - one to deal with the representation and another to deal with the data - than it is to have one method that deals with both data and representation.

That said, our updated C# example now looks like this:
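
One way to sketch the two-method split is shown here. As before, `Person`, `PersonJsonParser`, `PersonRepository`, and the in-memory `Saved` list are hypothetical names chosen for illustration, and `System.Text.Json` is an assumed parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public static class PersonJsonParser
{
    // Acts on the representation only: JSON text in, Person out.
    public static Person Parse(string personJson)
    {
        using JsonDocument doc = JsonDocument.Parse(personJson);
        JsonElement root = doc.RootElement;
        return new Person
        {
            FirstName = root.GetProperty("FirstName").GetString() ?? "",
            LastName = root.GetProperty("LastName").GetString() ?? "",
            Age = root.GetProperty("Age").GetInt32()
        };
    }
}

public class PersonRepository
{
    // Stand-in for a real data store (hypothetical).
    public List<Person> Saved { get; } = new List<Person>();

    // Acts on the data only: it neither knows nor cares where the Person came from.
    public void Save(Person person)
    {
        if (person == null) throw new ArgumentNullException(nameof(person));
        Saved.Add(person);
    }
}
```

Calling code composes the two steps, for example `repository.Save(PersonJsonParser.Parse(json));`. Supporting a second format would mean adding a second parser while leaving `Save` untouched.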
As you can see, it will be much more straightforward now to integrate other data formats without rewriting the code that performs the Save operation, and we also gain reusability for both the method that parses Person data from JSON and the method that saves a Person object.

How Do I Know If I Have a Problem?

There are a couple of easy ways to tell if you're co-mingling code that operates on data with code that operates on a representation of that data.

The easiest marker to find is a method that operates on an object of type Type, but has to derive that object within the method from structured data (XML, JSON, HTML, CSV, etc., as above).

Another similar way to find this issue is when you have several methods with different signatures based on representation that have a significant percentage of common code operating on an object of type Type.
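
As a rough illustration of this second marker, the sketch below shows two methods that differ only in how they obtain a `Person`. The class and method names, the age check, and the simple `first,last,age` CSV layout are all assumptions invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public class PersonRepository
{
    // Stand-in for a real data store (hypothetical).
    public List<Person> Saved { get; } = new List<Person>();

    public void SaveFromJson(string personJson)
    {
        using JsonDocument doc = JsonDocument.Parse(personJson);
        JsonElement root = doc.RootElement;
        var person = new Person
        {
            FirstName = root.GetProperty("FirstName").GetString() ?? "",
            LastName = root.GetProperty("LastName").GetString() ?? "",
            Age = root.GetProperty("Age").GetInt32()
        };

        // Logic that acts on the Person itself starts here.
        if (person.Age < 0) throw new ArgumentException("Age must not be negative.");
        Saved.Add(person);
    }

    public void SaveFromCsv(string personCsv)
    {
        // Assumes a single "first,last,age" line with no quoted commas.
        string[] fields = personCsv.Split(',');
        var person = new Person
        {
            FirstName = fields[0].Trim(),
            LastName = fields[1].Trim(),
            Age = int.Parse(fields[2], CultureInfo.InvariantCulture)
        };

        // The same Person logic again, copied from SaveFromJson.
        if (person.Age < 0) throw new ArgumentException("Age must not be negative.");
        Saved.Add(person);
    }
}
```

The last two statements of each method are the common code. They belong in a single method that takes a `Person`.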

There are certainly other traits of code that has violated the separation of data from representation - these are just two general cases. You'll have to evaluate your own code to determine if you've mixed up data and representation in the same method. Once you've identified code that needs to change, make a plan to refactor it.

How Do I Prevent This From Happening to Me?

This is the Golden Rule of Data-Representation Separation:
A method that directly operates on an object of type Type should either take an object of type Type as a parameter, or should take a set of primitive parameters that represent a necessary and sufficient set of data needed to create an object of type Type.
In other words, when you're writing new code, this is okay: public void Save(Person person)
...and this is okay: public void Save(string firstName, string lastName, int age)
But this is not okay: public void Save(XmlDocument personXml)

Obviously the third signature above could be okay, so long as the method body composes a method that parses XML to create a Person object with another method that saves a Person object.
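
A minimal sketch of that composition follows. The element names in the XML, the `ParsePerson` helper, and the in-memory `Saved` list are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public class PersonRepository
{
    // Stand-in for a real data store (hypothetical).
    public List<Person> Saved { get; } = new List<Person>();

    // Acts on the data only.
    public void Save(Person person) => Saved.Add(person);

    // Acceptable: the body does nothing except compose parse and save.
    public void Save(XmlDocument personXml) => Save(ParsePerson(personXml));

    // Acts on the representation only: XML in, Person out.
    public static Person ParsePerson(XmlDocument personXml)
    {
        XmlElement root = personXml.DocumentElement
            ?? throw new ArgumentException("Document has no root element.", nameof(personXml));
        return new Person
        {
            FirstName = root["FirstName"]?.InnerText ?? "",
            LastName = root["LastName"]?.InnerText ?? "",
            Age = int.Parse(root["Age"]?.InnerText ?? "0", CultureInfo.InvariantCulture)
        };
    }
}
```

The XML overload is a convenience wrapper here. All logic that touches a `Person` still lives in `Save(Person)`.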

If you follow the Golden Rule of Data-Representation Separation when you write new code, you should not have any issues with keeping data and representation separate.



Thanks for checking out this article on data and representation, which is part of a new series called CodeSense. CodeSense is about making code easier to read and write by developing common sense conventions that can be applied immediately in nearly any programming language. Please contact me at brian (at) brian-driscoll (dot) com if you have questions, comments, or requests for future CodeSense topics.