January 30, 2013

Stop Asking Whether You Can Build It In...

...C, or C++, or Java, or PHP, or Perl, or HTML5, or AppleScript.

The answer is Yes.

You can build just about any type of application in just about any language, as there is sure to be a framework in whatever language you choose for whatever type of app you want to build.

So, please, I'm begging you, STOP asking if you can build an application in whatever language you happen to know.

Instead, start asking more specific questions.

For instance, let's say you're building a web application, and one of the application's non-functional requirements specifies that the application must work for all users, regardless of what browser plugins they have installed. If that's the case, you might want to ask a question such as:
Do Java applets require a browser plugin?
Or, maybe one of the non-functional requirements for the web application is that it must be secure from SQL injection attacks, in which case you might ask:
Does PHP support parameterized queries?
I think you get the idea. It's not very helpful to ask whether you can build something in a particular language, but it is helpful to ask whether a particular language supports the specific functionality that your application requires.

January 15, 2013

CodeSense Addendum: Separating Data From Representation

If you read CodeSense: Implementing Methods then you'll probably recall this small portion of the article regarding data conversions:
It is the responsibility of the client to perform data conversions, both for input parameters and return values. It's poor form to return an int as a string, accept a string when you are performing operations on a double, etc.
There's really a lot more to say about how critical it is to separate data and the methods that act on data from representations and the methods that act on representations.

Data, Representation, What's the Difference?

When I say data, I mean undecorated, unadorned, concrete data. I'm talking about ints and bools and longs and strings, as well as the various structures and classes that can be built from those and other primitive types. When I say representation, I mean whatever structure might be holding data: JSON, HTML, XML, and delimited text are all examples of representation.

Here's a quick example, just to make sure we're on the same page, of some data represented in JSON and XML:
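
A minimal sketch of such an example, assuming a hypothetical person record with firstName, lastName, and age labels (the names and values are illustrative, not taken from the original post). The two string constants hold the same data in each representation, and the two helper methods only show that the same value can be read back out of either one:

```csharp
using System;
using System.Text.Json;
using System.Xml.Linq;

public static class RepresentationDemo
{
    // The same three labels and values, represented as JSON...
    public const string PersonJson = @"{
  ""firstName"": ""Jane"",
  ""lastName"": ""Doe"",
  ""age"": 30
}";

    // ...and represented as XML.
    public const string PersonXml = @"<person>
  <firstName>Jane</firstName>
  <lastName>Doe</lastName>
  <age>30</age>
</person>";

    // Reads the age value out of the JSON representation.
    public static int AgeFromJson(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("age").GetInt32();
    }

    // Reads the age value out of the XML representation.
    public static int AgeFromXml(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        return (int)root.Element("age");
    }
}
```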

The data consists of labels and their associated values, while the representations are JSON and XML respectively. Representations use conventions and syntax to structure data in a way that makes it easy for disparate software systems to parse and make use of the data held within.

So What's the Problem?

The problem with data and its representation isn't with the data or the representation... it's with the methods that act upon the data and the representation. Take this method stub written in C# for example:
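
A hypothetical method of this kind might look like the sketch below. The Person and PersonRepository classes, the JSON property names, and the in-memory list standing in for a real data store are all assumptions made for illustration:

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public class PersonRepository
{
    // Stand-in for a real data store (an assumption for this sketch).
    public List<Person> Saved { get; } = new List<Person>();

    public void Save(string personJson)
    {
        // Parse the JSON text.
        using JsonDocument doc = JsonDocument.Parse(personJson);
        JsonElement root = doc.RootElement;

        // Populate a Person object from the parsed values.
        var person = new Person
        {
            FirstName = root.GetProperty("firstName").GetString() ?? "",
            LastName = root.GetProperty("lastName").GetString() ?? "",
            Age = root.GetProperty("age").GetInt32()
        };

        // Persist the Person object.
        Saved.Add(person);
    }
}
```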

Do you see the problem with the method above?

The problem is that the method conflates an action on the data (saving) with an action on the representation (parsing JSON). Because of this conflation of concerns, the method no longer abides by the Single Responsibility Principle. As a result, when we need to save a Person object from another data format, we'll have to create another method with a virtually identical algorithm (1. Parse data; 2. Populate Person object; 3. Persist Person object). Thus, the process of saving Person data is tightly coupled to the representation of that data.

"But," you protest, "what if I know that our application will only ever need to work with JSON structured data? Why should I care if my Save method is doing the parsing and saving then? Didn't you mention the YAGNI principle in that CodeSense post about methods?"

It is true that you don't want to write methods unnecessarily to implement not-yet-specified functional requirements. However, that does not mean that you cannot or should not design your API in such a way that it will be easier to implement future functionality when it is required. So, when it comes to data and representation, it is better to have two methods - one to deal with the representation and another to deal with the data - than it is to have one method that deals with both.

That said, our updated C# example now looks like this:
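
A minimal sketch of that separation, under the same assumptions as before (the PersonJsonParser and PersonRepository names, the JSON property names, and the in-memory list standing in for persistence are all hypothetical):

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public static class PersonJsonParser
{
    // Acts on the representation only: JSON text in, Person out.
    public static Person Parse(string personJson)
    {
        using JsonDocument doc = JsonDocument.Parse(personJson);
        JsonElement root = doc.RootElement;
        return new Person
        {
            FirstName = root.GetProperty("firstName").GetString() ?? "",
            LastName = root.GetProperty("lastName").GetString() ?? "",
            Age = root.GetProperty("age").GetInt32()
        };
    }
}

public class PersonRepository
{
    // Stand-in for a real data store (an assumption for this sketch).
    public List<Person> Saved { get; } = new List<Person>();

    // Acts on the data only: it has no knowledge of JSON.
    public void Save(Person person)
    {
        Saved.Add(person);
    }
}
```

A caller then chains the two steps itself, for example repository.Save(PersonJsonParser.Parse(json)).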
As you can see, it will now be much more straightforward to integrate other data formats without rewriting the code that performs the Save operation, and we also gain reusability for both the method that parses Person data from JSON and the method that saves a Person object.

How Do I Know If I Have a Problem?

There are a couple of easy ways to tell if you're co-mingling code that operates on data with code that operates on a representation of that data.

The easiest marker to find is a method that operates on an object of type Type, but has to derive that object within the method from structured data (XML, JSON, HTML, CSV, etc., as above).

Another similar way to find this issue is when you have several methods with different signatures based on representation that have a significant percentage of common code operating on an object of type Type.

There are certainly other traits of code that has violated the separation of data from representation - these are just two general cases. You'll have to evaluate your own code to determine if you've mixed up data and representation in the same method. Once you've identified code that needs to change, make a plan to refactor it.

How Do I Prevent This From Happening to Me?

This is the Golden Rule of Data-Representation Separation:
A method that directly operates on an object of type Type should either take an object of type Type as a parameter, or should take a set of primitive parameters that represent a necessary and sufficient set of data needed to create an object of type Type.
In other words, when you're writing new code, this is okay: public void Save(Person person)
...and this is okay: public void Save(string firstName, string lastName, int age)
But this is not okay: public void Save(XmlDocument personXml)

Obviously the third signature above could be okay, so long as the method body composes a method that parses XML to create a Person object with another method that saves a Person object.
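
A sketch of what that composition could look like. The ParsePerson method name, the XML element names, and the in-memory list standing in for persistence are assumptions made for illustration:

```csharp
using System.Collections.Generic;
using System.Xml;

public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public class PersonRepository
{
    // Stand-in for a real data store (an assumption for this sketch).
    public List<Person> Saved { get; } = new List<Person>();

    // Acts on the representation only: XML in, Person out.
    public static Person ParsePerson(XmlDocument personXml)
    {
        XmlElement root = personXml.DocumentElement;
        return new Person
        {
            FirstName = root["firstName"].InnerText,
            LastName = root["lastName"].InnerText,
            Age = XmlConvert.ToInt32(root["age"].InnerText)
        };
    }

    // Acts on the data only.
    public void Save(Person person)
    {
        Saved.Add(person);
    }

    // Okay: this overload contains no logic of its own,
    // it only composes the two methods above.
    public void Save(XmlDocument personXml)
    {
        Save(ParsePerson(personXml));
    }
}
```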

If you follow the Golden Rule of Data-Representation Separation when you write new code, you should not have any issues with keeping data and representation separate.

Thanks for checking out this article on data and representation, which is part of a new series called CodeSense. CodeSense is about making code easier to read and write by developing common sense conventions that can be applied immediately in nearly any programming language. Please contact me at brian (at) brian-driscoll (dot) com if you have questions, comments, or requests for future CodeSense topics.

January 9, 2013

Now Is A Good Time To Refactor

What are you doing right now?

Okay, besides reading this?

You know what you could be doing, don't you?


Don't start with rebuttals and excuses because I've heard them all before:
  • The codebase is so complex and intertwined with itself that you'd need to set aside 1,000 hours and 3 months on the production schedule to refactor it.
  • The production schedule is too busy. 
  • We don't have enough PM/dev/QA resources. 
  • Etc...
Refactoring isn't rocket science, and any good developer knows that. The problem is that refactoring is often seen as a binary proposition: we have to refactor everything, otherwise we cannot refactor anything. However, that's simply not the case. You can make your code better bit by bit and block by block, and you have time to do that right now. Here's how...

Step 1: Identify

Where does it hurt most when you have to touch your codebase? While you're working on new features or fixing defects, identify the things that you'd like to make better. You don't need to make a fix now (or even know what the fix would be), just document the parts of the codebase that are painful to deal with. The thing to keep in mind here is that you don't have to go looking for huge chunks of code to fix. In fact, the smaller the problem the better, because it's easier to deal with.

Step 2: Justify

Why is that code so awful to work with? It could be any one of a number of things... the important thing is that you don't change code without justifying the change. If you don't know why the code is bad you have no realistic chance of making it better.

Step 3: Design

How would you improve the code? Your solution should directly address the pain point(s) from Step 2 that you used to justify why the code needs to change. You need to design your improvements in order to make sure that your code will still meet existing requirements without introducing new side effects. Designing a solution to the problem will also help you to quickly see if your solution is just as painful as the problem itself.

Step 4: Evaluate

Once you've completed steps 1-3, ask yourself: is it worth the time and effort to refactor? If the benefit of your change doesn't outweigh the implementation and testing costs associated with it, then it's probably not worth it. How do you know? Well, you'll need to estimate some things:
  • Future technical debt resolved by refactoring (benefit).
    • Technical debt encompasses many things, including: readability, cohesion, coupling, and regression, among others.
    • You should not use past technical debt as a measure, as it is considered a sunk cost.
  • Future non-functional improvements, if any (benefit).
    • Non-functional improvements include improvements to performance, security, and administration.
  • Estimated time to implement and test the refactored code (cost).
  • Future non-functional costs, if any (cost).
Remember that you're just estimating the items above, so you don't need to be too scientific; you're just trying to ballpark some figures to see if there's an obvious disparity in costs vs. benefits.
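
If it helps to write the ballpark down, the comparison can be sketched in a few lines. The RefactorEstimate class, its property names, and the choice of hours as the common unit are all assumptions made for illustration; any rough unit works as long as both sides use the same one:

```csharp
public class RefactorEstimate
{
    // Benefits, as rough estimates in hours.
    public double FutureDebtResolved { get; set; }
    public double NonFunctionalImprovements { get; set; }

    // Costs, as rough estimates in hours.
    public double ImplementAndTest { get; set; }
    public double NonFunctionalCosts { get; set; }

    public double Benefit => FutureDebtResolved + NonFunctionalImprovements;
    public double Cost => ImplementAndTest + NonFunctionalCosts;

    // True only when the estimated benefit outweighs the estimated cost.
    public bool IsWorthIt => Benefit > Cost;
}
```

For example, an estimate with 12 hours of future debt resolved, no non-functional changes either way, and 4 hours to implement and test gives a benefit of 12 against a cost of 4, so IsWorthIt is true.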

Step 5: Implement

Once you've evaluated that your change is worth making, you can (and should) make the change. This is where I usually hear the excuses and reasons for not refactoring. If you've identified a reasonably small piece of code to change, and have designed a solution for it, then the implementation should take a relatively small amount of time. Heck, I'd even argue that you don't need to put it on the production schedule - just do it when you have a few minutes between tasks.

If you need to implement a larger change, see if you can break it down into smaller chunks and do it one piece at a time. If that's not possible, then put the refactoring task on your production schedule and get it done in the next sprint!

Step 6: Test

The only time I've ever seen a refactoring effort fail (and thus refactoring becomes taboo...) is when refactored code is not tested as if it's a new feature or change request. Seriously folks, you need to test your changes, regardless of how small or insignificant you think they are. "This code does the same thing as the old code, only better!" is not a good justification for not testing.

Armed with the 6 steps above, refactoring your code should be a much easier and more organized process, and you should now see it as a process that happens in small chunks rather than as a large paradigm-shifting effort.

So, now, go forth and REFACTOR!

What are you waiting for?