January 30, 2013

Stop Asking Whether You Can Build It In...

...C, or C++, or Java, or PHP, or Perl, or HTML5, or AppleScript.

The answer is Yes.

You can build just about any type of application in just about any language, as there is sure to be a framework in whatever language you choose for whatever type of app you want to build.

So, please, I'm begging you, STOP asking if you can build an application in whatever language you happen to know.

Instead, start asking more specific questions.

For instance, let's say you're building a web application, and one of the application's non-functional requirements specifies that the application must work for all users, regardless of what browser plugins they have installed. If that's the case, you might want to ask a question such as:
Do Java applets require a browser plugin?
Or, maybe one of the non-functional requirements for the web application is that it must be secure from SQL injection attacks, in which case you might ask:
Does PHP support parameterized queries?
I think you get the idea. It's not very helpful to ask whether you can build something in a particular language, but it is helpful to ask whether a particular language supports the specific functionality that your application requires.


January 15, 2013

CodeSense Addendum: Separating Data From Representation

If you read CodeSense: Implementing Methods then you'll probably recall this small portion of the article regarding data conversions:
It is the responsibility of the client to perform data conversions, both for input parameters and return values. It's poor form to return an int as a string, accept a string when you are performing operations on a double, etc.
There's really a lot more to say about how critical it is to separate data, and the methods that act on data, from representations and the methods that act on representations.

Data, Representation, What's the Difference?

When I say data, I mean undecorated, unadorned, concrete data. I'm talking about ints and bools and longs and strings, as well as the various structures and classes that can be built from those and other primitive types. When I say representation, I mean whatever structure might be holding data: JSON, HTML, XML, and delimited text are all examples of representation.

Here's a quick example, just to make sure we're on the same page, of some data represented in JSON and XML:

The data consists of labels and their associated values, while the representations are JSON and XML respectively. Representations use conventions and syntax to structure data in a way that makes it easy for disparate software systems to parse and make use of the data held within.
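To make the distinction concrete, here is a small Java sketch. The Person class, its fields, and the two rendering methods are hypothetical names chosen only for illustration; the point is that one piece of data can be carried by two different representations.

```java
// Hypothetical example: the same data rendered in two representations.
final class Person {
    final String firstName;
    final String lastName;
    final int age;

    Person(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
}

final class PersonRepresentations {
    // JSON representation of the data (no escaping; illustration only).
    static String toJson(Person p) {
        return "{\"firstName\":\"" + p.firstName + "\",\"lastName\":\"" + p.lastName
                + "\",\"age\":" + p.age + "}";
    }

    // XML representation of the very same data (no escaping; illustration only).
    static String toXml(Person p) {
        return "<person><firstName>" + p.firstName + "</firstName><lastName>" + p.lastName
                + "</lastName><age>" + p.age + "</age></person>";
    }
}
```

The Person object is the data; the strings returned by toJson and toXml are representations of it.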

So What's the Problem?

The problem with data and its representation isn't with the data or the representation... it's with the methods that act upon the data and the representation. Take this method stub written in C# for example:


Do you see the problem with the method above?

The problem is that the method is written in such a way that it conflates an action on the data (saving) with an action on the representation (parsing JSON). Due to this conflation of concerns, our method no longer abides by the Single Responsibility Principle and as a result when we need to save a person object from another data format we'll have to create another method with a virtually identical algorithm (1. Parse data; 2. Populate Person object; 3. Persist Person object). Thus, the process of saving Person data is tightly coupled to the representation of that data.
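As a rough illustration of that duplication, here is a hypothetical Java sketch. PersonStore, its method names, the in-memory list standing in for a database, and the toy regex parsing standing in for a real JSON library are all assumptions of mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the conflated design: each method parses AND persists.
final class PersonStore {
    final List<String> rows = new ArrayList<>(); // stand-in for a real database

    // Parses JSON and saves in one step (toy regex parsing, not a real JSON library).
    void saveFromJson(String json) {
        String first = match(json, "\"firstName\"\\s*:\\s*\"([^\"]*)\"");
        String last = match(json, "\"lastName\"\\s*:\\s*\"([^\"]*)\"");
        int age = Integer.parseInt(match(json, "\"age\"\\s*:\\s*(\\d+)"));
        rows.add(first + "|" + last + "|" + age); // persistence logic
    }

    // A second format forces a near-copy: only the parsing lines differ.
    void saveFromCsv(String csv) {
        String[] parts = csv.split(",");
        String first = parts[0].trim();
        String last = parts[1].trim();
        int age = Integer.parseInt(parts[2].trim());
        rows.add(first + "|" + last + "|" + age); // duplicated persistence logic
    }

    // Returns the first capture group, or fails if the field is absent.
    private static String match(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("Missing field for pattern: " + regex);
        }
        return m.group(1);
    }
}
```

Every new format would add another method that repeats the persistence line, which is exactly the coupling described above.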

"But," you protest, "what if I know that our application will only ever need to work with JSON structured data? Why should I care if my Save method is doing the parsing and saving then? Didn't you mention the YAGNI principle in that CodeSense post about methods?"

It is true that you don't want to write methods unnecessarily to implement not-yet-specified functional requirements. However, that does not mean that you cannot or should not design your API in such a way that it will be easier to implement future functionality when it is required. So, when it comes to data and representation, it is better to have two methods - one to deal with the representation and another to deal with the data - than it is to have one method that deals with both data and representation.

That said, our updated C# example now looks like this:
As you can see, it will be much more straightforward now to integrate other data formats without rewriting the code that performs the Save operation, and we also have the benefit of reusability for both the method that parses Person data from JSON and the method that saves a Person object.
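A rough Java sketch of that separation might look like the following. PersonParser and PersonRepository are hypothetical names, the list is a stand-in for real persistence, and the regex parsing is a toy substitute for a real JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Person {
    final String firstName;
    final String lastName;
    final int age;

    Person(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
}

// Acts on the representation only: JSON text in, Person out.
final class PersonParser {
    static Person fromJson(String json) {
        String first = field(json, "\"firstName\"\\s*:\\s*\"([^\"]*)\"");
        String last = field(json, "\"lastName\"\\s*:\\s*\"([^\"]*)\"");
        int age = Integer.parseInt(field(json, "\"age\"\\s*:\\s*(\\d+)"));
        return new Person(first, last, age);
    }

    private static String field(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("Missing field for pattern: " + regex);
        }
        return m.group(1);
    }
}

// Acts on the data only: knows nothing about JSON, XML, or any other format.
final class PersonRepository {
    final List<Person> saved = new ArrayList<>(); // stand-in for a real database

    void save(Person person) {
        saved.add(person);
    }
}
```

A caller composes the two with repository.save(PersonParser.fromJson(json)). Supporting another format would mean adding another parse method and leaving save untouched.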

How Do I Know If I Have a Problem?

There are a couple of easy ways to tell if you're co-mingling code that operates on data with code that operates on a representation of that data.

The easiest marker to find is a method that operates on an object of type Type, but that object has to be derived within the method from structured data (XML, JSON, HTML, CSV, etc., as above).

Another similar way to find this issue is when you have several methods with different signatures based on representation that have a significant percentage of common code operating on an object of type Type.

There are certainly other traits of code that has violated the separation of data from representation - these are just two general cases. You'll have to evaluate your own code to determine if you've mixed up data and representation in the same method. Once you've identified code that needs to change, make a plan to refactor it.

How Do I Prevent This From Happening to Me?

This is the Golden Rule of Data-Representation Separation:
A method that directly operates on an object of type Type should either take an object of type Type as a parameter, or should take a set of primitive parameters that represent a necessary and sufficient set of data needed to create an object of type Type.
In other words, when you're writing new code, this is okay: public void Save(Person person)
...and this is okay: public void Save(string firstName, string lastName, int age)
But this is not okay: public void Save(XmlDocument personXml)

Obviously the third signature above could be okay, so long as the method body composes a method that parses XML to create a Person object with another method that saves a Person object.
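Here is a hedged Java sketch of that composition. PersonService and its method names are hypothetical, the XML arrives as a String rather than as an XmlDocument, and the list is a stand-in for real persistence.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class Person {
    final String firstName;
    final String lastName;
    final int age;

    Person(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
}

final class PersonService {
    final List<Person> saved = new ArrayList<>(); // stand-in for real persistence

    // Operates on the data: takes the typed object directly.
    void save(Person person) {
        saved.add(person);
    }

    // Acceptable only because it merely composes the parser with save(Person).
    void saveFromXml(String personXml) {
        save(parseXml(personXml));
    }

    // Operates on the representation: XML text in, Person out.
    static Person parseXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String first = doc.getElementsByTagName("firstName").item(0).getTextContent();
            String last = doc.getElementsByTagName("lastName").item(0).getTextContent();
            String age = doc.getElementsByTagName("age").item(0).getTextContent();
            return new Person(first, last, Integer.parseInt(age.trim()));
        } catch (Exception e) {
            // Surface malformed or incomplete XML instead of hiding it.
            throw new IllegalArgumentException("Invalid person XML", e);
        }
    }
}
```

The XML-accepting method contains no parsing or persistence logic of its own, so both halves stay reusable.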

If you follow the Golden Rule of Data-Representation Separation when you write new code, you should not have any issues with keeping data and representation separate.



Thanks for checking out this article on data and representation, which is part of a new series called CodeSense. CodeSense is about making code easier to read and write by developing common sense conventions that can be applied immediately in nearly any programming language. Please contact me at brian (at) brian-driscoll (dot) com if you have questions, comments, or requests for future CodeSense topics.

January 9, 2013

Now Is A Good Time To Refactor

What are you doing right now?

Okay, besides reading this?

You know what you could be doing, don't you?

REFACTORING!

Don't start with rebuttals and excuses because I've heard them all before:
  • The codebase is so complex and intertwined with itself that you'd need to set aside 1,000 hours and 3 months on the production schedule to refactor it. 
  • The production schedule is too busy. 
  • We don't have enough PM/dev/QA resources. 
  • Etc...
Refactoring isn't rocket science, and any good developer knows that. The problem is that refactoring is often seen as a binary proposition: we have to refactor everything, otherwise we cannot refactor anything. However, that's simply not the case. You can make your code better bit by bit and block by block, and you have time to do that right now. Here's how...

Step 1: Identify

Where does it hurt most when you have to touch your codebase? While you're working on new features or fixing defects, identify the things that you'd like to make better. You don't need to make a fix now (or even know what the fix would be), just document the parts of the codebase that are painful to deal with. The thing to keep in mind here is that you don't have to go looking for huge chunks of code to fix. In fact, the smaller the problem the better, because it's easier to deal with.

Step 2: Justify

Why is that code so awful to work with? It could be any one of a number of things... the important thing is that you don't change code without justifying the change. If you don't know why the code is bad you have no realistic chance of making it better.

Step 3: Design

How would you improve the code? Your solution should directly address the pain point(s) from Step 2 that you used to justify why the code needs to change. You need to design your improvements in order to make sure that your code will still meet existing requirements without introducing new side effects. Designing a solution to the problem will also help you to quickly see if your solution is just as painful as the problem itself.

Step 4: Evaluate

Once you've completed steps 1-3, ask yourself: is it worth the time and effort to refactor? If the benefit of your change doesn't outweigh the implementation and testing costs associated with it, then it's probably not worth it. How do you know? Well, you'll need to estimate some things:
  • Future technical debt resolved by refactoring (benefit).
    • Technical debt encompasses many things, including: readability, cohesion, coupling, and regression, among others.
    • You should not use past technical debt as a measure, as it is considered a sunk cost.
  • Future non-functional improvements, if any (benefit).
    • Non-functional improvements include improvements to performance, security, and administration.
  • Estimated time to implement and test the refactored code (cost).
  • Future non-functional costs, if any (cost).
Remember that you're just estimating the items above, so you don't need to be too scientific; you're just trying to ballpark some figures to see if there's an obvious disparity in costs vs. benefits.
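If it helps to see the arithmetic, the ballpark comparison can be sketched in a few lines of Java. The RefactorEstimate class and the choice of hours as the unit are purely my assumptions; plug in whatever rough figures you have.

```java
// Hypothetical ballpark estimate; every figure is a rough guess in hours.
final class RefactorEstimate {
    final double debtResolved;      // future technical debt resolved (benefit)
    final double nonFunctionalGain; // future non-functional improvements (benefit)
    final double implementAndTest;  // time to implement and test (cost)
    final double nonFunctionalCost; // future non-functional costs (cost)

    RefactorEstimate(double debtResolved, double nonFunctionalGain,
                     double implementAndTest, double nonFunctionalCost) {
        this.debtResolved = debtResolved;
        this.nonFunctionalGain = nonFunctionalGain;
        this.implementAndTest = implementAndTest;
        this.nonFunctionalCost = nonFunctionalCost;
    }

    // Total benefits minus total costs.
    double netBenefit() {
        return (debtResolved + nonFunctionalGain) - (implementAndTest + nonFunctionalCost);
    }

    // Only an obvious surplus of benefit justifies the change.
    boolean worthDoing() {
        return netBenefit() > 0;
    }
}
```

For example, new RefactorEstimate(20, 4, 6, 2) nets out to 16 hours in favor of refactoring, while new RefactorEstimate(2, 0, 6, 0) does not pay for itself.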

Step 5: Implement

Once you've evaluated that your change is worth making, you can (and should) make the change. This is where I usually hear the excuses and reasons for not refactoring. If you've identified a reasonably small piece of code to change, and have designed a solution for it, then the implementation should take a relatively small amount of time. Heck, I'd even argue that you don't need to put it on the production schedule - just do it when you have a few minutes between tasks.

If you need to implement a larger change, see if you can break it down into smaller chunks and do it one piece at a time. If that's not possible, then put the refactoring task on your production schedule and get it done in the next sprint!

Step 6: Test

The only time I've ever seen a refactoring effort fail (and thus refactoring becomes taboo...) is when refactored code is not tested as if it's a new feature or change request. Seriously folks, you need to test your changes, regardless of how small or insignificant you think they are. "This code does the same thing as the old code, only better!" is not a good justification for not testing.

Armed with the 6 steps above, refactoring your code should be a much easier and more organized process, and you should now see it as a process that happens in small chunks rather than as a large paradigm-shifting effort.

So, now, go forth and REFACTOR!

What are you waiting for?


December 29, 2012

CodeSense: Implementing Methods

As I mentioned in an earlier CodeSense post, methods describe the functionality that your class provides. In this article I'll explain how to implement methods in a way that increases the readability and maintainability of your code. This article is by no means an exhaustive treatment of best practices; rather, it is an overview of high-impact changes you can make to your coding practice right now to significantly improve your work product.

General Principles to Follow When Implementing Methods in Your Classes

What's In a Method Name?

If you haven't already read Smart Naming Practices, go ahead and read it now. Don't worry, I'll wait...

The Single Responsibility Principle and Atomicity

Methods should do one thing only, whatever that one thing happens to be. I've written and reviewed some monstrous methods that performed several actions within the method body, and I can tell you from experience that they were awfully difficult to test and even more difficult to maintain.

Additionally, methods that do multiple things are rarely (if ever) reusable for their constituent functionality. Let's say I have an application that accepts user registration. I've written my registration method such that it inserts the user's details in a database and sends the user a confirmation email. After several months I start to receive institutional accounts that want to add users but do not want the confirmation email to be sent. What do I do now? 
  • Create a new method that only inserts users into the database? Great, now I have two methods I'll need to maintain whenever I have to change something related to adding users to the database.
  • Add a flag to the method's parameters to indicate whether the confirmation email should be sent, and wrap the code that sends the email confirmation in a giant if-statement? Ok, that might work in this sort of binary scenario, but it does have a certain smell if you ask me.
Neither of these two options is very good. It would be better instead to separate the database function from the email function, and use any one of a number of patterns to implement the use case that requires both functions. 

The simplest thing to do would be to create three methods - one that adds a user to the database, another that sends a confirmation email, and then a third that calls each of these methods in succession. Yes, if we do that we'll still have a method that does more than one thing, but at the very least we'll have isolated the two separate actions being performed within that method so that we can reuse one or the other in the future.
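The three-method approach can be sketched in Java roughly as follows. RegistrationService and its method names are hypothetical, and the two lists are stand-ins for a real database and a real mail gateway.

```java
import java.util.ArrayList;
import java.util.List;

final class RegistrationService {
    final List<String> users = new ArrayList<>();      // stand-in for the database
    final List<String> sentEmails = new ArrayList<>(); // stand-in for the mail gateway

    // Does one thing: stores the user.
    void addUser(String email) {
        users.add(email);
    }

    // Does one thing: sends the confirmation.
    void sendConfirmationEmail(String email) {
        sentEmails.add(email);
    }

    // Composes the two building blocks for the ordinary sign-up case.
    void register(String email) {
        addUser(email);
        sendConfirmationEmail(email);
    }
}
```

Ordinary sign-ups call register, while the institutional accounts call addUser directly, with no flag parameter and no duplicated insert logic.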

The overarching message here is that you should see your methods as building blocks of atomic functionality that can be used separately or together, rather than seeing your methods as do-it-all buckets of code that are written and rewritten over and over again for each use case.

Overloading and the YAGNI Principle

Overloading a method means creating several different implementations of the same method, each with a different signature. Let me tell you a story about overloading. Early on in my career I was asked to create a method that would serialize the containing object as XML for storage and transfer. Perhaps I wanted to show someone that I knew a thing or two about OOP, or maybe I thought I was preparing for the future. Either way, I created not only a method to serialize the object to XML, but several overloads to serialize it to JSON, YAML, url-encoded string, you name it... none of which was necessary because none would ever be used.

It's often tempting to try to anticipate the future by creating methods (or overloading methods) that aren't needed right now, but might be needed in the future. I find this temptation to be particularly strong when it comes to overloading methods. But, chances are, You Ain't Gonna Need It. Create only the methods that you need right now. Implementing only the methods you need at any given time prevents you from writing code unnecessarily that you will have to maintain in the future even if it is not in use.

Avoid External Dependencies

I'm sure to most seasoned developers this sounds like a pie in the sky notion, but I don't think that methods have any business relying on any object or data that isn't explicitly passed in to the method as a parameter. There are two problems inherent to relying upon external state in a method: first, it creates a tight coupling between your method's class and the external state's class. The second problem follows from the first: such a tight coupling makes it difficult to reuse your method's class.

So what's an external dependency? An external dependency is any object or data that is referenced within the body of your method that is not passed into the method as a parameter. The most obvious examples of this type of dependency are session state, application state, configuration settings, and static dependencies such as static fields or methods of other classes. However, even the implementing class's state can be considered an external dependency depending on how strictly you interpret the definition.
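A small Java sketch of the difference, using a hypothetical AppConfig class as the stand-in for global configuration and a hypothetical PriceCalculator as the class under discussion:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for global configuration settings.
final class AppConfig {
    static final Map<String, String> SETTINGS = new HashMap<>();
}

final class PriceCalculator {
    // External dependency: the result depends on state the caller never passed in.
    static double totalWithHiddenDependency(double net) {
        double rate = Double.parseDouble(AppConfig.SETTINGS.get("taxRate"));
        return net * (1 + rate);
    }

    // No external dependency: everything the method needs arrives as a parameter.
    static double total(double net, double taxRate) {
        return net * (1 + taxRate);
    }
}
```

The second method can be reused and tested anywhere, while the first only works where AppConfig has already been populated.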

Avoid Side Effects

In theory, methods are meant to perform transformations on data and then return the result of those transformations. Any behavior that falls outside of that description is considered a side effect. In more practical terms, a side effect is any change in state that occurs within your method body that is not expected or explicitly defined. 

We expect that mutator methods will change state - that's actually their entire purpose, so it's not really a side effect. However, it is generally a side effect if any non-mutator method changes state either within the implementing class or - gasp - outside of the implementing class. It is a good practice to avoid creating side effects by avoiding code that explicitly or implicitly changes the state of the implementing class when such a change in state is not to be expected as a result of calling that method. In practice this means that any method you write that returns a value should not update fields in your class or outside of your class (like session, cache, etc.). 
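Here's a minimal Java sketch of the distinction, using a hypothetical Counter class:

```java
final class Counter {
    private int count;
    private int reads; // state that a value-returning method should not touch

    // Mutator: changing state is its stated purpose, so this is not a side effect.
    void increment() {
        count++;
    }

    // Side effect: a method that returns a value also quietly updates a field.
    int getCountAndTrack() {
        reads++;
        return count;
    }

    // No side effect: returns a value and leaves all state alone.
    int getCount() {
        return count;
    }

    int getReads() {
        return reads;
    }
}
```

Calling getCount any number of times leaves the object unchanged, which is what a caller would expect from a method that only returns a value.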

Save Conversions for the Method's Client

It is the responsibility of the client to perform data conversions, both for input parameters and return values. It's poor form to return an int as a string, accept a string when you are performing operations on a double, etc.
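As a hedged illustration in Java (Geometry and Client are hypothetical names), the method that does arithmetic on a double accepts and returns a double, and the caller converts at the edges:

```java
import java.util.Locale;

final class Geometry {
    // Operates on a double, so it takes a double and returns a double.
    static double circleArea(double radius) {
        return Math.PI * radius * radius;
    }
}

final class Client {
    // The client owns both conversions: text to double on the way in,
    // double to formatted text on the way out.
    static String describeArea(String radiusInput) {
        double radius = Double.parseDouble(radiusInput.trim());
        double area = Geometry.circleArea(radius);
        return String.format(Locale.ROOT, "%.2f", area);
    }
}
```

Because circleArea never sees a string, it stays usable by callers whose input comes from a form, a file, or another calculation.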

Don't Eat Exceptions

There's nothing more infuriating to me than to debug code and find that there's an empty catch block swallowing a thrown exception. If you've put code in a try block you've done so because you think your code might throw an exception. If that's the case, you should either handle the exception or re-throw it. In general if you are thinking of eating an exception you should re-throw it instead. That brings me to my next point...
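For instance, in Java the two approaches might look like this. ConfigReader and its methods are hypothetical, and since Java requires a return value the swallowing version hands back a default rather than having a literally empty catch block.

```java
final class ConfigReader {
    // Anti-pattern: the exception is eaten and a misleading default comes back.
    static int parsePortSwallowing(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // Swallowed: the caller never learns that the input was bad.
            return 0;
        }
    }

    // Better: re-throw with context and keep the original exception as the cause.
    static int parsePort(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid port value: " + raw, e);
        }
    }
}
```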

Fail Loudly 

Code that fails but does not throw an exception is poorly written in my opinion. Some programmers like to return status objects (or return boolean and pass a status object as an out parameter) rather than throw exceptions, but I just don't see the point. Why should a method continue execution if, for example, it has been passed an invalid parameter? The most common argument is usually that the method should recover if it can, but I don't think it should be the method's job to handle recovery... that's the method caller's job. So, long story short, if your method's execution fails it should throw an exception!
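A minimal Java sketch of failing loudly, using a hypothetical Account class:

```java
final class Account {
    private long balanceCents;

    Account(long openingBalanceCents) {
        this.balanceCents = openingBalanceCents;
    }

    // Throws on bad input or an impossible request instead of returning a status.
    void withdraw(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("Amount must be positive: " + amountCents);
        }
        if (amountCents > balanceCents) {
            throw new IllegalStateException("Insufficient funds");
        }
        balanceCents -= amountCents;
    }

    long getBalanceCents() {
        return balanceCents;
    }
}
```

The method never guesses at a recovery; whoever called withdraw decides what to do about the failure.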

Accessor and Mutator Methods

Accessors and Mutators are methods that retrieve and modify the state of your class, respectively. While some languages can automatically implement these methods for you, many do not. Accessor methods should be named getFoo (or GetFoo, if Pascal case is preferred) and mutator methods should be named setFoo, where foo is the name of the backing field. It is permissible to name boolean accessor methods isBar, hasBaz, etc if that is preferred.

It is absolutely critical that your accessor and mutator methods do not modify state unexpectedly. In other words, your setFoo method should set the value of the foo field, and do nothing more to modify the object's state.

Collection Methods

It's generally not a good idea to provide accessor and mutator methods for collections that belong to your class. Rather, it is better to provide methods to add, remove, and retrieve specific items from collections.
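In Java that might look something like the sketch below, where Playlist and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

final class Playlist {
    // The collection itself is never handed out or replaced wholesale.
    private final List<String> tracks = new ArrayList<>();

    void addTrack(String title) {
        tracks.add(title);
    }

    // Returns true if the track was present and has been removed.
    boolean removeTrack(String title) {
        return tracks.remove(title);
    }

    String getTrack(int index) {
        return tracks.get(index);
    }

    int trackCount() {
        return tracks.size();
    }
}
```

Because callers never hold a reference to the underlying list, the class stays in control of how its collection changes.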

Static Methods

A method should rightly be static if it can be applied to all instances of the class regardless of the internal state of the class. In other words, if you write a method that does not call accessors or reference backing fields, nor call mutators or update backing fields, then it can (and should) be a static method.
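A short Java sketch with a hypothetical Temperature class shows the split:

```java
final class Temperature {
    private final double celsius;

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    // Reads the backing field, so it has to be an instance method.
    double toFahrenheit() {
        return celsiusToFahrenheit(celsius);
    }

    // Touches no fields, accessors, or mutators, so it can be static.
    static double celsiusToFahrenheit(double c) {
        return c * 9.0 / 5.0 + 32.0;
    }
}
```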

Accessibility

In most object-oriented languages you can specify whether your methods are public, protected, or private. Public methods can be accessed from any code within or outside of your class. Protected methods can be accessed from within your class or any of its subclasses. Private methods can only be accessed from within your class. My general approach with accessibility is to make all my methods private, and then increase accessibility to protected or public as needed. This has always worked well for me, so I think it's a good approach. Obviously if you're creating a method in a class to meet an external requirement then it's okay to set its accessibility level to public initially.


Thanks for checking out this article on methods, which is part of a new series called CodeSense. CodeSense is about making code easier to read and write by developing common sense conventions that can be applied immediately in nearly any programming language. Please contact me at brian (at) brian-driscoll (dot) com if you have questions, comments, or requests for future CodeSense topics.

December 20, 2012

Why You Should Stop Using ext/mysql (mysql_* functions) in PHP Right Now

I cannot take credit for the content of this post; it comes from StackOverflow user Madara Uchiha, but it deserves as wide an audience as possible, so I am reposting it here for everyone to see and hopefully share. If you are a PHP developer you'd do well to heed the message and update your code accordingly.

Question: Why Shouldn't I Use mysql_* functions in PHP?

What are the technical reasons that I shouldn't use mysql_* functions (mysql_query, mysql_connect, mysql_real_escape_string)?
Why should I change them as long as it works on my site?
(via StackOverflow)

Answer:

First, let's begin with the standard comment we give everyone:
Please, don't use mysql_* functions in new code. They are no longer maintained and are officially deprecated. See the red box? Learn about prepared statements instead, and use PDO or MySQLi - this article will help you decide which. If you choose PDO, here is a good tutorial.
Let's go through this, sentence by sentence, and explain:
  • They are no longer maintained, and are officially deprecated
    This means that the PHP community is gradually dropping support for these very old functions. They are likely to not exist in a future (recent) version of PHP! Continued use of these functions may break your code in the (not so) far future.
    NEW! - ext/mysql is now officially deprecated as of PHP 5.5!
  • Instead, you should learn of prepared statements -
    mysql_* extension does not support prepared statements, which is (among other things) a very effective countermeasure against SQL Injection. It fixed a very serious vulnerability in MySQL dependent applications which allows attackers to gain access to your script and perform any possible query on your database.
    For more information, see How to prevent SQL injection?
  • See the Red Box?
    When you go on any mysql function manual page, you see a red box, explaining it should not be used anymore.
  • Use either PDO or MySQLi
    There are better, more robust and well-built alternatives: PDO (PHP Data Objects), which offers a complete OOP approach to database interaction, and MySQLi, which is a MySQL-specific improvement.
(via StackOverflow)