December 29, 2012

CodeSense: Implementing Methods

As I mentioned in an earlier CodeSense post, methods describe the functionality that your class provides. In this article I'll explain how to implement methods in a way that increases the readability and maintainability of your code. This article is by no means an exhaustive treatment of best practices, rather it is an overview of high-impact changes you can make to your coding practice right now to significantly improve your work product.

General Principles to Follow When Implementing Methods in Your Classes

What's In a Method Name?

If you haven't already read Smart Naming Practices, go ahead and read it now. Don't worry, I'll wait...

The Single Responsibility Principle and Atomicity

Methods should do one thing only, whatever that one thing happens to be. I've written and reviewed some monstrous methods that performed several actions within the method body, and I can tell you from experience that they were awfully difficult to test and even more difficult to maintain.

Additionally, methods that do multiple things are rarely (if ever) reusable for their constituent functionality. Let's say I have an application that accepts user registration. I've written my registration method such that it inserts the user's details in a database and sends the user a confirmation email. After several months I start to receive institutional accounts that want to add users but do not want the confirmation email to be sent. What do I do now? 
  • Create a new method that only inserts users into the database? Great, now I have two methods I'll need to maintain whenever I have to change something related to adding users to the database.
  • Add a flag to the method's parameters to indicate whether the confirmation email should be sent, and wrap the code that sends the email confirmation in a giant if-statement? Ok, that might work in this sort of binary scenario, but it does have a certain smell if you ask me.
Neither of these two options is very good. It would be better instead to separate the database function from the email function, and use any one of a number of patterns to implement the use case that requires both functions. 

The simplest thing to do would be to create three methods - one that adds a user to the database, another that sends a confirmation email, and then a third that calls each of these methods in succession. Yes, if we do that we'll still have a method that does more than one thing, but at the very least we'll have isolated the two separate actions being performed within that method so that we can reuse one or the other in the future.

The overarching message here is that you should see your methods as building blocks of atomic functionality that can be used separately or together, rather than seeing your methods as do-it-all buckets of code that are written and rewritten over and over again for each use case.

Overloading and the YAGNI Principle

Overloading a method means creating several different implementations of the same method, each with a different signature. Let me tell you a story about overloading. Early on in my career I was asked to create a method that would serialize the containing object as XML for storage and transfer. Perhaps I wanted to show someone that I knew a thing or two about OOP, or maybe I thought I was preparing for the future. Either way, I created not only a method to serialize the object to XML, but several overloads to serialize it to JSON, YAML, url-encoded string, you name it... none of which was necessary because none would ever be used.

It's often tempting to try to anticipate the future by creating methods (or overloading methods) that aren't needed right now, but might be needed in the future. I find this temptation to be particularly strong when it comes to overloading methods. But, chances are, You Ain't Gonna Need It. Create only the methods that you need right now. Implementing only the methods you need at any given time prevents you from writing code unnecessarily that you will have to maintain in the future even if it is not in use.

Avoid External Dependencies

I'm sure to most seasoned developers this sounds like a pie in the sky notion, but I don't think that methods have any business relying on any object or data that isn't explicitly passed in to the method as a parameter. There are two problems inherent to relying upon external state in a method: first, it creates a tight coupling between your method's class and the external state's class. The second problem follows from the first: such a tight coupling makes it difficult to reuse your method's class.

So what's an external dependency? An external dependency is any object or data that is referenced within the body of your method that is not passed into the method as a parameter. The most obvious examples of this type of dependency are session state, application state, configuration settings, and static dependencies like static. However, even the implementing class's state can be considered an external dependency depending on how strictly you interpret the definition.

Avoid Side Effects

In theory, methods are meant to perform transformations on data and then return the result of those transformations. Any behavior that falls outside of that description is considered a side effect. In more practical terms, a side effect is any change in state that occurs within your method body that is not expected or explicitly defined. 

We expect that mutator methods will change state - that's actually their entire purpose, so it's not really a side effect. However, it is generally a side effect if any non-mutator changes state either within the implementing class or - gasp - outside of the implementing class. It is a good practice to avoid creating side effects by avoiding code that explicitly or implicitly changes the state of the implementing class when such a change in state is not to be expected as a result of calling that that method. In practice this means that any method you write that returns a value should not update fields in your class or outside of your class (like session, cache, etc). 

Save Conversions for the Method's Client

It is the responsibility of the client to perform data conversions, both for input parameters and return values. It's poor form to return an int as a string, accept a string when you are performing operations on a double, etc.

Don't Eat Exceptions

There's nothing more infuriating to me than to debug code and find that there's an empty catch block swallowing a thrown exception. If you've put code in a try block you've done so because you think your code might throw an exception. If that's the case, you should either handle the exception or re-throw it. In general if you are thinking of eating an exception you should re-throw it instead. That brings me to my next point...

Fail Loudly 

Code that fails but does not throw an exception is poorly written in my opinion. Some programmers like to return status objects (or return boolean and pass a status object as an out parameter) rather than throw exceptions, but I just don't see the point. Why should a method continue execution if, for example, it has been passed an invalid parameter? The most common argument is usually that the method should recover if it can, but I don't think it should be the method's job to handle recovery... that's the method caller's job. So, long story short, if your method's execution fails it should throw an exception!

Accessor and Mutator Methods

Accessors and Mutators are methods that retrieve and modify the state of your class, respectively. While some languages can automatically implement these methods for you, many do not. Accessor methods should be named getFoo (or GetFoo, if Pascal case is preferred) and mutator methods should be named setFoo, where foo is the name of the backing field. It is permissible to name boolean accessor methods isBar, hasBaz, etc if that is preferred.

It is absolutely critical that your accessor and mutator methods do not modify state unexpectedly. In other words, your setFoo method should set the value of the foo field, and do nothing more to modify the object's state.

Collection Methods

It's generally not a good idea to provide accessor and mutator methods for collections that belong to your class. Rather, it is better to provide methods to add, remove, and retrieve specific items from collections.

Static Methods

A method should rightly be static if it can be applied to all instances of the class regardless of the internal state of the class. In other words, if you write a method that does not call accessors or reference backing fields, nor call mutators or update backing fields, then it can (and should) be a static method.

Accessibility

In most object-oriented languages you can specify whether your methods are public, protected, or private. Public methods can be accessed from any code within or outside of your class. Protected methods can be accessed from within your class or any of its subclasses. Private methods can only be accessed from within your class. My general approach with accessibility is to make all my methods private, and then increase accessibility to protected or public as needed. This has always worked well for me, so I think it's a good approach. Obviously if you're creating a method in a class to meet an external requirement then it's okay to set its accessibility level to public initially.


Thanks for checking out this article on methods, which is part of a new series called CodeSense. CodeSense is about making code easier to read and write by developing common sense conventions that can be applied immediately in nearly any programming language. Please contact me at brian (at) brian-driscoll (dot) com if you have questions, comments, or requests for future CodeSense topics.

December 20, 2012

Why You Should Stop Using ext/mysql (mysql_* functions) in PHP Right Now

I cannot take credit for the content of this post; it comes from StackOverflow user Madara Uchiha, but it deserves as wide an audience as possible, so I am reposting it here for everyone to see and hopefully share. If you are a PHP developer you'd do well to heed the message and update your code accordingly.

Question: Why Shouldn't I Use mysql_* functions in PHP?

What are the technical reasons that I shouldn't use mysql_* functions? (mysql_query,mysql_connectmysql_real_escape_string)?
Why should I change them as long as it works on my site?
(via StackOverflow)

Answer:

First, let's begin with the standard comment we give everyone:
Please, don't use mysql_* functions in new code. They are no longer maintained and are officially deprecated. See the red box? Learn about prepared statements instead, and use PDO orMySQLi - this article will help you decide which. If you choose PDO, here is a good tutorial.
Let's go through this, sentence by sentence, and explain:
  • They are no longer maintained, and are officially deprecated
    This means that the PHP community is gradually dropping support for these very old functions. They are likely to not exist in a future (recent) version of PHP! Continued use of these functions may break your code in the (not so) far future.
    NEW! - ext/mysql is now officially deprecated as of PHP 5.5!
  • Instead, you should learn of prepared statements -
    mysql_* extension does not support prepared statements, which is (among other things) a very effective countermeasure against SQL Injection. It fixed a very serious vulnerability in MySQL dependent applications which allows attackers to gain access to your script and perform any possible query on your database.
    For more information, see How to prevent SQL injection?
  • See the Red Box?
    When you go on any mysql function manual page, you see a red box, explaining it should not be used anymore.
  • Use either PDO or MySQLi
    There are better, more robust and well built alternatives, PDO - PHP Database Object, which offers a complete OOP approach to database interaction, and MySQLi, which is a MySQL specific improvement.
(via StackOverflow)

December 18, 2012

Things That Make You Go WTF?!?

I recently discovered HatePaste, which is good, because I've been having a lot of WTF?!? moments lately with code.

Don't get me wrong, I'm all for using less-than-perfect code as an opportunity to both learn and teach about testing, refactoring, cyclomatic complexity, clarity, etc. However, there are some times when even I must shake my head and say "WTF?!?"

Here is an example of some code that precipitated one of those moments:

A single instance of code like the above might be attributable to an otherwise competent developer having a bad afternoon. But, sadly, there is nearly identical code scattered throughout the entire codebase. If I knew the developer who wrote the offending code I would have a serious code review with him/her, but it's likely that developer is no longer here (thankfully).

That thought begs the question, however, of what would I say/do in a code review with someone who apparently doesn't know that equality and inequality operators (among others) returns a boolean value? Would it be as simple as pointing out that fact, or would it require addressing more serious deficits in understanding?

How would you go about addressing code like what's above, or what's out there on HatePaste, in a code review with the developer who wrote the code?

December 7, 2012

Be the Change You Wish to See in Your Codebase

If you've ever worked with other developers on a project you know that writing consistent code across the entire team can be difficult to achieve without some thought and planning. More importantly, however, it is impossible to achieve consistent coding practices across a team without consistent execution of those practices.

In the software engineering management world there's a lot of talk about team software process and how it's easier and better to integrate teams of engineers when there are documented standards and practices in place. However, the focus there seems to be on the documentation of standards and practices rather than the execution of the same. The thinking seems to be that if a process is documented it will be followed, but I am not convinced that is really the case.

I've worked on several different teams in different verticals over the last 5 years, all of which had varying degrees of documentation of standards and practices for product construction. What I've found is that documented standards and practices were only followed by new team members coincidentally regardless of the level of detail or quality of a standards document, and regardless of how ardently management advocated for following such a document. Instead, new team members simply adopted whatever practices were implemented by existing team members. What's more, if there was a question as to how to approach a specific situation, a new team member would not consult documentation. Rather, he or she would ask a more senior team member what to do.

Of course, there's nothing wrong with peer training on standards and practices in principal. I would always rather ask a coworker what to do than read some Word document that probably hasn't been updated since the days of Clippy. But, if the practices implemented by existing members of a team are inconsistent with one another and the documentation, then there is no hope that any standard will ever be followed (regardless of whether there is documentation or not).

The interesting side effect is that every so often the old guard of the team will bemoan the mess that the codebase has become, seemingly unaware of the fact that they have created the mess themselves by employing practices that are inconsistent with one another and inconsistent with the documentation. The conclusion reached by existing members of the team, however, is almost always that someone else isn't following the documented standards, or the documented standards need to be updated, or some sort of training program needs to be put in place, etc.

While all of these proposed solutions may be of some benefit, they do not address the real problems: the existing codebase and the inconsistent practices of existing team members. And, to the team's (and management's) dismay, nothing gets better. Team members come and go, and over time the inconsistent practices actually grow worse. Eventually the codebase becomes an incomprehensible mess, and there are cries to tear it all down and start over again with a new codebase, new standards, and (of course) new documentation.

And so the cycle repeats.

How do we avoid it? It almost seems impossible, but it's not. It's not easy, of course, to avoid inconsistent coding practices. It takes time and discipline. A lot of discipline. Almost militant discipline. Nevertheless, it can be done. The key factor - what you need to remember at all times - is that as a manager you need to focus on what your team is doing, not on the documentation, because most of the time no one's going to read the documentation.

First, you need to sit down with your team (or core members of the team) to figure out what the standards and practices should be in the first place. It is absolutely key that everyone buys in, so it's better to have few standards that everyone agrees to at first than it is to have many standards that not everyone agrees to. If you don't already have documented standards, start with some things that are easy to implement but will have great benefits (a consistent directory structure for all projects, for example).

Second, you and your team need to refactor existing code under development and/or test to meet the standards that you've all agreed upon. This has to be done regardless of the extent of the effort required. It will be worth it in the end, I promise.

Third, and most importantly, you and your team need to implement the agreed upon standards in all new work products it creates, without any exceptions.

Finally, you and your team should document the standards you've implemented, but you should do it in two forms: a requirements document and a technical spec (you'll see why in a minute).

The future is bright...

The great benefit of having a documented standard that is actually applied in your codebase is not just that your codebase is consistent, rather it is what you can do with your documented standards to make your codebase work for you. You can create tools to automate repetitive parts of your code to remove inconsistencies due to human action. You can create tools to measure logical SLOC, which will inform many other processes. You can create a code analysis engine using the technical spec to make sure that your codebase remains consistent over time. You get the idea.

Moreover, when team members come and go you will have the most powerful and most valuable documentation and knowledge transfer system available: your codebase.

December 4, 2012

Null + Type Checking: The Bad, The Ugly, and The Good

It is often the case that one must check for nulls and types at the same time. Over the last couple of weeks I've been both reading and writing code that checks for nulls and types for various subclasses throughout a codebase. Let's review some of the ways that one might perform such checks in order to execute conditional code branches:


The above code is just bad. Really bad. OMGWTF?!? bad. Yet, it's probably what I come across most often in code reviews. The reason it's bad is because the cast will throw an unhandled InvalidCastException if BaseClassInstance is not, in fact, an instance of SubclassType. Okay, so we know we need to handle that exception now, right? Let's try it again.


Alright, so technically the above code works and meets our requirements, but it's inelegant. We don't really want to be catching an InvalidCastException as a means of determining whether the types match. We can do better by removing the explicit cast.


Whoa... that's a lot of code to just check for a type and value. I mean, yes, it's technically correct and meets requirements, but could you scan that really quickly in a review and know what the hell was going on?


The above code is semantically the same as the previous example, but is clearly much more concise. The is keyword checks that an instance of a reference type is not null, and that it is of the specified type. So from now on, use the is keyword whenever you want to check for nulls and types in the same conditional expression.