Thursday, September 11, 2008

Practical Refactoring via Code Smells

Refactor, Refactor, Refactor! (um…thanks?)

Just as "location, location, location" is the mantra of real estate, "refactor, refactor, refactor" is the mantra of Agile development. If you're new to Agile development, people will be constantly telling you to refactor your code, to make it "simpler" and “better” and "more elegant".

But those aren't instructions--they're goals. It's all too easy to define "refactor" in terms of its goals, and think you've told people what to do. But all you've told them is the destination, not how to get there.

In this post, I'll be discussing some simple, practical instructions for refactoring: rules that, if you follow them, will reliably make your code simpler, better, and more elegant. These instructions aren't the result of programming theory (although I’ll take a minute at the end to discuss the theory behind why they work).They're rules that I've seen empirically proven, day after day, in real software development situations.

Refactor from Code Smells

They all follow the same basic pattern:
  1. Observe a Code Smell in your code.
  2. Remove the code smell.
It’s really that simple. When you’re refactoring, you shouldn’t be thinking about functionality, or testing, or Big Architecture. You should be looking at the code you have (which already has correct functionality, and automated tests that all run green…right?), and making simple improvements to it, over and over. Code Smells are the best way I’ve found of identifying where there’s a simple improvement to be made, and eliminating Code Smells is the best forcing function for making refactoring actually happen.

There are lots of Code Smells, but only a few of them have the key attributes that make them optimal refactoring drivers: they have simple, direct fixes; and when you fix them, you end up fixing problems rather than symptoms.

The Code Smells I’ve found most useful for driving the process of refactoring are:
  1. Long methods.
  2. Bad names.
  3. Breaking the Law of Demeter.
  4. Comments.
Sniff out and eliminate every instance of these four key Refactoring Smells in your application, and your code will reliably improve in ways that go far beyond the cosmetic changes the Code Smells seem to require.

Keep Your Methods Short

Methods should never be longer than seven or eight lines. I know some developers who can hold 30-line methods in their heads and reason about them (guys, have pity, and keep it to a length we can manage!). For the rest of us, anything longer than about eight lines requires us to think about it in smaller chunks anyway; the method might as well be written in these chunks in the first place, saving everyone who has to edit or maintain it from re-doing that parse.
But there’s a more profound benefit that short methods provide. They require you to really think about the hierarchical decomposition of the task. What are you trying to accomplish here? If you can’t express it simply in terms of smaller tasks, you probably haven’t thought it out enough. Capping your method length at eight lines will naturally highlight for you the places where more analysis will improve your code.

It’s easy to give people instructions like “Just look for separable concepts in your method, and pull them out into a separate method.” But nobody actually does it, because it’s not easy to identify a “separable concept” at a glance. If you follow the simple rule “no methods over eight lines long”, you’ll be forced to restructure your methods along clear, conceptual breakdown lines, and you’ll end up identifying the underlying concepts almost as a side effect.

Be a Name Nazi

If you insist on naming concepts precisely in your code, one of the nicer things people will call you is a “name Nazi”. Don’t let them fool you—there is vast benefit in accurate naming.

Code should read like English. Remember that 80% of software costs are maintenance: your code is going to be read over and over again, hopefully for years or decades to come. It should be as easy to read as possible.

If your code is full of lines like
entity.Leg.Hand.Shake ( friend.Leg.Hand )
you can be pretty sure that you haven’t clearly thought out the underlying concept. And while it may take a little longer the first time to type out
entity.HandshakeAppendage.Shake ( friend.HandshakeAppendage )
the first time, you’ll more than make up that time difference the first time someone isn’t confused by the Leg object having a Hand property.

And, you’ll have improved the object oriented abstractions in your code. Instead of a multi-part if-statement that looks like
if ( entity is Dog && friend is Human ) {
entity.Leg.Hand.Shake ( friend.Arm.Hand );
} else if ( entity is Human && friend is Dog ) {
entity.Arm.Hand.Shake ( friend.Leg.Hand );
} else if ( entity is Human && friend is Human ) {
entity.Arm.Hand.Shake ( friend.Arm.Hand );
} else if ( entity is Dog && friend is Dog ) {
Debug.Fail(“Dogs cannot shake hands with other Dogs.”);
}
you’ll have a simple bit of code that says
if (entity is Dog && friend is Dog ) {
Debug.Fail (“Dogs cannot shake hands with other Dogs.”);
} else {
entity.HandshakeAppendage.Shake ( friend.HandshakeAppendage );
}
By being religious about naming precision (“there must be something wrong with this code, because Leg.Hand doesn’t make any sense!”), you’ve successfully extracted one of the key abstractions of your application: that different species will offer different parts of their bodies for handshakes.

Obey the Law of Demeter

The first step to fixing violations of the Law of Demeter is knowing what it is. Go read about it (this step is optional). Confused yet? Read some debates about it, and a technical definition (this step, also, is optional).

Now forget everything you just read (see? Toldja they were optional).

The Law of Demeter boils down to, “Don’t do someone else’s job.” An object should deal with its own business, and ask other objects to deal with their own business. If you’ve got a method in your class that does nothing but poke at other objects’ internals, pull out that functionality and put it into the class you’re poking at.

Instead of
if ( myNeighbor.Children != null ) {
myNeighbor.Children.Remove ( neighborChildToRemove );
} else {
myNeighbor.Children = new List();
}
write
myNeighbor.RemoveChild ( neighborChildToRemove );
and, in the Neighbor class,
private List children;
private List Children {
get {
if (children == null) {
children = new List();
}
return children;
}
}

public void RemoveChild ( NeighborChild childToRemove ) {
Children.Remove ( childToRemove );
}
It’s now infinitely easier, when you suddenly discover that Neighbor needs to listen to events from all of its children NeighborChild objects, to add that kind of constraint reliably:
public void RemoveChild ( NeighborChild childToRemove ) {
childToRemove.InformParentEvent -= myChildListener;
Children.Remove ( childToRemove );
}
The responsibility for managing the list of NeighborChild children lives entirely where it belongs: in the Neighbor object.

Destroy Comments

“But, but, but,” I hear some of you splutter, “comments are good things, right?”

No.

As Alex Chaffee said, “Every comment is a lie waiting to happen.” Comments are almost never updated when the code they document is changed, and thus are more likely to be actively misleading than helpful and explanatory.

Worse yet, the very existence of a comment indicates that the original developer didn’t invest the resources, or simply wasn’t able, to make the code or algorithm clear.

We want to get rid of comments, to remove the likely divergence between the comment and the code. But it’s not sufficient simply to delete the comment; in fact, it’s often actively harmful. By deleting the comment, you’re removing the invaluable marker that someone failed to write clear code here.

In order to safely remove comments, and thus eliminate the “lie waiting to happen”, you must first eliminate the bad code that inspired the comment in the first place. Often, simply applying the first three rules will result in the clear code that should have been written in the first place, and deleting the comment will be the natural final step in the process of refactoring. Sometimes, however, especially in code written by an expert programmer, the comment reflects an authentic difficulty with making the code clear. Maybe it accesses a poorly documented interface, or is solving a problem with the wrong tool (“pounding a nail with a wrench”). Maybe it’s a quick hack that somehow made it into the production code unrevised. Whatever the original cause of the bad code, take the time now to fix it, and remove the comment.

Fixing this code smell also synergizes nicely with the first fix (keep your methods short). If you have a 20-line method, where the successive sets of five lines are commented, respectively:
// Prepare data structures for Foobarizing
// Foobarize prepared data structures
// Verify Foobarized data
// Write out newly Foobarized data
it’s very simple simultaneously to convert it into a four-line method and to eliminate the four lines of comments.

This rule is also great for getting rid of comments designed to “clarify” massive, multi-line conditionals. Instead of
// Only shake hands if both “entity” and “friend” can shake hands.
if ( ( ( entity is Dog && ((Dog)entity).HasBothFrontLegs() )
|| (entity is Human && ((Human)entity).HasARightHand()) )
( ( friend is Dog && ((Dog) friend).HasBothFrontLegs() )
|| (friend is Human && ((Human) friend).HasARightHand()) ) ) {
entity.HandshakeAppendage.Shake ( friend.HandshakeAppendage );
}
write
if ( entity.CanShakeHands() && friend.CanShakeHands() ) {
entity.HandshakeAppendage.Shake ( friend.HandshakeAppendage );
}
Note how I sneakily snuck in a fix to a Law of Demeter violation, there? Now, instead of having to check the type of the object and then poke around in its internals to figure out for it whether it can shake hands, I just ask it “Can you shake hands?” And it decides for itself, and lets me know.

Be Not Afraid

Refactoring can look complex and scary when you’re just given the general goals. But by choosing Code Smells with simple resolutions and profound consequences, you can drive your refactoring efficiently and with minimal stress. Don’t let the general “make your code more elegant” instructions scare you off; all they’re saying is “make your code smell less.” Your bug count will go down, your customers will be happier, and your code—voila—will be the magical “better”!

2 comments:

Gement said...

I have a question about destroying comments. Does it apply to those of us who suck at reading code?

Because, I look at what you write, even with the pretty variables, and I still have the initial instinct of "yes, and can I also have a comment sentence before the maximum-8-line method so I can read the synopsis instead of working it out from scratch?"

Mickey Phoenix said...

While I normally consider a comment "a lie waiting to happen", there are two specific purposes for which comments are ideal, either one of which might suffice to answer your question.

The first is encapsulation, comments as API definition. If I am publishing an API for public use, my consumers should not necessarily need to read my code--and, if my product is not open-source, may not have the option to do so. Thus, all public methods in an API should have a comment describing their intent, parameters, preconditions, and post-conditions. This will also, for the wise programmer, act as an incentive to keep the API relatively small and well-factored, because it will minimize the complexity of the comments and the resulting cost of comment maintenance.

And, even then, a disturbing number of bugs result from out of date or simply incorrect descriptions of API method expectations. This is one of the greatest advantages of open-source code: the consumer of an API is able to directly inspect how his or her inputs will be handled, reducing the dependency on possibly-inaccurate comments.

The second valid use for comments is pedagogical. Someone who is just learning to program will write even the simplest algorithms as pseudocode (comments), and then flesh them out into actual code. And when reading code is not yet second nature, having comments in ordinary English to prepare your mind for reading the code can be really helpful.

The danger, however, is when students begin to believe that bicycles should always be ridden with training wheels. Comments serve a valid purpose when learning to read code, but they are strictly inferior to becoming fluent enough in code to read it "at speed". And an excessive dependence on comments can inhibit this process, and may lead to absurdities like

i = i + 1; // increment i by 1

The direct answer to your question, though, is "Yes, you can have a comment sentence that provides the synopsis: we call it the method name." That's part of why I'm so aggressive about refactoring and renaming my methods until they express a single, complete thought, and have a name that conveys that thought.

Remember your grammar-school definition of a sentence or a paragraph? Yeah. When your code is clean enough that it not only "reads like" English line-by-line, but is structured like good writing paragraph-by-paragraph, you'll find that much of the information you'd like to find in a synoptic header comment is comfortably embedded in the well-chosen names of the methods.