Idioms - continued

In this installment we will talk about some common idioms and things to watch out for when maintaining a piece of code. As stated in the previous installment, being able to recognize idiomatic code helps greatly in maintenance. Some of the idioms covered here are common, and some are just fun to look at due to their sheer geekiness. Here they are, in no particular order.

Looping

Looping is not just a programming construct (while, for, do, etc.) but also an idiomatic way of processing data. The basic loop idiom is: get or allocate a resource, do something with the elements of that resource, and then close or dispose of that resource. This idiom is used so many times in so many places (records in a database, lines in a file, elements in an array, etc.) that you wonder how it can go bad. But it does. One example of looping gone bad is on the front page of this site. It is a generic loop through a recordset in VBScript. The problem is that the code gets and opens the resource, checks a condition, and if the condition is false, opens the resource again. The recordset has to be closed before it can be reopened, which anybody who has worked with recordsets on more than an occasional basis knows.

The really scary part about this code is that it was tested. I’m guessing it was tested once and that’s it. It passed the first time through and then since it worked, it was let fly. This code was put together by somebody working with a customer, so I am guessing they tested it in the customer’s environment, it worked and away they went. Anybody responsible for maintaining a codebase that is used in a production environment had better have a more thorough approach than that, but that is a topic for another essay.

When you see a loop in the code with processing of some sort in the middle, see if it follows the open-process-close idiom for a single instance of whatever you are working on. If not, hopefully there are some comments explaining exactly why it is deviating from this simple construct. If there are not, take the time to find out why it is deviating and then put the comments in. You will thank yourself later.
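As a rough illustration of the open-process-close idiom in C, here is a minimal sketch that counts the lines in a text file. The function name count_lines and the choice of a file as the resource are assumptions made for this example; they are not taken from the VBScript code mentioned above.

```c
#include <stdio.h>

/* Hypothetical helper: counts the lines in a text file.
   Returns -1 if the file cannot be opened. */
long count_lines(const char *path)
{
    FILE *fp = fopen(path, "r");        /* open: acquire the resource once */
    if (fp == NULL)
        return -1;

    long lines = 0;
    int ch;
    int last = '\n';
    while ((ch = fgetc(fp)) != EOF) {   /* process: visit each element */
        if (ch == '\n')
            lines++;
        last = ch;
    }
    if (last != '\n')
        lines++;                        /* final line without a newline */

    fclose(fp);                         /* close: release it exactly once */
    return lines;
}
```

There is one open, one loop and one close, all for the same resource. A second fopen in the middle of the loop would be the kind of deviation that deserves a comment.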

Number of Elements in an Array

In many newer languages, arrays are objects with properties like size or length that tell you how many elements are in the array. However, C and its ilk have no such property for arrays. To get the number of elements in an array, you can do the following:

int simpleArray[] = {0, 1, 2, 3, 4};
int arraySize = sizeof(simpleArray)/sizeof(simpleArray[0]);

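In this case the sizeof operator in the numerator gets the size in bytes of the entire array. The sizeof operator in the denominator gets the size in bytes of the first element of the array. Dividing the former by the latter yields the number of elements in your array. This is useful as the upper bound of a for loop, among other things. Be aware that it only works where the array itself is in scope. Applied to a pointer, including an array passed as a function parameter, the numerator is the size of the pointer and the result is wrong.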
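The idiom is often wrapped in a macro. The sketch below assumes the macro name ARRAY_LEN and two made-up helper functions. It shows the idiom used as a loop bound, and the usual workaround of passing the count along with the pointer when the array crosses a function boundary.

```c
#include <stddef.h>

/* ARRAY_LEN is a hypothetical macro name. It is only valid on a real
   array, never on a pointer or an array parameter. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Sums the array from the article, using the idiom as the loop bound. */
int sum_simple_array(void)
{
    int simpleArray[] = {0, 1, 2, 3, 4};
    int total = 0;
    for (size_t i = 0; i < ARRAY_LEN(simpleArray); i++)
        total += simpleArray[i];
    return total;
}

/* Inside this function, values is only a pointer, so the element count
   has to be supplied by the caller. */
int sum_ints(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A caller that still has the array in scope would write something like sum_ints(data, ARRAY_LEN(data)).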

Swapping Two Elements, No Temporary Variable

Often the code for swapping two elements looks something like this:

int a = 9;
int b = 12;
int c = a;
a = b;
b = c;

This is perfectly fine. However, if you do not want to take up the space of a temporary variable, you can use the XOR operator to swap variables like so:

int a = 9;
int b = 12;
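a ^= b;
b ^= a;
a ^= b;

Using XOR in this way swaps the values of the two variables without introducing a temporary variable into the mix. You will sometimes see the three steps squeezed into a single expression, a ^= b ^= a ^= b, but in C that form modifies a more than once without sequencing the modifications, so its behavior is undefined. Keep the three statements separate. Also note that the trick fails when both operands are the same object: anything XORed with itself is zero.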
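Here is a minimal sketch of the XOR swap wrapped in a function, with each XOR as its own statement. Chaining all three assignments into one expression modifies a variable more than once without sequencing, which C leaves undefined. The name xor_swap is an assumption for this example, as is the guard for the case where both pointers refer to the same variable.

```c
/* Hypothetical helper: swaps two ints without a temporary variable. */
void xor_swap(int *x, int *y)
{
    if (x == y)     /* same object: XOR with itself would zero it */
        return;
    *x ^= *y;       /* x now holds x ^ y          */
    *y ^= *x;       /* y now holds the original x */
    *x ^= *y;       /* x now holds the original y */
}
```

In most modern code the temporary-variable version is at least as fast and easier to read, so treat this as an idiom to recognize rather than one to reach for.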

Recursion

Recursion is a topic that will get its own essay at some point, but since it is a very powerful construct with a couple of rules of thumb, I did want to cover it here. For the uninitiated, recursion is the technique in which a function calls itself to get work done. A common use of recursion is walking binary trees. Recursion is one of the best ways to do this because the description of the problem maps most readily to a recursive implementation. You can change the traversal from pre-order to post-order simply by changing the order of the calls. Recursive code is often much more compact than the alternatives, which usually involve pushing to and popping from explicit stacks and checking a lot of values along the way. For those working with recursive functions written by somebody else, there are several things to watch out for.
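To make the point about traversal order concrete, here is a small sketch of pre-order and post-order walks over a binary tree. The struct node layout and the function names are assumptions for illustration. Each function writes the values it visits into a caller-supplied array and returns the next free index.

```c
#include <stddef.h>

/* Assumed node layout for this sketch. */
struct node {
    int value;
    struct node *left;
    struct node *right;
};

/* Pre-order: visit the node, then its left and right subtrees. */
size_t pre_order(const struct node *n, int *out, size_t pos)
{
    if (n == NULL)              /* base case: empty subtree */
        return pos;
    out[pos++] = n->value;
    pos = pre_order(n->left, out, pos);
    return pre_order(n->right, out, pos);
}

/* Post-order: same calls, but the node is visited last. */
size_t post_order(const struct node *n, int *out, size_t pos)
{
    if (n == NULL)
        return pos;
    pos = post_order(n->left, out, pos);
    pos = post_order(n->right, out, pos);
    out[pos++] = n->value;
    return pos;
}
```

The only difference between the two functions is where the line that records n->value sits relative to the recursive calls. The caller is responsible for making out large enough to hold every node.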

Will the routine always terminate? One problem with recursion is that it is very easy to write an algorithm that will never terminate. One problem I had to solve a long time ago was checking a graph for cycles. I found this code a lot easier and safer to write using stacks to keep track of the nodes visited: once I came to a node twice there was a cycle, so I exited. I'm sure this could have been done recursively, but in a few instances my recursive attempts were getting into infinite loops. The stack-based implementation always ended and always found the cycles. It also had another side benefit, which we'll talk about now.
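The original cycle-checking code is not shown, so what follows is only a sketch of one way a stack-based check could look, not a reconstruction of it. It assumes a directed graph stored as an adjacency matrix with at most MAX_NODES nodes, and it reports a cycle when it reaches a node that is still on the current path. All names here are made up for the example.

```c
#include <stdbool.h>

#define MAX_NODES 16    /* assumed upper bound for this sketch */

enum mark { UNSEEN, ON_PATH, DONE };

/* adj[u][v] is true when there is an edge u -> v.
   The caller must ensure 0 <= node_count <= MAX_NODES. */
bool has_cycle(bool adj[MAX_NODES][MAX_NODES], int node_count)
{
    enum mark state[MAX_NODES] = { UNSEEN };
    int stack[MAX_NODES];   /* nodes on the current path */
    int next[MAX_NODES];    /* next neighbour to try for each node */

    for (int start = 0; start < node_count; start++) {
        if (state[start] != UNSEEN)
            continue;
        int top = 0;
        stack[top] = start;
        next[start] = 0;
        state[start] = ON_PATH;

        while (top >= 0) {
            int u = stack[top];
            if (next[u] == node_count) {    /* every neighbour tried */
                state[u] = DONE;
                top--;                      /* pop u off the path */
                continue;
            }
            int v = next[u]++;
            if (!adj[u][v])
                continue;
            if (state[v] == ON_PATH)
                return true;                /* back on the current path */
            if (state[v] == UNSEEN) {
                state[v] = ON_PATH;
                next[v] = 0;
                stack[++top] = v;           /* push v onto the path */
            }
        }
    }
    return false;
}
```

Each node is pushed at most once, so the loop is guaranteed to end, and the memory used is fixed up front rather than growing with the call depth.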

Be careful of your local variables when using recursion. Every local variable gets a new copy when the function is called again. 300K on the stack for a single function call is not usually a big deal. 300K on the stack when the function could potentially call itself thousands of times is likely disastrous. When looking at recursive functions, be aware of the local variables in the function and if you can find a way to eliminate them, do so.
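One way to shrink the per-call footprint, sketched here under assumptions, is to allocate a large work buffer once in a wrapper and pass a pointer to it down the recursion, instead of declaring the buffer as a local in the recursive function. The node layout, the buffer size and the function names are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define SCRATCH_SIZE 4096   /* assumed workspace size for this sketch */

/* Assumed node layout for this sketch. */
struct node {
    int value;
    struct node *left;
    struct node *right;
};

/* The scratch buffer is passed in, so each recursive frame holds a few
   pointers and an int rather than its own SCRATCH_SIZE-byte array. */
static long label_lengths(const struct node *n, char *scratch)
{
    if (n == NULL)
        return 0;
    /* Format this node's label into the shared buffer and measure it. */
    int len = snprintf(scratch, SCRATCH_SIZE, "node-%d", n->value);
    return len + label_lengths(n->left, scratch)
               + label_lengths(n->right, scratch);
}

/* Wrapper: allocates the buffer once, off the stack, for the whole walk.
   Returns -1 if the allocation fails. */
long total_label_length(const struct node *root)
{
    char *scratch = malloc(SCRATCH_SIZE);
    if (scratch == NULL)
        return -1;
    long total = label_lengths(root, scratch);
    free(scratch);
    return total;
}
```

Sharing one buffer only works when a call is finished with its contents before it recurses, as is the case here. If each level needs its data preserved across the recursive call, the buffer cannot be shared this way.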

If possible, decide on the maximum number of times it makes sense for the recursive function to call itself, and check the current depth against that limit as you go. If you can prevent a memory explosion, it is in your best interest to do so. Most users would rather be told up front that the problem is too big to solve than have the program attempt it, run for three days, and then crash the computer, losing all the work they had been doing because they did not save early and often.
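A depth limit can be as simple as an extra parameter that counts how deep the current call is. The sketch below measures the height of a binary tree and gives up with -1 once a caller-chosen limit is reached. The node layout, the names and the -1 convention are assumptions for this example.

```c
#include <stddef.h>

/* Assumed node layout for this sketch. */
struct node {
    int value;
    struct node *left;
    struct node *right;
};

/* Returns the height of the subtree at n, or -1 if the walk would need
   more than max_depth nested calls. depth is the current nesting level. */
static int height_limited(const struct node *n, int depth, int max_depth)
{
    if (n == NULL)
        return 0;
    if (depth >= max_depth)
        return -1;              /* too deep: report instead of recursing */

    int left = height_limited(n->left, depth + 1, max_depth);
    if (left < 0)
        return -1;              /* pass the failure up unchanged */
    int right = height_limited(n->right, depth + 1, max_depth);
    if (right < 0)
        return -1;

    return 1 + (left > right ? left : right);
}

/* Entry point: starts the depth counter at zero. */
int tree_height(const struct node *root, int max_depth)
{
    return height_limited(root, 0, max_depth);
}
```

The caller can then tell the user that the input is too large, rather than finding out when the stack overflows.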

Other things to look out for

Beyond consistency and idioms, I have found a lot of code reading to be dependent on the code base you are working on, the language you are working in and the collection of common knowledge that is reflected in the codebase. Working on VBScript in a corporate web site is a lot different from looking at C in an open source project such as Apache. Different groups make different uses of braces and indentation depending on common practices at the time, the syntactic taste of the lead developers and some less obvious reasons. Since this site will remain largely language agnostic and I likely do not work in your environment, I do not have anything else to add at this time. This is one area that I believe would benefit from a lot of discussion, and one I plan on revisiting from time to time.

In the meantime, there are a few idioms that do show up commonly enough that they are worth keeping in mind.