While the numbers vary, it appears to be a generally agreed-upon point that most software work these days is maintenance of some sort: fixing bugs, adding new features to old systems, or adapting old systems to handle slightly new versions of tasks they have been dutifully performing for years. Given that most work is modifying code that already exists, you would think that there is a wide body of literature encompassing the art of reading code. Sadly, such a body does not appear to exist. There is only one book that I know of on the subject: Code Reading: The Open Source Perspective by Diomidis Spinellis. In addition to the book, the author is kind enough to provide several links to code reading resources on the internet.
One book and fewer than a dozen links. You would think that something as obviously important as reading code would get further treatment. Alas, that is not how it works. From day one we are taught to write code. Most school projects penalize you if you work with other people in class or copy code from someplace else. On one hand this is good, in that a lot of programming is actually thinking about and solving the problem, not writing the code for it. Lifting a solution from someplace else or getting help from a more talented classmate does not help you in this regard. On the other hand, this method of teaching is poor preparation for the real world, where your first assignment fresh out of school is likely to be maintaining and bug-fixing some nasty piece of code that has been clunking around for a decade or more. Mine was maintaining an 87-page Excel macro written in the Excel 4.0 macro language. So much for those classes in C, Pascal, Lisp and C++, all done on either Macintoshes or very nice (at the time) Sun SPARC 10 workstations. The ONLY way to make heads or tails of this was to read the code, as comments were scarce at best and the manual for the Excel 4.0 macro language, while technically correct I’m sure, did not have many useful examples remotely resembling the intricacies of this poor bastardized code base.
So now that we've established that learning to read code is important, and hopefully raised a few eyebrows with "Excel 4.0 Macro Language, what’s that?", let us talk a little bit about some of the things to look for when reading code. This will certainly not be an exhaustive reference, but it will hopefully give you some insight into how to approach the task of reading somebody else’s source. In addition, we will talk briefly about some tools and techniques for getting through source files.
Consistency is a topic that will be covered in much of the content of this web site. It is a recurring theme in maintenance, and hopefully in a way that goes beyond things being consistently inconsistent. The first thing to look for in the code base in front of you is internal consistency. Take a step back and just look at the code without digging into too much detail. Hopefully things look similar from top to bottom: simple things like brace alignment being the same, casing on variables being more or less consistent, spacing of function parameters and around assignment statements being uniform, etc. These things are easy to detect without grokking one bit of code, and they will tell you some important things right up front. If spacing and casing are uniform and brace alignment is consistent throughout the source, you either inherited something from one developer who has at least been programming long enough to form some habits, or this code base has been maintained over the years by others thoughtful enough to be as consistent with the existing code as possible. From a high level this tells you that the code will probably not be too painful to go through and that you can put some stock in the other clues it gives you along the way: member variables beginning with m_, constants being in all CAPS, variables named c, x, y, i and j only being used for local loop counters (if they are used at all), and function and variable names that make an attempt to be descriptive of their actual use. If you open up the code in your editor and it looks more like graffiti than a well-structured program, you have to read it with a more critical eye and should not make many assumptions along the way.
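A first pass like this can even be roughed out mechanically. The sketch below is only an illustration and assumes C-like source text. The function name survey_style and the particular signals it counts (leading tabs versus spaces, brace placement, snake_case versus camelCase) are hypothetical choices made for this example, not a standard tool.

```python
import re
from collections import Counter


def survey_style(source: str) -> dict:
    """Count a few surface-level style signals in C-like source text.

    Hypothetical helper for illustration: it only looks at the shape of
    the text and makes no attempt to parse the language.
    """
    indent = Counter()
    braces = Counter()
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no style information
        # Indentation: does the line start with a tab or with a space?
        if line[0] == "\t":
            indent["tabs"] += 1
        elif line[0] == " ":
            indent["spaces"] += 1
        # Brace placement: alone on its own line, or trailing a statement?
        if stripped == "{":
            braces["own_line"] += 1
        elif stripped.endswith("{"):
            braces["same_line"] += 1

    # Identifier casing: snake_case versus camelCase.
    names = Counter()
    for word in re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", source):
        if re.fullmatch(r"[a-z]+(_[a-z0-9]+)+", word):
            names["snake_case"] += 1
        elif re.fullmatch(r"[a-z]+([A-Z][a-z0-9]*)+", word):
            names["camelCase"] += 1

    return {"indent": dict(indent), "braces": dict(braces), "names": dict(names)}
```

Lopsided counts in each category suggest a consistent code base. A near-even split suggests several hands at work and fewer assumptions you can safely make.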
When coding, many tasks are performed in set ways, known as idioms. Many times these are taken from sample code somebody wrote for documentation, or are something you pick up from reading code that you are responsible for maintaining. Many idioms are language specific while others span many languages, given that many languages share constructs such as loops, logical operators, etc. It pays to be familiar with idioms because they will let you read code more quickly. Consistent ways of performing tasks coupled with decent commenting (assume that the person who wrote the code base you are maintaining wrote useful comments) should take away the mystery in any piece of code.
Some idioms will be common. Others will not. One of the benefits of being able to write and spot certain idioms is that it allows you to read a block of code as a whole rather than taking the time and effort to read each line and piece together the meaning yourself. Also, once you learn an idiom that is crystal clear in one language, you will often find ways to translate it into other languages you work with. For example, opening a file in Python and processing it a line at a time looks very intuitive:
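    myFile = open("filename", "r")  # open() replaces the old file() built-in, which Python 3 removed
    currentLine = myFile.readline()
    while currentLine != "":  # readline() returns an empty string at end of file
        processLine(currentLine)  # simple function for doing whatever to your current line
        currentLine = myFile.readline()
    myFile.close()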
After writing this code a few times, having to go back to Visual Basic and use its file handling routines seems particularly painful, so I have some VB routines that make reading a file line by line for processing look similar to the Python version. By simply including this library in my projects, all of my VB code becomes much more readable and easier to maintain because the idiom is transparent. A quick glance tells me the looping construct is simply concerned with processing a file, so I can worry more about what is going on specifically inside the loop and not pay much attention to the lines around it.
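The same trick of tucking a file-handling idiom into a small library can be sketched in Python itself. This is not the VB library mentioned above, just an illustration of the idea. The names each_line and process_file are hypothetical, and the sketch leans on two standard Python features: a with block that closes the file automatically, and iterating over an open file to get one line at a time.

```python
from typing import Callable, Iterator


def each_line(path: str) -> Iterator[str]:
    """Yield the lines of a text file one at a time.

    Hypothetical helper: the with block closes the file once the
    last line has been yielded, so callers never call close() themselves.
    """
    with open(path, "r") as handle:
        for line in handle:  # a file object iterates line by line
            yield line


def process_file(path: str, process_line: Callable[[str], None]) -> None:
    """Apply process_line to every line of the file at path."""
    for line in each_line(path):
        process_line(line)
```

With a helper like this in place, a call such as process_file("filename", processLine) reads at a glance as "do this to every line of that file", which leaves your attention free for what processLine actually does.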
When reading code, be on the lookout for idioms. These sometimes come about because of cut-and-paste coding, and other times because something is done so many times that it just becomes habit. In part two of this installment we will discuss some common code idioms.