Improving Code Documentation

The debate over whether to use comments or to write self-documenting code has raged for ages. Both approaches are valid, but each has its own advantages and disadvantages depending on the circumstances. Comments can be as much of a hindrance as a help in the wrong conditions, and self-documenting code can be opaque and unwieldy in others.

Self-documenting code tends to be quicker to write, but comments add clarity about usage and intention. Do you paint the picture or describe the process? A combined approach tends to be best. The size of the team or project, as well as the purpose of the project, can affect which approach is ideal. Let’s see what each entails and how to improve our commenting styles for pretty much any situation.

Self-Documenting Code

Self-documenting code is the practice of writing code so that it is intuitive to read. Variables and functions need sensible, descriptive names, and the logic needs to be easy to follow. Writing self-documenting code is one of the most efficient ways to write cleaner code, since you have to understand what you’re doing to make it make sense. Self-documenting code is also typically a bit easier to follow when debugging.

There is never a reason not to use self-documenting code, but relying on it alone has its own shortcomings. It can get out of control if there is too much code in a given section or if things get complicated. There is also the limitation of having to track down how something is put together. If you have an interface in front of a class which inherits from another class, things will get confusing quickly if you’re relying on self-documenting code alone. Don’t let this be your only documentation method for non-trivial work!

Improving Self-Documenting Code

It’s easy to argue that the following trivial example is self-documenting:

function divide( dividend, divisor )
	return dividend / divisor
end

We know that the function will divide the dividend by the divisor and return the value. There isn’t any error handling or anything in this function, which is clear from looking at it. This works for smaller functions easily, but can break down with more complicated code blocks.

Make variables easily readable. In our example, we used the full terms dividend and divisor so it’s clear we’re working with division. Good self-documenting code shouldn’t have variables like a or i unless they’re iterators or other one-off variables. Short names are fine in simple methods, but a function full of the alphabet is going to be hard to read.
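
To make the contrast concrete, here is a small sketch of the same calculation written twice. The function names and the order-total scenario are made up for illustration; the only difference between the two versions is the naming.

```lua
--hypothetical example: both functions compute the same value

--alphabet soup: the reader has to work out what a, b, and c stand for
function calc( a, b, c )
	return a * b + c
end

--descriptive names: the signature explains itself
function order_total( unit_price, quantity, shipping_fee )
	return unit_price * quantity + shipping_fee
end

print( calc( 5, 3, 2 ) )
print( order_total( 5, 3, 2 ) )
```

Both calls print the same number, but only the second tells a reader what that number means.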

Standard Comments

Standard commenting is typically taught early on, but quickly abandoned in favor of self-documenting code at many shops. Standard comments add an explicit explanation of what a given piece of code is doing. They are downright necessary in many languages due to the complexity of certain operations. Higher-level languages like Java especially benefit from a clear commenting process.

Traditional commenting does require upkeep as well as extra time to add. Some people use it as a crutch: they keep using variables like a or b in non-trivial scenarios and just note the meaning in comments. Commenting too much in a given section can also make it harder to read. Other developers break the code up with so many comments that it loses any flow.

Writing Better Comments

One thing worth noting is that I won’t be committing to a specific commenting style. I will float between several I have used for production code, because comments should be tailored to the standards and process of the project they are in.

If we take our previous function, we can comment it as follows:

--divide_comments
--takes: scalars: a, b
--returns: a/b
--warning: this function has no error handling for wrong types on a or b, or if b == 0
function divide_comments( a, b )
	return a/b
end

This is a trivial example, so I used a and b. We can see explicitly that this function has no handling for when b, our divisor, is 0. We also have no handling for wrong types.
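
For comparison, here is a minimal sketch of what adding that handling could look like. The name divide_checked and the choice to return nil plus an error message are assumptions for illustration, not the only reasonable design. Note that in standard Lua, dividing a number by zero with / does not raise an error by itself; it yields inf, -inf, or nan, which is exactly the kind of behavior worth calling out in a header.

```lua
--divide_checked (hypothetical name)
--takes: numbers: dividend, divisor
--returns: dividend / divisor, or nil plus an error message on bad input
function divide_checked( dividend, divisor )
	if type( dividend ) ~= "number" or type( divisor ) ~= "number" then
		return nil, "dividend and divisor must be numbers"
	end
	
	if divisor == 0 then
		return nil, "divisor must not be 0"
	end
	
	return dividend / divisor
end

print( divide_checked( 10, 4 ) )
print( divide_checked( 10, 0 ) )
print( divide_checked( "10", 4 ) )
```

Once the handling exists, the warning line in the header can be replaced by a description of what the function returns on bad input.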

Commenting is rarely going to be bad unless it is over the top and scattered. The problem a lot of coders have is finding that balance. Our example in the previous function has 4 lines of comments for 3 lines of actual code. Does that make our code harder or easier to read? The answer is: “It depends on what we’re doing.” There will be scenarios where a few extra lines make things substantially easier to understand at a glance.

The goal of commenting is to make the code easier to read and understand without having to analyze it all. Sparingly using comments in self-documenting code can make things much clearer. For instance, here’s a nonsensical algorithm foobar:

--foobar algorithm
--takes: array a, array b
--returns: scalar, number foobar
--expects sanitized arrays
--runs our patented foobar algorithm: [internal wiki link or description goes here]
function foobar( a, b )
	local foo = 0
	
	--generates foo part 1: [link to documentation or description of what and why we're doing it]
	for i, av in ipairs( a ) do
		for j, bv in ipairs( b ) do
			foo = foo + i + j / 2
			foo = foo + ( av * i ) - (bv * j / 4 ) + math.sqrt( av * bv * i * j )
			
			--foo part 1 adjustment table
			if foo > 100 then
				foo = foo - 99
			elseif foo > 20 then
				foo = foo - 18
			end
		end
	end
	
	--modifies foo part 1 to get bar adjustment: [link to documentation or description of what and why we're doing this]
	for i, av in ipairs( a ) do
		foo = foo + math.sqrt( av ) - math.log( av )
		
		--bar adjustment table
		if math.log( av ) > 25 then
			foo = foo + 6
		elseif math.log( av ) > 20 then
			foo = foo + 5
		elseif math.log( av ) > 15 then
			foo = foo + 4
		end
	end
	
	return foo
end

I intentionally made a completely nonsensical algorithm (and we’re pretending a and b mean something important for the algorithm). What does it do? We have a header which tells us how to use it, what it takes, what it returns, any specific conditions, and where to find internal documentation on what it’s for (a link, or a short description in its place). We have also commented each major section so that if there are adjustments, we know where to look.

A new person could pick up this code and use it (well, if it actually did anything). If they’re referencing the documentation, they know what each larger section is responsible for. Comments should work to make the code clearer without clutter. I’ve seen some people claim you should have as many comments as you do lines of code, and that every single step should be spelled out with explicit comments, but that just gets unwieldy for anything but academic code.

Which of these is easier to read?

--function square
--takes: scalar, number: term
--gives us the square of a number as defined as term * term
--returns a number which is our square
--error checks if a number is correct or not before returning value
--returns nil if term is not a number
function square( term )
	--check our type, if it's not number return nil
	if( type( term ) ~= "number" ) then
		return nil --return nil since the term is not a number
	end
	
	--make sure we cast term to a number explicitly
	local tempterm = tonumber( term )
	--get our square by multiplying tempterm by itself
	local squaredterm = tempterm * tempterm
	
	--return our value
	return squaredterm
end

Or:

--function square
--returns the square of term
--checks if term is a number
function square( term )
	if( type( term ) ~= "number" ) then
		return nil
	end
	
	return term * term
end

The second obviously makes far more sense on a quick read-through. I have seen actual production code as absurd as the first. Verbosely documenting every thought in the process leads to redundant cruft and an overthought implementation of a trivial function. Make comments work for you, not against you.

Is It Better To Use Self-Documenting Code or Comments?

Neither is going to be objectively better, but one can be more applicable than the other depending on the circumstances. Self-documenting code by itself is going to make more sense when there are rigid standards in place for the code and the project is straightforward. Comments are much more applicable to more abstract operations which are harder to keep track of. The two combined correctly are substantially more powerful and efficient than either alone.

Team size can also affect the efficiency of a given method. A larger team will need more structure to stay functional, while a smaller team usually cannot afford the extra cost of adding comments. A joint approach will be useful for complicated structures, while excess commenting is a waste for trivial and non-trivial blocks alike.

Small Teams

Small teams tend to work best focusing on complying with code standards and writing self-documenting code. Comments can be a distraction in small teams as they inflate the code base. This isn’t to say don’t ever comment, but don’t waste time documenting code which is clear.

Short function headers which detail the basic usage make a function quicker to pick up and use. For our earlier square function we gave a quick header so we know what we’re working with:

--function square
--returns the square of term
--checks if term is a number

You should be able to rewrite the function from the header (if you know what its purpose is in the project). By commenting properly, it’s easy to keep track of what does what without having to delve into a bunch of code. By not commenting too much, a smaller team can focus on producing the project rather than over-documenting. Document more complicated algorithms and complex functions, explain basic usage, and document bugs and intentional behavior, but don’t write out every thought and feature for every single thing unless necessary.

Large Teams

Large teams should ideally focus more on documenting code than smaller teams. The more people are touching the problem, the more verbose the notes need to be so that other developers know what was done and why. Self-documenting code is fine for trivial functions, but even boilerplate classes need some degree of explanation. The more essential or common a class, the more you should document it.

The header from our overly verbose square function was overkill for something that trivial, but for non-trivial real code that level of detail would be more than sufficient:

--function square
--takes: scalar, number: term
--gives us the square of a number as defined as term * term
--returns a number which is our square
--error checks if a number is correct or not before returning value
--returns nil if term is not a number

It tells us what the function is, what it takes (and what it expects), what it does and how, what it returns, and what error handling (if any) is present. Anyone on the team should be able to recreate a feature-complete version of this function from this type of header. For non-trivial code, this will help with debugging and regression testing. We know what the function expects, we know what possibilities exist for returns, and we know how it accomplishes its goal at a higher level (well, maybe not so much for this example).

Large teams have projects which are rife with opportunities for people to step on one another. What happens if someone changes the error handling in our square function? If code depends on it returning nil when it encounters the wrong input, what happens if someone changes the behavior because it’s convenient for them? You end up with something else breaking silently.
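
One way to guard against that kind of silent breakage is to pin the documented behavior down with a few assertions. The sketch below repeats the short square function from earlier and adds a hypothetical test_square_contract helper; it assumes nothing beyond Lua’s built-in assert.

```lua
--function square
--returns the square of term
--returns nil if term is not a number
function square( term )
	if( type( term ) ~= "number" ) then
		return nil
	end
	
	return term * term
end

--test_square_contract (hypothetical helper)
--fails loudly if square stops matching its header
function test_square_contract()
	assert( square( 3 ) == 9, "square( 3 ) should be 9" )
	assert( square( -2 ) == 4, "square( -2 ) should be 4" )
	assert( square( "3" ) == nil, "non-number input should return nil" )
	assert( square( nil ) == nil, "nil input should return nil" )
	return true
end

print( test_square_contract() )
```

If someone later changes square to raise an error or to coerce strings, the check fails immediately instead of a distant caller breaking later.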

The efficiency hit of more documentation is made up with the efficiency gain from it being easier to pivot tasks. The average person can only keep track of so much before they’re overwhelmed. By making the code make more sense without having to read as much, it’s easier for developers to jump in on other places to shore up other issues in larger teams.

Project Size

The larger the project, the more ideal it is to use some degree of verbose commenting. A few hundred lines of code can be fine to slog through, but when you get in the thousands, you want notes. What is that widget class and why does it exist? How do you use it? Having a quick reference can save time when jumping around.

I would still suggest smaller, closer teams not get too verbose, even with large projects, but be a little more descriptive with comments breaking up sections of code. What is this object for? What do these functions do? If it’s something extremely common, what is a quick example and use case? Using the “open source documentation” approach (e.g. writing a couple of examples and explaining a few of the main use cases and caveats) is typically fine for smaller teams, but don’t be afraid to be more verbose for more complicated projects.
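
As a rough sketch of that style, here is a header which leans on a couple of examples and one caveat rather than a full specification. The clamp function is a hypothetical utility chosen only to show the shape of the header.

```lua
--clamp (hypothetical utility, used here only to show the header style)
--takes: numbers: value, low, high
--returns: value limited to the range low..high
--caveat: assumes low <= high, no check is made
--examples:
--	clamp( 5, 0, 10 )   returns 5
--	clamp( -3, 0, 10 )  returns 0
--	clamp( 42, 0, 10 )  returns 10
function clamp( value, low, high )
	return math.max( low, math.min( value, high ) )
end

print( clamp( 5, 0, 10 ), clamp( -3, 0, 10 ), clamp( 42, 0, 10 ) )
```

The examples double as a quick reference for someone jumping into the file, and the caveat records a decision which would otherwise have to be rediscovered.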

Larger teams tend to be working on larger projects, or at least have much larger crossover on smaller parts. Either way, the more hands involved, the more process benefits the whole team, even if it compromises an individual’s raw efficiency. Large teams should be commenting and creating internal documentation. An hour on documentation can save days of work (it’s saved me months even for some projects).

Treat each person as if they could be gone tomorrow. If your number one engineer leaves tomorrow, and you find someone of equal skill, how long until they’re up to speed? Reading the entire code base is fine for a small project, but after several thousand lines of code, how sane is that strategy? Make sure that anyone competent could understand what is being done and, more importantly, why for any given section.

Applying Better Commenting

Take note of what you do now. Do you prefer self-documenting code or heavy comments? What do you use and why? There’s no right answer to this.

Consider what the other approach can do better. If you use self-documenting code, have there been scenarios where you would have preferred comments instead or in addition? If you use comments, are there times you got lost in the text soup and couldn’t read the code? The answer is almost always going to have been “yes” at some point if you’ve done a non-trivial amount of coding, even solo.

If you primarily use self-documenting code, begin adding function headers and small notes explaining individual algorithms. Make it a habit so that you don’t disrupt the flow of writing code while documenting, or document right after a logical segment. If you primarily rely on comments, try waiting until you finish writing the block to document what it does. By waiting, you’ll have to read back over your code. What doesn’t fit or doesn’t read well in the code itself? Change it to be cleaner.
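
Here is a small sketch of what that read-back can turn up. All of the names and fields (check, can_place_order, u.a, user.age, and so on) are hypothetical; the point is only that a comment explaining cryptic names can often be replaced by better names.

```lua
--before: the comment does the work the names should be doing
function check( u )
	--user is an adult and the account is active
	if u.a >= 18 and u.s == 1 then
		return true
	end
	return false
end

--after: the names carry the meaning, so the inline comment can go
local ADULT_AGE = 18

function can_place_order( user )
	local is_adult = user.age >= ADULT_AGE
	return is_adult and user.is_active == true
end

print( check( { a = 20, s = 1 } ) )
print( can_place_order( { age = 20, is_active = true } ) )
```

A short header on can_place_order would still be worthwhile; what goes away is the comment which only existed to translate the code.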

If you use both approaches together, you’ll find where you naturally want to add comments and where the code speaks for itself. You also get used to more formal documentation, which is a selling point in a team environment. Too much verbosity can be discarded; too little requires reinventing the wheel. Make your code easy to understand and maintain.
