10 Self-Documenting Code
We’ve mentioned this before, but it’s worth repeating. Certain habits help create self-documenting code, which reduces the need for a large amount of outside documentation. The two main habits are writing comments and choosing descriptive names.
10.1 Variable Names
Variable names should be self-describing, and there are some other naming conventions and restrictions which you should know about. Yes, I’ve had researchers schedule a consultation and ask me why their code wasn’t running when their variable names violated a restriction.
10.1.1 Naming Restrictions
- Variable and function names are case sensitive
- Variable names use alphanumeric characters and underscores. Special characters cannot be in variable names, especially since `$` and even `.` mean specific things in many languages.
- Variable names cannot start with a numeric character; they must start with a letter or underscore.
  - A leading underscore, as in `_variable`, usually has a predefined meaning in different languages.
- Variable names cannot contain spaces or hyphens.
  - A space makes the name read as two variables: `my variable`
  - A hyphen is treated as a minus sign: `my-variable` will be read as `my` minus `variable`
- Variable names cannot be reserved words, such as `for` or `while`
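The restrictions above can be checked in Python with a small sketch like the one below. The helper `is_valid_name` and the candidate names are made up for illustration; `str.isidentifier()` and the `keyword` module are part of the standard library. Other languages have their own rules, so treat this as a Python-only check.

```python
import keyword


def is_valid_name(name: str) -> bool:
    """Return True if `name` could be used as a Python variable name."""
    # isidentifier() rejects spaces, hyphens, special characters, and a
    # leading digit; iskeyword() rejects reserved words such as `for`.
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical candidate names, one per restriction listed above.
candidates = [
    "my_variable",   # letters and underscores: fine
    "_variable",     # leading underscore: allowed, but carries meaning
    "2nd_value",     # starts with a numeric character
    "my variable",   # contains a space
    "my-variable",   # contains a hyphen
    "price$",        # contains a special character
    "for",           # reserved word
]

results = {name: is_valid_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r:15} {'valid' if ok else 'invalid'}")
```

Only the first two names pass. Note that a name passing this check is not necessarily a good name; it is only one the interpreter will accept.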
10.1.2 Naming Conventions
Pro Tip: Choose a convention and use it consistently
Because variable names are generally case sensitive, several different naming conventions are possible.
- `camelCase`
- `use_of_underscore` (often called snake_case)
- `ADifferentTypeOfCamelCase` (often called PascalCase)
- `ALLCAPS` means something specific in certain languages, such as constants in C/C++. It can be used, but you should avoid it.
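As a small Python sketch of why consistency matters: names that differ only by case are separate variables, so mixing conventions makes it easy to use the wrong one. All variable names and values below are invented for illustration.

```python
# These are two distinct variables because names are case sensitive.
total = 1
Total = 2
names_differ = total != Total

# One convention, used consistently. Underscores are an arbitrary choice
# here; camelCase would work equally well if used everywhere.
sample_count = 12
sample_mean = 4.5
sample_total = sample_count * sample_mean

print(names_differ, sample_total)
```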
10.1.3 Common Variable Names
You may have noticed that some of our examples use certain variable names, such as `i`, consistently. This is not because we are being lazy. There are some common naming conventions for variable or function names used across programming languages. Here is an incomplete list that focuses on the most commonly used variable names.
- `i`, `j`, `k` are commonly used as loop indices, iterators, and sometimes a count
- `n` commonly represents a count or number
- `df` is commonly used to represent a DataFrame
- `tmp` is short for temporary, and represents a temporary variable
- `func` or `f` will represent a function
  - these can be used as `func_sum` or `f_sum` to indicate a user-defined function
- As shown in the previous chapter, `pd` and `np` are often used as aliases for pandas and numpy, respectively. Other libraries have common aliases used across the communities that use them.
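Several of the names in this list can appear together in a few lines. The sketch below uses only the standard library; `func_sum`, `swap_first_last`, and the `measurements` values are hypothetical and exist only to show the conventions in context.

```python
def func_sum(values):
    """User-defined sum; the func_ prefix marks it as our own function."""
    total = 0
    for i in range(len(values)):  # i is the loop index
        total += values[i]
    return total


def swap_first_last(values):
    """Swap the first and last items in place and return the list."""
    tmp = values[0]          # tmp holds a value only for the swap
    values[0] = values[-1]
    values[-1] = tmp
    return values


measurements = [3, 1, 4, 1, 5]   # made-up data
n = len(measurements)            # n is a count
total = func_sum(measurements)
swapped = swap_first_last(list(measurements))  # copy, so the original is kept

print(n, total, swapped)
```

Short names like `i`, `n`, and `tmp` work here because their scope is a few lines. Anything that lives longer than that deserves a descriptive name.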
10.2 Comments
Just like metadata, comments are love notes to your future self.
Comments in programming scripts are there for the programmer. The computer doesn’t see the comments at all. It just ignores them. I use comments as a supplement to self-describing variable names. The variable name tells you what; the comments tell you why.
I’ve seen code from grad students without comments before. I gently asked them whether they would remember why they did certain things after six months away from the project, then suggested adding comments so it would be easier to pick back up in the future.
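Here is a minimal Python sketch of the "names say what, comments say why" idea. The readings and the `-999.0` placeholder are invented for illustration; your own data will have its own conventions.

```python
readings_celsius = [21.5, 22.1, -999.0, 20.8]  # made-up sensor data

# Why: in this hypothetical dataset the sensor writes -999.0 when it fails
# to record, so those entries must be dropped or they will wreck the mean.
missing_value = -999.0
valid_readings = [r for r in readings_celsius if r != missing_value]

# Why: we report the mean of valid readings only, not of all rows.
mean_celsius = sum(valid_readings) / len(valid_readings)

print(len(valid_readings), round(mean_celsius, 2))
```

The variable names already say what each value is. The comments record the reasoning that would be hard to reconstruct six months later.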
Each language will have different characters which begin a comment line or comment block. For single line comments, the comment will start at the special characters and go to the end of the line.
Python, R, Ruby, Bash
# This is a comment
SQL
-- This is a comment
C/C++, JavaScript, PHP, Java, C#
// This is a comment
HTML
<!-- This is a comment, which can go multiple lines -->
10.2.1 Multi-Line Comment Trick
There is a trick to using multi-line comments easily.
Multi-line comments are often used to block out code while testing different scripts in C/C++ and JavaScript, or to block out text in HTML. But adding and removing the beginning and end of the comment is a bit of a pain.
However, here’s the trick:
Always keep a comment beginning plus end (for example, `/**/` in C/C++ and JavaScript) at the end of a code block you want to comment out. Then you only need to add or remove the beginning (`/*`) at the top of the block.
It’s similar for HTML.
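Python has no block comments, so the trick cannot be shown in it directly. As a rough sketch, the snippet below stores a small C-style fragment in a string and runs it through a simplified comment stripper to show which lines survive. The stripper `strip_block_comments` and the fragment are assumptions made for illustration; a real compiler also handles string literals and `//` comments, which this ignores. It also assumes the trick refers to keeping an empty `/**/` at the end of the block.

```python
import re


def strip_block_comments(source: str) -> str:
    """Remove non-nested C-style /* ... */ comments (simplified)."""
    # Lazy match: each comment ends at the first */ after its opening /*.
    return re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)


# The block with an empty comment permanently parked at its end.
active_block = "x = 1;\ny = 2;\n/**/"

# To disable the block, add only the beginning marker on top.
disabled_block = "/*\n" + active_block

active_code = strip_block_comments(active_block)
disabled_code = strip_block_comments(disabled_block)

print(repr(active_code))    # both statements remain
print(repr(disabled_code))  # nothing remains
```

Because block comments in these languages do not nest, the added `/*` runs until the first `*/`, which is the one already sitting at the end of the block. Deleting that single `/*` line turns the code back on.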