5 Loops
The nice thing about the terms used for the control structures in most programming languages is they are usually self-describing. We’re going to talk about loops now.
Loops are common events in programming scripts. It is what the computer uses to listen for user-input. It’s also what researchers use when they want to apply a statistical method over several data files, or several columns within a data table.
5.1 for
loops
The for
loop is probably the most common type of loop. It executes a chunk of code for a certain number of times. The common structure of most for
loops is
for variable in a collection
do something
Now, a collection can be a list of text items, such as cheesy mashed potato ingredients, but it can also be a range of numbers, like 1 to 5
(which is programming speak for 1 2 3 4 5) or 7 to 13
(which is 7 8 9 10 11 12 13).
Let’s see what this looks like in action using a basic list of cheesy mashed potatoes ingredients:
ingredients <- cheddar cheese, potatoes, milk, salt, butter
for ingredient in ingredients
write(ingredient)
Before we look at any real world code, we will manually go through some example loops, without the computer doing the work for us. Practicing a trace on tasks that are simple, like writing each ingredient in a collection, will help us build skills for, and comfort with, reading complex code.
5.1.1 Activity: Practicing a Trace
Timing: 10 minutes
You will need: a piece of paper and writing implement
Let’s set up our trace exercise. The example loop is:
ingredients <- cheddar cheese, potatoes, milk, salt, butter
for ingredient in ingredients
write(ingredient)
At the top of your paper, write out the ingredients
assignment (ingredients <- cheddar cheese, potatoes, milk, salt, butter
).
On the line below that we’re going to create headers for all of the variables used within the for
loop code block. Starting on the left, write the term loop iteration
, then ingredient
in the center of the line, then the term write
on the right of the line.
Your paper should look something like this:
| ||||||
---|---|---|---|---|---|---|
|
loop iteration |
|
ingredient |
|
write(ingredient) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| ||||||
| ||||||
| ||||||
|
Your paper is formatted so it’s easy to see what’s going on with variable during each loop iteration. This is called a trace, and is often used in software development to help debug the program. It is also a good exercise to help build mental muscles to read code.
The next step is to fill in the values for each iteration of the loop.
Your paper will now look something like this:
## Warning: package 'ftExtra' was built under R version 4.4.2
| ||||||
---|---|---|---|---|---|---|
|
loop iteration |
|
ingredient |
|
write(ingredient) |
|
1 |
cheddar cheese |
cheddar cheese |
||||
2 |
potatoes |
potatoes |
||||
3 |
milk |
milk (yes, this is a little redundant) |
||||
4 |
salt |
salt (acknowledge the redunancy) |
||||
5 |
butter |
butter (is there an equivalent to horizontal ditto marks?) |
||||
| ||||||
Note: It’s okay if you didn’t completely fill out both the variable ingredient
and the function write ingredient
because in this case they are the same. But, it’s important to check what actions are happening to variables when you do a trace.
You’ve now completed the first trace exercise of a pseudocode example. Congratulations!
In this workshop, we’re focused on the programming logic underlying code, rather than specific syntax, but it can be helpful to look at “real” code doing the same task, to further build out your mental model.
If we asked a computer to do the same task we did, using the R language, it would look something like this:
ingredients <- c("cheddar cheese", "potatoes", "milk", "salt", "butter")
for(ingredient in ingredients){
print(ingredient)
}
## [1] "cheddar cheese"
## [1] "potatoes"
## [1] "milk"
## [1] "salt"
## [1] "butter"
5.1.2 Activity - Loop Trace
Timing: 10 minutes
You will need: a piece of paper and writing implement
Now that we’ve practiced a trace using text, let’s practice using number ranges Remember that to a computer, the range written as 2 to 7
represents numbers 2 3 4 5 6 7.
For this loop trace, you’ll write out the output of each iteration of this loop:
for x in 0 to 5
write x
write x*2
Like in the ingredients trace example, you’ll create a table with headers for all of the variables used within the for
loop code block. Starting on the left, write the term loop iteration
, then write(x)
in the center of the line, then the term write(x*2)
on the right of the line.
Then, fill out the table for each iteration of for x in 0 to 5
.
Loop Trace
| ||||||
---|---|---|---|---|---|---|
|
loop iteration |
|
write(x) |
|
write(x*2) |
|
1 |
0 |
0 |
||||
2 |
1 |
2 |
||||
3 |
2 |
4 |
||||
4 |
3 |
6 |
||||
5 |
4 |
8 |
||||
6 |
5 |
10 |
||||
| ||||||
By this point, you’re probably thinking that doing a manual trace is a bit tedious. And it is! But again - this is a good way to strengthen your mental model of how loops work, and get you comfortable with thinking through what a chunk of code is trying to do.
5.1.3 Exercise - Trace Nested for
Loops
Timing: 15 minutes You will need: a piece of paper and writing implement
So far, we’ve tackled one loop at a time. But loops are flexible and can be “nested”, or embedded within each other.
Let’s look at an example the following matrix, which is created using two for
loops
Index |
Column 1 |
Column 2 |
Column 3 |
---|---|---|---|
Row 1 |
2 |
4 |
6 |
Row 2 |
5 |
7 |
9 |
Row 3 |
8 |
10 |
12 |
Row 4 |
11 |
13 |
15 |
The pseudocode for how this matrix was created is as follows:
num_rows = 4
num_col = 3
matrix <- []
for (i in 1 to num_rows)
for (j in 1 to num_col)
matrix[i,j] <- (3*(i-1))+(j*2)
Before creating the table to help trace the code, answer the following
- What variables need to be traced?
- What are the ranges that are used?
- What calculations do you need to keep track of?
- Is there anything else?
Answers
- What variables need to be traced?
-
i
,j
- you might think that you need to keep track of
loop iteration
, but it’s redundant because bothi
andj
start at one. it’s enough to keep track of just their values.
-
- What are the ranges that are used?
- 1 to 4 and 1 to 3
- What calculations do you need to keep track of?
(3*(i-1))+(j*2)
- Is there anything else?
- With a small enough matrix, it’s okay to make an empty one and fill it in
- With a small enough matrix, it’s okay to make an empty one and fill it in
Label columns and sketch out a matrix to fill in the variables as the loop iterates for the first five lines (you won’t fill in the whole matrix, but only a portion).
Loop Trace
| ||||||
---|---|---|---|---|---|---|
| ||||||
| ||||||
|
i |
|
j |
|
(3*(i-1))+(j*2) |
|
1 |
1 |
(3*(1-1))+(1*2) |
||||
1 |
2 |
(3*(1-1))+(2*2) |
||||
1 |
3 |
(3*(1-1))+(3*2) |
||||
2 |
1 |
(3*(2-1))+(1*2) |
||||
2 |
2 |
(3*(2-1))+(2*2) |
Index |
j = 1 |
j = 2 |
j = 3 |
---|---|---|---|
i = 1 |
2 |
4 |
6 |
i = 2 |
5 |
7 |
|
i = 3 |
|||
i = 4 |
5.2 while
Loops
There are other types of loops in addition to the commonly used for
loop.
The while
loop uses a condition that the computer checks if it is TRUE
or FALSE
at the start of the loop to determine if the code chunk inside the loop will execute.
while (condition is TRUE)
do something
This is different from a for
loop, because while a for
loop will execute a predetermined number of times, the while
loop will execute upon a condition being met.
For this loop to work, you need to set the condition before the loop, and change it in the loop.
You can write a for
loop into a while
loop by using a variable which will increment by one for each time the loop is run. This usually isn’t done, but we want to show something you’re already familiar with.
For example, recall the initial for loop:
ingredients <- cheddar cheese, potatoes, milk, salt, butter
for ingredient in ingredients
write(ingredient)
An equivalent while
loop would look like:
ingredients <- cheddar cheese, potatoes, milk, salt, butter
i = 1
condition_check = length(ingredients) + 1
while i < condition_check
write ingredients[i]
i <- i + 1 // change for condition check
Here’s what it look like in R
ingredients <- c("cheddar cheese", "potatoes", "milk", "salt", "butter")
i <- 1
condition_check = length(ingredients) + 1
while(i < condition_check){
print(ingredients[i])
i <- i + 1
}
## [1] "cheddar cheese"
## [1] "potatoes"
## [1] "milk"
## [1] "salt"
## [1] "butter"
When we identify the variables to keep track of in a while
loop trace, we need to add one to check if the condition for entering the loop is met.
| ||||||||
---|---|---|---|---|---|---|---|---|
|
loop iteration |
|
i |
|
ingredients[i] |
|
i < condition_check |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| ||||||||
| ||||||||
| ||||||||
|
Try filling out this table for condition_check = length(ingredients) + 1
(condition_check = 6). What is the last value of ingredients[i]
that is printed?
Now let’s try a thought exercise:
- What would be the last value of
ingredients[i]
printed ifcondition_check = 3
? - What would be the last value of
ingredients[i]
printed ifcondition_check = 8
?
Answers
If condition_check = 3
, then the loop would stop after the i = 2
iteration. Once i = 3
, i < condition_check
would be False
, and the loop would not continue. So, the last printed value of ingredients[i]
would be ingredients[2]
, which is potatoes
.
In this case, we would be all out of values in ingredients
before condition_check
was False
, meaning the loop would keep going even if there was nothing to return. Since there are only 5 values in ingredients
, when we got to i = 6
and ingredients[6]
, the code would return NA
(missing value). The same would happen for i = 7
. So, our full results would look like:
[1] "cheddar cheese"
[1] "potatoes"
[1] "milk"
[1] "salt"
[1] "butter"
[1] NA
[1] NA
5.2.1 Other types of loops
Two other types of loops are a do while
loop (check at end) and an until
loop (check at end).
do
something
while (condition is TRUE)
The logic in the do while
loop is while a condition is true, the loop will run.
As with the while
loop, the condition needs to be set before the loop, and changed within the loop.
For the until
loop, it’s slightly different. The loop structure is like this
do
something
until (condition is TRUE)
The logic is opposite of a do while
loop. It will run while the condition is false, and will only exit when the condition is true.
These aren’t used often, in fact IMO it’s better to just re-write the logic to use a while
loop.
5.2.2 Exercise - While Trace
Timing: 10 minutes You will need: a piece of paper and writing implement
Let’s practice a while
combined with a for
trace
condition_check <- 5
i <- 1
while (i < condition_check):
for j in 1 to 3:
write (i*j)+i
I’ll give you all a few minutes to write out your trace. Once you’re done, type the largest number you traced in the write statement, but don’t press enter until I give the signal.
5.3 Caveats
As you’ve just experienced, loops come with their own set of common issues.
Since loops are a frequently encountered concept in programming, we’ll go over the common problems.
5.3.1 Infinite loops
What you all just experienced in the previous exercise is called an infinite loop. Infinite loops happen when the condition check is never false. In the case of the above while loop, the check variable was never incremented, so the loop would go through the same process until the programmer interrupts it, the computer fails, or time ends (whichever comes first).
5.3.2 Overwriting outputs
for x in files(1 to 500)
rename_file(x, "test-case")
Often when researchers are developing a script, they will use test case to develop their algorithm and work out the bugs. In this way, if mistakes are made, it’s on a small scale and easy to correct. Most development is the following.
- Develop code for a single case
- Test on a few cases
- Use on all of the cases
If a step is missed, it can be disastrous. I can attest. I wrote a script to rename about 500 files, and in my hubris of being an awesome coder, forgot to test it on a few cases before using it on all of the files. I forgot to change the single rename ("test-case")
to something that incorporated the loop ("test-case"+x)
and when the loop was finished, wondered why I went from 500 to only one file in the folder. Luckily, I always backup my data files, so it was an easy mistake to remedy (and makes for a good story on the importance of data backup).
5.4 Language examples: Formatting may vary
Programming languages may have specific formatting for loops. This may mean certain brackets must be used, tab indents are needed, or ranges are specified a certain way.
Formatting of outputs may also look slightly different, with some languages printing each output iteration on a new line, and some not.
Python requires indentation for code blocks. Code blocks in different structures must be aligned with the same indentation.
Programming structures such as for
expect :
at the end of each statement with the subsequent code block indented by one.
R does not require indentation for code blocks; however, indentation is used because it’s easier for humans to read!
Instead of :
R makes use of curly brackets {}
to indicate the main body of a for
statement.
for (x in 0:5) {
print(x)
}
The following example is C/C++ syntax. One big difference between C/C++ and Python or R is C/C++ is a compiled programming language, which must first be compiled into a file (often *.exe, but can be anything) that contains the machine-language instructions to be executed.
C/C++ syntax looks a lot like R, but requires semicolons at the end of each line to be executed in a code block, while R does not require the semicolon (although R will also run if there are semicolons).
Note that C/C++ also uses the cout
command to specify what should be returned to the console, rather than print
like in Python or R.
PHP is very similar to C/C++, but is used for web development. Note that PHP uses echo
rather than cout
or print
to specify what should be returned to the console.