5 Loops

The nice thing about the terms used for control structures in most programming languages is that they are usually self-describing. Now we're going to talk about loops.

Loops are common in programming scripts. A loop is what a computer uses to keep listening for user input. Researchers also use loops when they want to apply a statistical method over several data files, or over several columns within a data table.

5.1 `for` loops

The for loop is probably the most common type of loop. It executes a chunk of code a set number of times, once for each item in a collection. The common structure of most for loops is:

for variable in a collection
  do something

Now, a collection can be a list of text items, such as cheesy mashed potato ingredients, but it can also be a range of numbers, like 1 to 5 (which is programming speak for 1 2 3 4 5) or 7 to 13 (which is 7 8 9 10 11 12 13).
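As a minimal sketch in R (the language this workshop uses for its “real” code), here is one way these collections could be written. The colon operator : and seq() both build ranges that include both end points, and c() combines text items into a collection. The variable names are our own, chosen for illustration.

```r
# A collection of text items (a character vector)
ingredients <- c("cheddar cheese", "potatoes", "milk", "salt", "butter")

# Ranges of numbers; both end points are included
one_to_five <- 1:5                # 1 2 3 4 5
seven_to_thirteen <- seq(7, 13)   # 7 8 9 10 11 12 13

print(ingredients)
print(one_to_five)
print(seven_to_thirteen)
```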

Let’s see what this looks like in action using a basic list of cheesy mashed potato ingredients:

ingredients <- cheddar cheese, potatoes, milk, salt, butter

for ingredient in ingredients
  write(ingredient)

Before we look at any real-world code, we will manually go through some example loops without the computer doing the work for us. Practicing traces on simple tasks, like writing each ingredient in a collection, will help us build the skills for, and comfort with, reading complex code.

5.1.1 Activity: Practicing a Trace

Timing: 10 minutes
You will need: a piece of paper and a writing implement

Let’s set up our trace exercise. The example loop is:

ingredients <- cheddar cheese, potatoes, milk, salt, butter

for ingredient in ingredients
  write(ingredient)

At the top of your paper, write out the ingredients assignment (ingredients <- cheddar cheese, potatoes, milk, salt, butter).

On the line below that, we’re going to create headers for all of the variables used within the for loop code block. Starting on the left, write the term loop iteration, then ingredient in the center of the line, then write(ingredient) on the right of the line.

Your paper should look something like this:


loop iteration	ingredient	write(ingredient)

Your paper is now formatted so it’s easy to see what’s happening to each variable during each loop iteration. This is called a trace, and it is often used in software development to help debug programs. It is also a good exercise for building the mental muscles needed to read code.

The next step is to fill in the values for each iteration of the loop.

Your paper will now look something like this:



loop iteration	ingredient	write(ingredient)
1	cheddar cheese	cheddar cheese
2	potatoes	potatoes
3	milk	milk (yes, this is a little redundant)
4	salt	salt (acknowledge the redundancy)
5	butter	butter (is there an equivalent to horizontal ditto marks?)

Note: It’s okay if you didn’t fill out both the ingredient column and the write(ingredient) column, because in this case they hold the same values. But it’s important to check what actions are happening to variables when you do a trace.

You’ve now completed the first trace exercise of a pseudocode example. Congratulations!

In this workshop, we’re focused on the programming logic underlying code, rather than specific syntax, but it can be helpful to look at “real” code doing the same task, to further build out your mental model.

If we asked a computer to do the same task we did, using the R language, it would look something like this:

ingredients <- c("cheddar cheese", "potatoes", "milk", "salt", "butter")
for(ingredient in ingredients){
  print(ingredient)
}

## [1] "cheddar cheese"
## [1] "potatoes"
## [1] "milk"
## [1] "salt"
## [1] "butter"

5.1.2 Activity - Loop Trace

Timing: 10 minutes
You will need: a piece of paper and a writing implement

Now that we’ve practiced a trace using text, let’s practice using number ranges. Remember that to a computer, the range written as 2 to 7 represents the numbers 2 3 4 5 6 7.

For this loop trace, you’ll write out the output of each iteration of this loop:

for x in 0 to 5 
  write x
  write x*2

As in the ingredients trace example, you’ll create a table with headers for everything used within the for loop code block. Starting on the left, write the term loop iteration, then write(x) in the center of the line, then write(x*2) on the right of the line.

Then, fill out the table for each iteration of for x in 0 to 5.

Loop Trace


loop iteration	write(x)	write(x*2)
1	0	0
2	1	2
3	2	4
4	3	6
5	4	8
6	5	10
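If you’d like the computer to check your work, here is a minimal R sketch that runs the same loop and records what each write statement would output. The column names (write_x, write_x2) are our own, chosen to mirror the paper trace.

```r
# Trace "for x in 0 to 5" by recording what each write statement outputs.
xs <- 0:5
write_x  <- numeric(length(xs))   # one slot per iteration
write_x2 <- numeric(length(xs))

iteration <- 0
for (x in xs) {
  iteration <- iteration + 1      # count each pass through the loop
  write_x[iteration]  <- x        # write x
  write_x2[iteration] <- x * 2    # write x*2
}

trace <- data.frame(iteration = seq_along(xs), write_x, write_x2)
print(trace)
```

Pre-allocating the vectors with numeric() before the loop is a common R habit; it avoids growing a vector one element at a time.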

By this point, you’re probably thinking that doing a manual trace is a bit tedious. And it is! But again, this is a good way to strengthen your mental model of how loops work and to get comfortable thinking through what a chunk of code is trying to do.

5.1.3 Exercise - Trace Nested `for` Loops

Timing: 15 minutes
You will need: a piece of paper and a writing implement

So far, we’ve tackled one loop at a time. But loops are flexible and can be “nested”, or embedded within each other.

Let’s look at an example: the following matrix, which is created using two nested for loops.

Index	Column 1	Column 2	Column 3
Row 1	2	4	6
Row 2	5	7	9
Row 3	8	10	12
Row 4	11	13	15

The pseudocode for how this matrix was created is as follows:

num_rows = 4
num_col = 3

matrix <- []

for (i in 1 to num_rows)
  for (j in 1 to num_col)
    matrix[i,j] <- (3*(i-1))+(j*2)
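Translated into R, the pseudocode might look like the sketch below. This is our own translation: we name the matrix result rather than matrix, so it doesn’t shadow R’s built-in matrix() function, and we pre-fill it with NA as the “empty” starting point.

```r
num_rows <- 4
num_col  <- 3

# Start with an empty (all-NA) 4 x 3 matrix, then fill it cell by cell
result <- matrix(NA, nrow = num_rows, ncol = num_col)

for (i in 1:num_rows) {        # outer loop: one pass per row
  for (j in 1:num_col) {       # inner loop: runs completely for every i
    result[i, j] <- (3 * (i - 1)) + (j * 2)
  }
}

print(result)
```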

Before creating the table to help trace the code, answer the following questions:

What variables need to be traced?
What are the ranges that are used?
What calculations do you need to keep track of?
Is there anything else?

Answers

What variables need to be traced?
- i, j
- You might think that you also need to keep track of the loop iteration, but that’s redundant here: because both i and j start at 1 and go up by one, their values already tell you where you are in the loop. It’s enough to keep track of just i and j.
What are the ranges that are used?
- 1 to 4 and 1 to 3
What calculations do you need to keep track of?
- (3*(i-1))+(j*2)
Is there anything else?
- With a small enough matrix, it’s okay to sketch an empty one and fill it in as the loop runs.

Label the columns and sketch out a matrix, then fill in the variables as the loop runs for the first five iterations (you won’t fill in the whole matrix, only a portion).

Loop Trace

num_rows = 4 & num_col = 3
Range 1 to num_rows is 1 2 3 4
Range 1 to num_col is 1 2 3
i	j	(3*(i-1))+(j*2)
1	1	(3*(1-1))+(1*2) = 2
1	2	(3*(1-1))+(2*2) = 4
1	3	(3*(1-1))+(3*2) = 6
2	1	(3*(2-1))+(1*2) = 5
2	2	(3*(2-1))+(2*2) = 7

Index	j = 1	j = 2	j = 3
i = 1	2	4	6
i = 2	5	7
i = 3
i = 4
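To check your paper trace, here is a minimal R sketch (our own translation of the pseudocode, not code from the workshop) that records one row per pass through the inner loop. The data frame name trace is an arbitrary choice; growing it with rbind() inside a loop is fine for a 12-row trace but slow for large data.

```r
num_rows <- 4
num_col  <- 3
trace <- data.frame(i = integer(0), j = integer(0), value = numeric(0))

for (i in 1:num_rows) {
  for (j in 1:num_col) {
    value <- (3 * (i - 1)) + (j * 2)
    # Append one row per inner-loop pass, like one line of the paper trace
    trace <- rbind(trace, data.frame(i = i, j = j, value = value))
  }
}

print(head(trace, 5))  # the first five iterations, as traced above
```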

5.2 `while` Loops

There are other types of loops in addition to the commonly used for loop.

A while loop uses a condition, which the computer checks at the start of each pass through the loop. If the condition is TRUE, the code chunk inside the loop executes; if it is FALSE, the loop ends.

while (condition is TRUE)
  do something

This is different from a for loop: a for loop executes a predetermined number of times, whereas a while loop keeps executing for as long as its condition remains TRUE.

For this loop to work, you need to set up the condition (and any variable it depends on) before the loop, and change that variable inside the loop.
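As a minimal R sketch of a case where the number of passes isn’t known in advance (the doubling example is our own, not from the workshop), suppose we keep doubling a number until it exceeds 100. Note that value is set before the loop and changed inside it.

```r
value <- 1         # set up before the loop: the condition depends on this
doublings <- 0

while (value <= 100) {        # checked at the start of every pass
  value <- value * 2          # change the condition's variable inside the loop
  doublings <- doublings + 1
}

print(value)       # first power of 2 above 100 (128)
print(doublings)   # how many passes that took
```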

You can rewrite a for loop as a while loop by using a counter variable that increases by one each time the loop runs. This usually isn’t done, but it lets us show a while loop using an example you’re already familiar with.

For example, recall the initial for loop:

ingredients <- cheddar cheese, potatoes, milk, salt, butter

for ingredient in ingredients
  write(ingredient)

An equivalent while loop would look like:

ingredients <- cheddar cheese, potatoes, milk, salt, butter

i = 1
condition_check = length(ingredients) + 1 
while i < condition_check
  write ingredients[i]
  i <- i + 1 // change for condition check

Here’s what it looks like in R:

ingredients <- c("cheddar cheese", "potatoes", "milk", "salt", "butter")

i <- 1
condition_check <- length(ingredients) + 1

while(i < condition_check){
  print(ingredients[i])
  i <- i + 1
}

## [1] "cheddar cheese"
## [1] "potatoes"
## [1] "milk"
## [1] "salt"
## [1] "butter"

When we identify the variables to keep track of in a while loop trace, we need to add one more column: whether the condition for entering the loop (i < condition_check) is met.


loop iteration	i	ingredients[i]	i < condition_check

Try filling out this table for condition_check = length(ingredients) + 1 (condition_check = 6). What is the last value of ingredients[i] that is printed?

Now let’s try a thought exercise:

What would be the last value of ingredients[i] printed if condition_check = 3?
What would be the last value of ingredients[i] printed if condition_check = 8?

Answers

If condition_check = 3, the loop would stop after the i = 2 iteration. Once i = 3, i < condition_check would be FALSE, and the loop would not continue. So the last printed value of ingredients[i] would be ingredients[2], which is potatoes.

If condition_check = 8, we would run out of values in ingredients before i < condition_check became FALSE, so the loop would keep going even with nothing left to return. Since there are only 5 values in ingredients, when we got to i = 6, ingredients[6] would return NA (missing value). The same would happen for i = 7. Once i = 8, the condition is FALSE and the loop stops. So our full results would look like:

[1] "cheddar cheese"
[1] "potatoes"
[1] "milk"
[1] "salt"
[1] "butter"
[1] NA
[1] NA
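To check the thought-exercise answers, a minimal R sketch can wrap the while loop from this section in a function that takes condition_check as an argument. The function name run_while is hypothetical; it collects the values instead of printing them one at a time, so the results are easy to compare.

```r
# Hypothetical helper: run the while loop for a given condition_check
# and collect every value it would print, in order.
run_while <- function(items, condition_check) {
  printed <- character(0)
  i <- 1
  while (i < condition_check) {
    printed <- c(printed, items[i])   # items[i] is NA once i passes the end
    i <- i + 1
  }
  printed
}

ingredients <- c("cheddar cheese", "potatoes", "milk", "salt", "butter")

print(run_while(ingredients, 6))   # stops after butter
print(run_while(ingredients, 3))   # stops after potatoes
print(run_while(ingredients, 8))   # runs past the end: two NAs
```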

5.2.1 Other types of loops

Two other types of loops are the do while loop and the until loop. Unlike the while loop, both check their condition at the end of each pass instead of at the start.

do
  something
while (condition is TRUE)

In a do while loop, the loop keeps running as long as the condition is TRUE. Because the check happens at the end, the code inside always runs at least once.
As with the while loop, the condition needs to be set before the loop, and changed within the loop.

The until loop is slightly different. Its structure looks like this:

do
  something
until (condition is TRUE)

Its logic is the opposite of a do while loop: it keeps running while the condition is FALSE, and exits only when the condition becomes TRUE.

These aren’t used often. In fact, in my opinion it’s better to just rewrite the logic as a while loop.
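R, for example, has no built-in do while or until loop. A common workaround, sketched below, is R’s repeat loop with a break placed at the end of the body. The counter variables are hypothetical. The last part shows the same until logic rewritten as an ordinary while loop.

```r
# do while: run the body, then keep going only while the condition is TRUE
count <- 10
do_while_runs <- 0
repeat {
  do_while_runs <- do_while_runs + 1   # "do something"
  count <- count + 1
  if (!(count < 5)) break              # check at the END: continue while count < 5
}

# until: run the body, then stop as soon as the condition is TRUE
n <- 0
repeat {
  n <- n + 1                           # "do something"
  if (n >= 3) break                    # exit when the condition becomes TRUE
}

# The same until logic as an ordinary while loop
m <- 0
while (m < 3) {                        # loop while the until-condition is FALSE
  m <- m + 1
}

print(c(do_while_runs = do_while_runs, n = n, m = m))
```

The do while part runs its body once even though count < 5 is already FALSE at the start. The while rewrite matches the until version here only because its condition is TRUE on the first check; if it might not be, the body has to run once before the while loop.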

5.2.2 Exercise - While Trace

Timing: 10 minutes
You will need: a piece of paper and a writing implement

Let’s practice tracing a while loop combined with a for loop:

condition_check <- 5
i <- 1
while (i < condition_check):
  for j in 1 to 3:
    write (i*j)+i

I’ll give you all a few minutes to write out your trace. Once you’re done, type the largest number you traced in the write statement, but don’t press enter until I give the signal.

5.3 Caveats

As you’ve just experienced, loops come with their own set of common issues.
Since loops are a frequently encountered concept in programming, we’ll go over the common problems.

5.3.1 Infinite loops

What you all just experienced in the previous exercise is called an infinite loop. Infinite loops happen when the condition check never becomes FALSE. In the while loop above, the variable i was never incremented, so i < condition_check stayed TRUE and the loop would repeat the same process until the programmer interrupts it, the computer fails, or time ends (whichever comes first).
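Here is a minimal R sketch of the exercise loop with the missing increment added. The cap max_passes is our own hypothetical safety net, not part of the exercise; it stops the loop with an error if it ever runs far longer than expected, which is a cheap way to catch an accidental infinite loop while developing.

```r
condition_check <- 5
i <- 1
outputs <- c()
max_passes <- 1000   # hypothetical safety cap, chosen arbitrarily
passes <- 0

while (i < condition_check) {
  passes <- passes + 1
  if (passes > max_passes) stop("Possible infinite loop: stopping early")
  for (j in 1:3) {
    outputs <- c(outputs, (i * j) + i)   # write (i*j)+i
  }
  i <- i + 1   # the missing line: without it, i stays 1 forever
}

print(outputs)
print(max(outputs))
```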

5.3.2 Overwriting outputs

for x in files(1 to 500) 
  rename_file(x, "test-case")

Often when researchers are developing a script, they will use a test case to develop their algorithm and work out the bugs. That way, if mistakes are made, they are small in scale and easy to correct. Most development follows these steps:

Develop code for a single case
Test on a few cases
Use on all of the cases

If a step is missed, it can be disastrous. I can attest. I wrote a script to rename about 500 files, and in my hubris of being an awesome coder, forgot to test it on a few cases before using it on all of the files. I forgot to change the single rename ("test-case") to something that incorporated the loop ("test-case"+x) and when the loop was finished, wondered why I went from 500 to only one file in the folder. Luckily, I always backup my data files, so it was an easy mistake to remedy (and makes for a good story on the importance of data backup).
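As a hedged sketch of the “develop, test on a few, then run on all” workflow in R, the example below renames files in a throwaway temporary folder. The folder and file names (rename-demo, raw-1.txt to raw-5.txt, test-case-1.txt and so on) are hypothetical. Two safeguards target the mistake in the story: every new name includes the loop index, and the script stops before renaming anything if two new names would collide.

```r
# A throwaway folder under R's session temp directory, so no real files are touched
folder <- file.path(tempdir(), "rename-demo")
unlink(folder, recursive = TRUE)
dir.create(folder)

# Five hypothetical input files: raw-1.txt to raw-5.txt
old_paths <- file.path(folder, paste0("raw-", 1:5, ".txt"))
invisible(file.create(old_paths))

# Build every new name up front, using the loop index (like "test-case"+x)
new_paths <- file.path(folder, paste0("test-case-", seq_along(old_paths), ".txt"))

# Guard against the bug in the story: stop if two files would get the same name
if (anyDuplicated(new_paths) > 0) stop("New file names are not unique")

# Test on a few cases first...
for (k in 1:2) file.rename(old_paths[k], new_paths[k])
print(list.files(folder))

# ...then use it on the rest
for (k in 3:length(old_paths)) file.rename(old_paths[k], new_paths[k])
print(length(list.files(folder)))
```

Without the uniqueness check, file.rename() will typically overwrite an existing file that has the same target name (where file permissions allow), which is how 500 files can quietly become one.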

5.4 Language examples: Formatting may vary

Programming languages may have specific formatting for loops. This may mean certain brackets must be used, tab indents are needed, or ranges are specified a certain way.

Formatting of outputs may also look slightly different, with some languages printing each output iteration on a new line, and some not.

Python requires indentation for code blocks. All statements in the same code block must be indented by the same amount.

Structures such as for end their first line with a colon (:), and the code block that follows is indented one level.

for x in range(0, 6):  # range() stops before its end value, so this gives 0 to 5
    print(x)

R does not require indentation for code blocks; however, indentation is used because it’s easier for humans to read!

Instead of a colon, R uses curly brackets {} to mark the body of a for statement.

for (x in 0:5) {
  print(x)
}

The following example uses C/C++ syntax. One big difference between C/C++ and Python or R is that C and C++ are compiled languages: the code must first be compiled into a file (often *.exe on Windows, but it can be named anything) that contains the machine-language instructions to be executed.

C/C++ loop syntax looks a lot like R, but C/C++ requires a semicolon at the end of each statement in a code block. R does not require semicolons (although R will also run if they are there).

Note that C++ uses std::cout to send output to the console (C uses printf), rather than print as in Python or R.

#include <iostream>
int main() {
  for (int i = 0; i <= 5; i++) {
    std::cout << i;  // no newline, so this prints 012345 on one line
  }
}

PHP’s loop syntax is very similar to C/C++, but PHP is mainly used for web development. Note that PHP uses echo rather than cout or print to specify what should be returned to the console.

<?php
for ($x = 0; $x <= 5; $x++) {
  echo $x;
}
