8 Functions

8.1 What is a function?

A “function” in programming is a piece of code that does a specific task, but packages it so it can be called with a single line of code. Functions can be predefined by developers, and also may be custom to any script.

We’ve already been using examples of functions in our psuedocode - such as when we would include the write statement - and formal functions, such as the print, cout, or echo statements we saw in the conditional examples at the end of the previous chapter.

The print function does something very simple, which is provide output, either on the R or Python console or interface, that people can read. It is simple to use, but internal code that creates the print function is a little more complex (to say the least). Since communicating output is ubiquitous in all computer languages, each language has some sort of print function as a built in function so it makes it easier for programmers to focus on their particular code.

Functions can do something very simple or very complex. But in essence, functions are used when you know you’re going to do the same task over and over again.

Take our cheesy mashed potatoes example. A simple function might be something like peel_potato(), which would take our first example algorithm on peeling potatoes, and create a function.

The original algorithm Algorithm for peeling potatoes for Cheesy Mashed Potatoes

  1. Get three pounds of russet potatoes.
  2. Get bowl large enough to hold three pounds of russet potatoes.
  3. Get a vegetable peeler.
  4. Take one potato.
  5. Peel potato.
  6. Rinse potato with water.
  7. Place peeled potato in bowl
  8. Repeat 4-7 until all potatoes are washed and peeled

But, we don’t want to create a function for the whole thing. We want to create a function that allows us to easily execute what needs to be repeated, Steps 4-7. So the function might look like

define function peel_potato(potato, peeler):

  while(potato has skin):
    location <- on potato, locate skin
    potato <- at location on potato, with peeler remove strip of skin
    potato <- rotate potato
  // end-while (this is a comment, it begins with // )
  
  potato <- rinse potato with water
  
  return (potato)
// end function definition

A bonus of functions, is you can make them generic if you know the action can be applied over many objects. Perhaps you want to peel both potatoes and carrots, you can make a function for that.

define function peel_vegetable(vegetable, peeler):

  while(vegetable has skin):
    location <- on vegetable, locate skin
    vegetable <- at location on vegetable, with peeler remove strip of skin
    vegetable <- rotate(vegetable)
  // end-while
  
  vegetable <- rinse_with_water(vegetable)
  
  return (vegetable)
// end function definition

Now the peel_vegetable function can be used with multiple objects. (Well, it could before, but using the function name peel_potato would be confusing when trying to peel carrot or zucchini. Naming matters, especially for understandability!)

Now that we’ve created a function for peeling vegetables, our algorithm might look like this

potatoes <- 16
peeler <- "swiss peeler"
bowl <- [] // empty array

for potato in potatoes:
  bowl <- bowl+peel_vegetable(potato, peeler)

// end for-statement  

8.1.1 New Programming Concepts

Sometimes, we may want to add a variable, or function output, to an existing variable type that can accept new information. Other times, we may want to change a variable in a completely different way.

8.1.1.1 Adding to a Variable

These two examples add to something that already exists:

  • bowl <- bowl+peel_vegetable(potato)
  • x <- 0; x <- x + 1

A key piece of info here is the original variable needs to be of a data type that can accept the new piece of data

We defined the bowl as an array [], so we can keep adding the same data type (potato, maybe vegetable) to the array. 

potatoes <- 16
peeler <- "swiss peeler"
bowl <- [] // empty array

for potato in potatoes:
  bowl <- bowl+peel_vegetable(potato, peeler)

// bowl array after for loop finishes
[potato, potato, potato, potato,...] // 16 'potato' items in array

For the other example, if we tried to add two different data types

  • x <- "zero"; x <- x + 1

The computer would return an error message, because adding text and numbers, in "zero" + 1, is not possible.

The task of adding content to something that already exists happens so often that some languages, such as C/C++, will combine assignment operators with math functions, as in +=. In this case, x += 1 is the same as x = x + 1

8.1.1.2 Modifying a Variable

For this statement

vegetable <- rotate(vegetable)

the vegetable is rotating, and being reassigned to the same variable name. This is a little more tricky and in need of caution, because you’re completely rewriting the value in vegetable instead of modifying it. If it were to be modified to a different data type accidentally, the program would still execute with no error message for this assignment, because it simply sees a variable assignment.

x <- 123
x <- as.text(x) // accidentally uses the wrong function
// now the data type of x is a string

// many lines of code later

x <- x + 1  // this will cause an error in the code

The error will happen because a string and a numeric value can’t be added together. But, this will be tricky to troubleshoot because the error will result from the x <- x + 1 part of the code, even though the actual error was the assignment which happened many lines prior in x <- as.text(x).


8.2 Back to Functions

A complex function might be something like make_cheesy_mashed_potatoes(), which is the equivalent to having an in-home chef who can make you cheesy mashed potatoes on demand, as long as you provide the raw ingredients.

make_cheesy_mashed_potatoes(cheese, potatoes, milk, salt, butter)

In programming parlance, these ingredients listed (cheese, potatoes, milk, salt, butter) would be the “arguments”.

8.2.1 Arguments

Functions will have inputs (usually) and outputs. The function inputs are formally called “arguments,” and are specified by the programmer, entered within the parentheses as part of the function call. For cheesy mashed potatoes, this could be specifying what kind of cheese to use, or the quantity of each ingredient.

While our example is in pseudocode, luckily function calls look the same across most languages.

# these variables have quantities in oz
cheese <- 16
potatoes <- 32
milk <- 8
salt <- 0.125
butter <- 4

make_cheesy_mashed_potatoes(cheese, potatoes, milk, salt, butter)

paste("this function uses", cheese, "oz of cheese")

We already mentioned that print is a common built-in function. Most languages a number of built-in functions which are commonly used, such as the paste function in R, represented above. Let’s look at another built in function that is common to both R and Python: round()

Built-in functions have help pages which will allow you to see the syntax and arguments. The help files will include definitions of the arguments, if they’re required or have a default, and examples for use.

The syntax for Python is round(number, ndigits=None)

The syntax for R is round(x, digits = 0, ...), where x is a numeric vector.

The second argument for both languages indicates the number of digits along with the default value. Since there’s a default value, you don’t necessarily need to include the argument in the function call.

An example in R is

round(c(1.4, 2.5, 3.5))
## [1] 1 2 4

Another argument common in many functions which deal with calculating something is how to handle missing or NA values.

For example, the function mean() in R is mean(x, trim = 0, na.rm = FALSE). The default value of na.rm (which you can think of as “NA remove”) is FALSE, indicating that NA values will not be removed from the calculation.

Optional arguments like this may come with a pre-supplied default, in this case that na.rm = FALSE and any NAvalues will be retained, unless this argument is manually changed to TRUE.

8.3 Function Takeaways

When working with functions, which is the vast majority of programming, it is important to keep in mind two things:

  1. The arguments in a function
  2. Which arguments are optional, and what are their defaults

A function will run as long as the required arguments are provided, but the resulting output may not match expectations unless you recognize which optional arguments were included with default values.

8.4 Libraries

We’ve already discussed built-in functions. Built in functions are often limited to basic tasks and do not include more complex or custom functions that you may with to use. Now, you can code more complex functions yourself, building off of the built in functions, but this would take a lot of time and require more in-depth programming knowledge.

The good news is that most programming languages will have optional “libraries” (or packages, or modules, depending on what term your programming language of choice uses) that include additional functions, beyond the built in function. So before creating a new function from scratch, it is worthwhile to check whether a library exists that includes a function that does what you want to do.

Using our cheesy mashed potatoes example, you might use a programming library to find a recipe that uses different ingredients. You want to make cheesy mashed potatoes from scratch, but don’t have the time to do so. Luckily someone else has created function that uses instant mashed potatoes instead of raw potatoes.

Libraries are also useful because they mean that your computer and programming language of choice don’t need to always have every single function on hand. This set up saves computer disk space, ensures you don’t have to recreate the wheel and make every function from scratch, and provides a level of standardization (e.g., everyone uses the same reference “recipe” so output should be the same for the same input, across users).

A good rule of thumb is if it seems like the function you want is broadly useful, then someone has likely created a library containing it. This is also true for niche or domain-specific functions: if the task is one that comes up a lot in analysis, there is likely a library that has functions for those analysis tasks.

Finding the ‘right’ library for the function you need can be overwhelming, but a good starting point is the official library collection for a programming language, such as CRAN for R or PyPI for Python.

8.4.1 Accessing Functions in Libraries

The syntax for accessing functions in libraries varies by programming language but follows the general process of:

  1. Install the library from the source. You only need to do this once.
  2. Load or import the library. You will need to do this every time you want to access a function in a library. By convention, libraries are loaded at the top of a script, so you, and other people, can see at a glance what libraries are needed to run the script.
  3. Use the functions as normal.

8.4.2 Function Names

There are only so many function names that make sense in the English language, so there may be functions from different libraries that have the same name. How does the programming language know which function you are trying to access? By default, the language will use the function of the more recently loaded or imported package.

Let’s say we have two mean() functions, one from library_A and one from library_B. They differ in their default settings:

  • library_A defauls to na.rm = TRUE
  • library_B defaults to na.rm = FALSE

If we load libraries in order library_A then library_B and then use mean() as a function, we will be using the mean() function from library_B.

load(library_A)
load(library_B)

values <- c(2, 4, 7, 5, 9, NA)
mean(values)

Our result will be NA.

Conversely, if we load library_B then library_A, we will use the mean() function from library_A.

load(library_B)
load(library_A)

values <- c(2, 4, 7, 5, 9, NA)
mean(values)

Our value will be 5.4.

The tricky part is that all this happens invisibly. There may or may not - depending on your programming language - be a warning that two libraries contain functions of the same name. So, keeping track of your order of loading is important. If you get any unexpected results, you can double check which library the function you are using is from.

A better method is to be explicit about which function you are calling. Most programming languages will allow a syntax along the lines of library:function() to specify use of a function from a stated library.

load(library_B)
load(library_A)

values <- c(2, 4, 7, 5, 9, NA)
library_B:mean(values)

#result will be `NA`

8.4.3 Aliases

You may encounter the concept of an “alias” for a library. This is common in Python, where users can set an alias for a library name, and use that going forward rather than writing out the full library name. This will mostly come up if you are looking for help online, or wondering why you are seeing abbreviations.

For example, Python uses the import term to load a library (or “package” as Python calls them) and allows setting an alias using import package as alias syntax. By convention, many Python users will use standard aliases for common packages, such as:

import pandas as pd
import numpy as np

Functions can then be called using the explicit package:function syntax, such as pd.DataFrame to designate a pandas DataFrame object.

8.4.4 Exercise: Create Popcorn Function

Timing: 15 minutes

Let’s return to the section 2 activity where your group created an algorithm to make popcorn.

  • How would you adapt your step by step narrative algorithm to be a function called make_popcorn()?
  • What arguments would be included, and would you have any default values set?
  • Does your function look different than others in your group? How would you tell the computer which function you wanted to use?
Example functions
# specify which library we want to get a function from, 
# in this case instructor Steph

from Steph import make_popcorn()

# The arguments and defaults for this library is
#
# make_popcorn(quantity_popcorn, type = microwave, time = 2, 
#              butter = movie_theater_flavor, seasoning = TRUE)
#
# where time is in minutes and quantity_popcorn is in bags

# example in use

make_popcorn(quantity_popcorn = 1, time = 1.5)
# specify which library we want to get a function from, 
# in this case instructor Kat

from Kat import make_popcorn()

# The arguments and defaults for this library is
#
# make_popcorn(quantity_popcorn, quantity_oil = 1, type = stovetop, 
#              butter = TRUE, salt = "light")
#
# where quantity is in tbsp

# example in use

make_popcorn(2)

8.5 Language examples: Formatting may vary [may not need in this section, since round() and mean() included above]

Python


R


C/C++


PHP


Bash