8 Functions
8.1 What is a function?
A “function” in programming is a piece of code that does a specific task, but packages it so it can be called with a single line of code. Functions can be predefined by developers, and also may be custom to any script.
We’ve already been using examples of functions in our psuedocode - such as when we would include the write
statement - and formal functions, such as the print
, cout
, or echo
statements we saw in the conditional examples at the end of the previous chapter.
The print
function does something very simple, which is provide output, either on the R or Python console or interface, that people can read. It is simple to use, but internal code that creates the print function is a little more complex (to say the least). Since communicating output is ubiquitous in all computer languages, each language has some sort of print function as a built in function so it makes it easier for programmers to focus on their particular code.
Functions can do something very simple or very complex. But in essence, functions are used when you know you’re going to do the same task over and over again.
Take our cheesy mashed potatoes example. A simple function might be something like peel_potato()
, which would take our first example algorithm on peeling potatoes, and create a function.
The original algorithm Algorithm for peeling potatoes for Cheesy Mashed Potatoes
- Get three pounds of russet potatoes.
- Get bowl large enough to hold three pounds of russet potatoes.
- Get a vegetable peeler.
- Take one potato.
- Peel potato.
- Rinse potato with water.
- Place peeled potato in bowl
- Repeat 4-7 until all potatoes are washed and peeled
But, we don’t want to create a function for the whole thing. We want to create a function that allows us to easily execute what needs to be repeated, Steps 4-7. So the function might look like
define function peel_potato(potato, peeler):
while(potato has skin):
location <- on potato, locate skin
potato <- at location on potato, with peeler remove strip of skin
potato <- rotate potato
// end-while (this is a comment, it begins with // )
potato <- rinse potato with water
return (potato)
// end function definition
A bonus of functions, is you can make them generic if you know the action can be applied over many objects. Perhaps you want to peel both potatoes and carrots, you can make a function for that.
define function peel_vegetable(vegetable, peeler):
while(vegetable has skin):
location <- on vegetable, locate skin
vegetable <- at location on vegetable, with peeler remove strip of skin
vegetable <- rotate(vegetable)
// end-while
vegetable <- rinse_with_water(vegetable)
return (vegetable)
// end function definition
Now the peel_vegetable
function can be used with multiple objects. (Well, it could before, but using the function name peel_potato
would be confusing when trying to peel carrot
or zucchini
. Naming matters, especially for understandability!)
Now that we’ve created a function for peeling vegetables, our algorithm might look like this
potatoes <- 16
peeler <- "swiss peeler"
bowl <- [] // empty array
for potato in potatoes:
bowl <- bowl+peel_vegetable(potato, peeler)
// end for-statement
8.1.1 New Programming Concepts
Sometimes, we may want to add a variable, or function output, to an existing variable type that can accept new information. Other times, we may want to change a variable in a completely different way.
8.1.1.1 Adding to a Variable
These two examples add to something that already exists:
bowl <- bowl+peel_vegetable(potato)
x <- 0; x <- x + 1
A key piece of info here is the original variable needs to be of a data type that can accept the new piece of data.
We defined the bowl as an array []
, so we can keep adding the same data type (potato, maybe vegetable) to the array.
potatoes <- 16
peeler <- "swiss peeler"
bowl <- [] // empty array
for potato in potatoes:
bowl <- bowl+peel_vegetable(potato, peeler)
// bowl array after for loop finishes
[potato, potato, potato, potato,...] // 16 'potato' items in array
For the other example, if we tried to add two different data types
x <- "zero"; x <- x + 1
The computer would return an error message, because adding text and numbers, in "zero" + 1
, is not possible.
The task of adding content to something that already exists happens so often that some languages, such as C/C++, will combine assignment operators with math functions, as in +=
. In this case, x += 1
is the same as x = x + 1
8.1.1.2 Modifying a Variable
For this statement
vegetable <- rotate(vegetable)
the vegetable is rotating, and being reassigned to the same variable name. This is a little more tricky and in need of caution, because you’re completely rewriting the value in vegetable
instead of modifying it. If it were to be modified to a different data type accidentally, the program would still execute with no error message for this assignment, because it simply sees a variable assignment.
x <- 123
x <- as.text(x) // accidentally uses the wrong function
// now the data type of x is a string
// many lines of code later
x <- x + 1 // this will cause an error in the code
The error will happen because a string and a numeric value can’t be added together. But, this will be tricky to troubleshoot because the error will result from the x <- x + 1
part of the code, even though the actual error was the assignment which happened many lines prior in x <- as.text(x)
.
8.2 Back to Functions
A complex function might be something like make_cheesy_mashed_potatoes()
, which is the equivalent to having an in-home chef who can make you cheesy mashed potatoes on demand, as long as you provide the raw ingredients.
make_cheesy_mashed_potatoes(cheese, potatoes, milk, salt, butter)
In programming parlance, these ingredients listed (cheese, potatoes, milk, salt, butter)
would be the “arguments”.
8.2.1 Arguments
Functions will have inputs (usually) and outputs. The function inputs are formally called “arguments,” and are specified by the programmer, entered within the parentheses as part of the function call. For cheesy mashed potatoes, this could be specifying what kind of cheese to use, or the quantity of each ingredient.
While our example is in pseudocode, luckily function calls look the same across most languages.
# these variables have quantities in oz
cheese <- 16
potatoes <- 32
milk <- 8
salt <- 0.125
butter <- 4
make_cheesy_mashed_potatoes(cheese, potatoes, milk, salt, butter)
paste("this function uses", cheese, "oz of cheese")
We already mentioned that print
is a common built-in function. Most languages a number of built-in functions which are commonly used, such as the paste
function in R, represented above. Let’s look at another built in function that is common to both R and Python: round()
Built-in functions have help pages which will allow you to see the syntax and arguments. The help files will include definitions of the arguments, if they’re required or have a default, and examples for use.
The syntax for Python is round(number, ndigits=None)
The syntax for R is round(x, digits = 0, ...)
, where x
is a numeric vector.
The second argument for both languages indicates the number of digits along with the default value. Since there’s a default value, you don’t necessarily need to include the argument in the function call.
An example in R is
## [1] 1 2 4
Another argument common in many functions which deal with calculating something is how to handle missing or NA
values.
For example, the function mean()
in R is mean(x, trim = 0, na.rm = FALSE)
. The default value of na.rm
(which you can think of as “NA remove”) is FALSE
, indicating that NA
values will not be removed from the calculation.
Optional arguments like this may come with a pre-supplied default, in this case that na.rm = FALSE
and any NA
values will be retained, unless this argument is manually changed to TRUE
.
8.3 Function Takeaways
When working with functions, which is the vast majority of programming, it is important to keep in mind two things:
- The arguments in a function
- Which arguments are optional, and what are their defaults
A function will run as long as the required arguments are provided, but the resulting output may not match expectations unless you recognize which optional arguments were included with default values.
8.4 Libraries
We’ve already discussed built-in functions. Built in functions are often limited to basic tasks and do not include more complex or custom functions that you may with to use. Now, you can code more complex functions yourself, building off of the built in functions, but this would take a lot of time and require more in-depth programming knowledge.
The good news is that most programming languages will have optional “libraries” (or packages, or modules, depending on what term your programming language of choice uses) that include additional functions, beyond the built in function. So before creating a new function from scratch, it is worthwhile to check whether a library exists that includes a function that does what you want to do.
Using our cheesy mashed potatoes example, you might use a programming library to find a recipe that uses different ingredients. You want to make cheesy mashed potatoes from scratch, but don’t have the time to do so. Luckily someone else has created function that uses instant mashed potatoes instead of raw potatoes.
Libraries are also useful because they mean that your computer and programming language of choice don’t need to always have every single function on hand. This set up saves computer disk space, ensures you don’t have to recreate the wheel and make every function from scratch, and provides a level of standardization (e.g., everyone uses the same reference “recipe” so output should be the same for the same input, across users).
A good rule of thumb is if it seems like the function you want is broadly useful, then someone has likely created a library containing it. This is also true for niche or domain-specific functions: if the task is one that comes up a lot in analysis, there is likely a library that has functions for those analysis tasks.
Finding the ‘right’ library for the function you need can be overwhelming, but a good starting point is the official library collection for a programming language, such as CRAN for R or PyPI for Python.
8.4.1 Accessing Functions in Libraries
The syntax for accessing functions in libraries varies by programming language but follows the general process of:
- Install the library from the source. You only need to do this once.
- Load or import the library. You will need to do this every time you want to access a function in a library. By convention, libraries are loaded at the top of a script, so you, and other people, can see at a glance what libraries are needed to run the script.
- Use the functions as normal.
8.4.2 Function Names
There are only so many function names that make sense in the English language, so there may be functions from different libraries that have the same name. How does the programming language know which function you are trying to access? By default, the language will use the function of the more recently loaded or imported package.
Let’s say we have two mean()
functions, one from library_A
and one from library_B
. They differ in their default settings:
-
library_A
defauls tona.rm = TRUE
-
library_B
defaults tona.rm = FALSE
If we load libraries in order library_A
then library_B
and then use mean()
as a function, we will be using the mean()
function from library_B
.
load(library_A)
load(library_B)
values <- c(2, 4, 7, 5, 9, NA)
mean(values)
Our result will be NA
.
Conversely, if we load library_B
then library_A
, we will use the mean()
function from library_A
.
load(library_B)
load(library_A)
values <- c(2, 4, 7, 5, 9, NA)
mean(values)
Our value will be 5.4
.
The tricky part is that all this happens invisibly. There may or may not - depending on your programming language - be a warning that two libraries contain functions of the same name. So, keeping track of your order of loading is important. If you get any unexpected results, you can double check which library the function you are using is from.
A better method is to be explicit about which function you are calling. Most programming languages will allow a syntax along the lines of library:function()
to specify use of a function from a stated library.
load(library_B)
load(library_A)
values <- c(2, 4, 7, 5, 9, NA)
library_B:mean(values)
#result will be `NA`
8.4.3 Aliases
You may encounter the concept of an “alias” for a library. This is common in Python, where users can set an alias for a library name, and use that going forward rather than writing out the full library name. This will mostly come up if you are looking for help online, or wondering why you are seeing abbreviations.
For example, Python uses the import
term to load a library (or “package” as Python calls them) and allows setting an alias using import package as alias
syntax. By convention, many Python users will use standard aliases for common packages, such as:
import pandas as pd
import numpy as np
Functions can then be called using the explicit package:function
syntax, such as pd.DataFrame
to designate a pandas
DataFrame
object.
8.4.4 Exercise: Create Popcorn Function
Timing: 15 minutes
Let’s return to the section 2 activity where your group created an algorithm to make popcorn.
- How would you adapt your step by step narrative algorithm to be a function called
make_popcorn()
? - What arguments would be included, and would you have any default values set?
- Does your function look different than others in your group? How would you tell the computer which function you wanted to use?
Example functions
# specify which library we want to get a function from,
# in this case instructor Steph
from Steph import make_popcorn()
# The arguments and defaults for this library is
#
# make_popcorn(quantity_popcorn, type = microwave, time = 2,
# butter = movie_theater_flavor, seasoning = TRUE)
#
# where time is in minutes and quantity_popcorn is in bags
# example in use
make_popcorn(quantity_popcorn = 1, time = 1.5)
# specify which library we want to get a function from,
# in this case instructor Kat
from Kat import make_popcorn()
# The arguments and defaults for this library is
#
# make_popcorn(quantity_popcorn, quantity_oil = 1, type = stovetop,
# butter = TRUE, salt = "light")
#
# where quantity is in tbsp
# example in use
make_popcorn(2)