13 Recap and Consultation Tips

The point of this workshop isn’t to make you an expert, but to build your skillset so you can read code and be more confident in consultations. Let’s first talk about where errors can pop up.

13.0.1 Clarifying variables

Something I’d commonly do when helping researchers with their code, especially when I’d see variable names which are abbreviations, is I’d ask them if the abbreviation is something which is commonly used in their field. My questions would usually go something like this:

So you know, I’m not an expert in your field. You’re the expert. Can you help me understand a little more about what you’re trying to do in the code?
What does w_f represent?
Is that something someone who is knowledgeable in your field would recognize?
It’s good practice to use self-describing variables which someone in the field would recognize. It’s also a good idea to keep track of information about your data in a README or data dictionary. Honestly, I’ve heard a lot of different terms to describe them, but it’s basically a file that describes your data. I can give you an example if you want.
Kat’s favorite dataset example: Leaf Dataset in UCI Data Science Repository (Silva and Maral 2013)
At the top of your code when you define your variables, if the variable name doesn’t exactly correlate to something in your dataset’s README file, you can add a comment with the exact term.

13.1 Narrowing Down Errors

You might think there’s a lot of places where errors can be introduced in code, especially if it’s a hundred lines, but you can usually narrow down the errors to a few places.

13.1.1 Why Doesn’t My Code Run?

The nested for loops and if else statements we practiced in the previous chapter is a useful skill and especially helpful when troubleshooting. For instance, if there is an error, a first item to check is often: are functions, especially loops and conditional statements, started and ended properly in matched fashion? If they aren’t, the error will be caused by a syntax error.

Examples of syntax errors are a missing quote, paren that’s supposed to close a delimiter pair, a missing semicolon, the computer trying to open a file that doesn’t exist in the location indicated by the path, and a malformed function call. With syntax errors, the code generally won’t even run, but will output error messages.

The nice thing about syntax errors is the code doesn’t run and will give you an indication of where to look to correct the error.

Here’s an example in R. The code is

if (x == y {
  print("values are equal");
} else if (x > y) {
  print("x greater than y");
} else {
  print("x must be less than y");
}

But, when we run it, the console indicates

> if (x == y {
Error: unexpected '{' in "if (x == y {"

Similarly, if we try to run something in Jupyter Notebook using Python with an error…

if (x == y:
  print("values are equal")
elif (x > y):
  print("x greater than y")
else:
  print("x must be less than y")

…we will get an error message…

   Cell In[7], line 1
     if (x == y:
               ^
SyntaxError: invalid syntax

…with the specific location where the error started

As stated in earlier chapters, it’s important to read through the error message, usually from bottom then working your way up. Sometimes error messages can be quite long. If you want to practice this for a particular language, there are recommended lessons in the Resources chapter.

13.1.2 The Code Runs, but Something is Wrong

It’s a little trickier to troubleshoot when the code runs, but does something unexpected. Unexpected output might look like incorrect results, it could be a graph or chart doesn’t look right. This is trickier because it might be the logic of the code.

Things I scan for before looking at the logic are making sure the correct data files are being requested, and making sure the location of the images the researcher is showing me matches with the file path indicated in the code.

If those match, then it’s useful to ask the researcher to walk through the code and explain it to you. Which brings us to tips for how to approach consultations.

13.2 Approaching Consultations

Consultations may seem scary, especially if you aren’t familiar with a particular language, but the nice thing about programming languages is the concepts are so similar across languages, many of the issues students and researchers come with aren’t specific to the language, per se. That is, sometimes it’s something as simple as a variable not being assigned properly. That isn’t language specific, but requires someone to be able to trace the code.

Here are some tips on approaching consultations

Prepare in Advance. Include a prompt for anyone scheduling a consultation to provide as much detail as possible. Please share anything that will help prepare for our meeting. Include your specific research question or problem. This will help you to prepare in advance. I’ve followed up via email, asking researchers if they’d share their code/data in advance.
Be clear with what you can and cannot do. You don’t need to be an expert in order to provide consultations for programming support, but you should be clear in what you can do. For example, I have provided many Stata consultations. I have never used Stata, and I am clear with that to establish reasonable expectations. Just so you know, I’m familiar with Stata, but not an expert. If you want, we can meet and talk through your research and code.
Have the researcher talk through their code. This does a few things: it helps you understand what they’re trying to do with their code and research project. More importantly, they will get a clearer understanding of what’s going on because they are describing it. Sometimes the act of them reading through and describing the code will allow the researcher to see some of the issues, and fix it on their own.
Programming vs statistics. Sometimes the researcher will ask for an explanation or interpretation of a PCA, linear regression, or other analysis. Full disclosure, I’ve taken some machine learning and stats classes, but I am not a statistician. Talking through the problem often helps.
- If they share that they’re using someone else’s code with their own data, the first thing to ask is if the code works with the original data. The next thing to ask is to walk through the similarities and differences with the new data, and how the variables of the different datasets relate.
- Know where they can get statistics consultations at your institution.
Troubleshooting vs Consult. Understanding the difference between these two will help you manage time and expectations.
- Troubleshooting is often shorter, even though it can take time depending on the problem. Troubleshooting benefits from having access to the code and data beforehand, especially for things that you aren’t familiar with. For example, I knew R, but for several years wasn’t familiar with the Tidyverse collection of packages. Having the code/data prior allowed me to save the time of the researcher (Ranganathan 1931), focus the time of our consultations, while giving me an opportunity to learn how to use Tidyverse.
- Consultations are often more in-depth than troubleshooting, and will include more how to do something, best practices for file/variable names, managing data, sometimes even how to program.
It’s ok to say you don’t know! As long as you clearly communicate what you can and can’t do, and share that you’re willing to help the researcher find assistance you can’t provide, most researchers will be thankful and understanding if you don’t know. One of the best professors I ever had would sometimes answer questions with I don’t know, but I’ll look that up and get back you you. It’s a philosophy I’ve incorporated into my professional services, I don’t know, but I’ll help you find the answers. Have a list of go-to documentation and learning resources to point to when needed.

12 Practice

14 Resources