Setting Up Your R Environment for Reproducible Science

Methodology
R
This blog post explains how to make your R code reproducible by avoiding absolute file paths and the setwd() function. It advocates for using RStudio Projects and relative paths, recommending the here package for robustly building file paths from the project root.
Author: Themis N. Efthimiou

Published: August 21, 2025

I’ve recently had the pleasure of reviewing several excellent manuscripts where the authors have gone the extra mile, making their data and analysis code publicly available on platforms like the Open Science Framework. This commitment to open science is fantastic and a crucial step towards greater transparency in research. However, a common pitfall often prevents this good practice from being truly effective: the lack of a reproducible R environment.

Many researchers, myself included at times, fall into the habit of using setwd() or relying on a specific local directory structure. While this works perfectly well on our own machines, it creates a significant roadblock for anyone trying to reproduce our work. When a reviewer, collaborator, or future researcher tries to run the code, it fails immediately because their file paths are different.

Worse still, as I’ve seen in one particular case, leaving directory paths with your name or institutional information can inadvertently de-anonymise a submission during a blind peer-review process. This can compromise the integrity of the review and is an entirely avoidable issue.

If you’re still not convinced, consider this: reproducible code makes your own life easier. It will significantly reduce the headaches if you ever need to switch between Windows and macOS. This is because Windows uses backslashes (\) for file paths, while macOS and Linux use forward slashes (/). A path copied from Windows Explorer will break on another system, and inside an R string its backslashes have to be escaped before it works even on Windows. By using the here package or relative paths, your code uses forward slashes, which R accepts on every platform, so these dreaded backslash issues disappear. Making your code portable and functional “out of the box” doesn’t just save your collaborators from troubleshooting; it ensures your future self won’t have to either.
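
To make the separator issue concrete, here is a minimal sketch in base R. The file names are the ones used in the example project later in this post, and the variable names are purely illustrative.

```r
# A path pasted from Windows Explorer: each backslash must be escaped
# in an R string, and the result only makes sense on Windows.
windows_style <- "data\\raw_data.csv"

# The portable form uses forward slashes, which R accepts on all platforms.
portable_style <- "data/raw_data.csv"

# file.path() joins path components with "/" for you.
built_path <- file.path("data", "raw_data.csv")

# A pasted Windows path can be converted by replacing the backslashes.
converted <- gsub("\\", "/", windows_style, fixed = TRUE)

print(identical(built_path, portable_style))
print(identical(converted, portable_style))
```

Both comparisons should print TRUE, since file.path() and the converted string produce the same forward-slash path.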

With that laundry list of reasons out of the way, I thought I’d share how I work now, but you can also watch this useful video if a demonstration is better (https://youtu.be/StqDYjM6ULo?si=lCGSFP7NREf7lZdN). The rest of this post will explain how to set up your R project with relative pathing, which makes your code portable and robust. The core idea is to rely on relative paths, not absolute ones. This means that your code will find files based on their location relative to the project’s root folder, rather than a fixed location on your hard drive.
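
As a rough illustration of the difference, the sketch below shows how a relative path is resolved against whatever the current working directory happens to be. The absolute path in the comment is a made-up example, and the variable names are illustrative only.

```r
# An absolute path is tied to one machine (hypothetical example):
#   "C:/Users/yourname/Documents/MyProject/data/raw_data.csv"

# A relative path only describes the route from a starting folder.
relative_path <- file.path("data", "raw_data.csv")

# R resolves it against the current working directory, so the same
# string points to the right file on any machine where the working
# directory is the project root.
resolved_path <- file.path(getwd(), relative_path)
print(resolved_path)
```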

Here’s a simple, step-by-step guide to get you started:

Step 1: Use RStudio Projects

The easiest way to manage this is by using RStudio Projects. When you create a new project, RStudio creates an .Rproj file. This file marks the directory it resides in as the “root” of your project. All your scripts, data, and outputs should be organised within this project folder.

Step 2: Avoid setwd()

With an RStudio Project, you no longer need to use setwd(). RStudio automatically sets your working directory to the project’s root when you open the project. This single change eliminates one of the most common sources of irreproducibility.
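
If you want to confirm that a session really did start at the project root, a quick check along these lines can help. This is only a sketch: has_rproj_file() is a hypothetical helper written for this post, not part of RStudio or any package.

```r
# Hypothetical helper: TRUE if the folder contains an .Rproj file,
# which suggests it is the root of an RStudio Project.
has_rproj_file <- function(dir = getwd()) {
  length(list.files(dir, pattern = "\\.Rproj$")) > 0
}

# Report where the session started instead of changing it with setwd().
if (has_rproj_file()) {
  message("Working directory is a project root: ", getwd())
} else {
  message("No .Rproj file found in ", getwd(),
          "; try opening the project via its .Rproj file.")
}
```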

Step 3: Structuring Your Project

A common and effective project structure looks something like this:

MyProject/
├── data/
│   ├── raw_data.csv
│   └── cleaned_data.csv
├── scripts/
│   ├── 01_data_cleaning.R
│   └── 02_analysis.R
├── outputs/
│   └── my_plot.png
├── README.md
└── MyProject.Rproj
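
The folders in this layout can be created by hand, or with a few lines of base R. The sketch below uses a hypothetical helper, create_project_skeleton(), and builds the structure in a temporary folder so it is safe to run anywhere. The .Rproj file itself is created by RStudio when you set up the project, so it is not written here.

```r
# Hypothetical helper: create the data/, scripts/ and outputs/ folders
# plus a placeholder README inside the given project root.
create_project_skeleton <- function(root) {
  subfolders <- c("data", "scripts", "outputs")
  for (folder in subfolders) {
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    writeLines("# MyProject", readme)
  }
  invisible(root)
}

# Demonstration in a temporary location.
demo_root <- create_project_skeleton(file.path(tempdir(), "MyProject"))
print(list.files(demo_root))
```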

Step 4: Choose Your Pathing Method

You now have two primary methods for referencing files reproducibly: the here package, which builds paths from the project root, and standard relative paths. Both are better than setwd(), but each has its own pros and cons.
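
Method 1: The here Package

The here package builds every path from the project root. What follows is a minimal sketch rather than a definitive recipe: it assumes the here package is installed and that the project uses the folder layout from Step 3. The names raw_path, plot_path and raw_data are illustrative.

```r
library(here)  # run install.packages("here") first if it is not installed

# here() returns a full path built from the project root, regardless of
# which subfolder the calling script lives in.
raw_path <- here("data", "raw_data.csv")
plot_path <- here("outputs", "my_plot.png")

# Read the data only if the file is present, so this sketch runs anywhere.
if (file.exists(raw_path)) {
  raw_data <- read.csv(raw_path)
}
```

Because here() looks for markers of the project root, such as the .Rproj file, the same lines keep working if the script is later moved from scripts/ into a deeper subfolder.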

Method 2: Relative Paths with ../

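An alternative, which doesn’t require an additional package, is to use standard relative pathing. The ../ syntax means “move up one directory level”. Keep in mind that R resolves relative paths against the current working directory, not against the location of the script file. If the working directory is the scripts folder (as it is by default when an R Markdown or Quarto document stored there is rendered) and you want to access a file in the data folder, you can go up a level to the project root and then down into the data folder. If the working directory is the project root, as it is in an RStudio Project session, the path is simply data/raw_data.csv.

For example, code running with scripts/ as its working directory could access data like this: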

# The relative pathing way
data <- read.csv("../data/raw_data.csv")

This method is simple and effective for many projects. However, a key caveat is that the path depends on where the code is run from. If you move the script to a different location within your project (e.g., to a new subdirectory scripts/analysis/), or run it with the project root as the working directory, this path will break because the relative relationship has changed. This is why the here package is often a more reliable choice, as it always builds paths from the project root.
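
The caveat can be demonstrated with a throwaway project in a temporary folder. In this sketch, setwd() is used only to simulate running code from different folders, and the PathDemo folder and variable names are made up for the demonstration.

```r
# Build a small throwaway project: data/, scripts/ and scripts/analysis/.
root <- file.path(tempdir(), "PathDemo")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "scripts", "analysis"),
           recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(root, "data", "raw_data.csv"),
          row.names = FALSE)

# Simulate running from scripts/: "../data" points at the real data folder.
old_wd <- setwd(file.path(root, "scripts"))
found_from_scripts <- file.exists("../data/raw_data.csv")

# Simulate running from scripts/analysis/: "../data" now points at
# scripts/data, which does not exist.
setwd(file.path(root, "scripts", "analysis"))
found_from_analysis <- file.exists("../data/raw_data.csv")

# Restore the original working directory.
setwd(old_wd)

print(c(scripts = found_from_scripts, analysis = found_from_analysis))
```

The same string finds the file from scripts/ but not from scripts/analysis/, which is exactly the fragility described above.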

By adopting one of these practices, you ensure that your code is not just available, but truly reproducible. You remove a significant barrier for those trying to understand and build upon your work, and you protect yourself from accidental de-anonymisation during the review process. It’s a small change that makes a big difference for the entire research community.