Tips for organising your R code

Posted on January 31, 2023 by tomaztsql — 4 Comments

Keeping your R code organised is not as straightforward as one might think. Think about the libraries, variables, functions and many other objects involved. All of them can be defined and later overwritten, and some may become obsolete along the way.

Organisation becomes even more crucial when you are part of a larger group of engineers and scientists who collaborate with you.

Motivation

The most important step toward code reproducibility is to keep code organised and atomic, storing it by layers or components and keeping its documentation up to date. Because R is a scripting language, organising files into directories and subdirectories is important for later reuse and for collaboration with different departments in an organisation.

Ideally, the names of files and subdirectories are self-explanatory, so that one can tell at a glance what data files contain, what scripts do, and what came from what.

The following R tips address frequent problems that many organisations face. All R samples and themes are fictional and can be adapted to your organisational environment. The examples use the iris dataset to show the custom theme and the use of functions. All images, unless otherwise noted, are by the author.

1. Organising R files

You can always load R files, libraries, functions and settings from a different file, for example with source(). This makes a strong case for a shared folder structure that each developer can clone or access to get all the files, themes and functions the organisation provides.

Structuring R files, functions, data and many more is an essential step toward reproducibility.

Using Projects (available in Posit's RStudio) is a great place to start, but you can always create your own structure to help you with code and file organisation.
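A minimal sketch of such a structure, assuming a hypothetical layout with R/ for shared functions plus data/, themes/ and output/ folders (the folder and file names are illustrative, not a standard). It creates the folders under a temporary directory and then sources every script in R/, so all shared functions become available in the session:

```r
# Hypothetical project root; in practice this is your cloned repository
project_root <- file.path(tempdir(), "my_r_project")

# Illustrative layout: shared functions, raw data, corporate themes, results
folders <- c("R", "data", "themes", "output")
for (folder in folders) {
  dir.create(file.path(project_root, folder), recursive = TRUE,
             showWarnings = FALSE)
}

# A hypothetical shared function, stored as its own file in R/
writeLines(
  c("calculate_mean_sepal <- function(data) {",
    "  mean(data$Sepal.Length)",
    "}"),
  file.path(project_root, "R", "calculate_mean_sepal.R")
)

# Load every function file from R/ into the current session
function_files <- list.files(file.path(project_root, "R"),
                             pattern = "\\.R$", full.names = TRUE)
for (f in function_files) {
  source(f)
}

calculate_mean_sepal(iris)
```

Running the sourcing loop at the top of each script means every developer who clones the folder gets the same functions, without copying code between files.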

2. Installing and attaching R libraries

Installing and attaching R libraries is part of almost every piece of R code. Whenever you write R code, at some point you will reference an external library.

You don’t want to install and attach libraries one at a time, as in the pseudo-code below:

install.packages("a")
install.packages("b")
library(a)
library(b)

Instead, you can create a character vector of package names, install the ones that are missing, and attach them all with shorter code:

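# Packages needed by this script
required_Packages_Install <- c("ggplot2", "caret", "leaflet", "plotly", "magick")

for (Package in required_Packages_Install) {
  # Install the package only if it cannot be loaded yet
  if (!requireNamespace(Package, quietly = TRUE)) {
    install.packages(Package, dependencies = TRUE)
  }
  # Attach it; library() stops with an error if it is still missing
  library(Package, character.only = TRUE)
}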

In many enterprise environments, you might have issues installing some packages on your local disk. These packages may contain *.zip or *.exe files, which the security policy will block. In this case, the best solution is to install the packages into dedicated folder(s), as introduced in “Organising R files”. You will then have to set both the installation path and the loading path for the packages.
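A minimal sketch of the two paths involved, assuming a hypothetical dedicated folder (placed under tempdir() here so the example runs anywhere; in practice it would be a location your security policy allows). .libPaths() puts the folder first in the search path used for loading, and install.packages(lib = ...) installs into it. The helper install_into_dedicated_lib() is hypothetical:

```r
# Hypothetical dedicated package folder; replace with your approved location
lib_dir <- file.path(tempdir(), "r-packages")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Loading path: search the dedicated folder first, then the default libraries
.libPaths(c(lib_dir, .libPaths()))

# Installation path: install missing packages directly into lib_dir
install_into_dedicated_lib <- function(pkgs, lib_dir) {
  # system.file() returns "" when a package cannot be found
  is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                         logical(1))
  missing_pkgs <- pkgs[!is_installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, lib = lib_dir, dependencies = TRUE)
  }
  invisible(missing_pkgs)
}
```

For example, install_into_dedicated_lib(c("ggplot2", "flextable"), lib_dir) followed by library(ggplot2) installs into and loads from the dedicated folder. .libPaths() only affects the current session; to make the setting permanent it is commonly placed in an .Rprofile file or set through the R_LIBS_USER environment variable.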

The next step is to list all the packages needed for an R project or script in a plain-text file. Let’s create a requirements.txt file (similar to what is done with YAML files or in Python) and put the package names inside it.
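A minimal base-R sketch of reading such a file, assuming one package name per line and allowing blank lines and # comments (a convention assumed here, not a fixed format). It installs only the packages that are missing and then attaches all of them; the helper read_requirements() is hypothetical:

```r
# Example requirements.txt, written here so the sketch is self-contained
req_file <- file.path(tempdir(), "requirements.txt")
writeLines(c("# packages for the iris report", "stats", "", "utils"), req_file)

read_requirements <- function(path) {
  pkgs <- trimws(readLines(path, warn = FALSE))
  # Drop blank lines and comment lines
  pkgs[nzchar(pkgs) & !startsWith(pkgs, "#")]
}

required_packages <- read_requirements(req_file)

# Install only what cannot be loaded, then attach everything
missing_packages <- required_packages[
  !vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)
]
if (length(missing_packages) > 0) {
  install.packages(missing_packages)
}
for (pkg in required_packages) {
  library(pkg, character.only = TRUE)
}
```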

Two R packages can help you install from a requirements.txt file: requiRements and versions. Both work similarly when your package list is stored in requirements.txt, but versions also takes the package version as input, which lets you pin exact versions. The base function install.packages(), on the other hand, lets you specify arguments such as repos, lib (the installation path) and destdir in detail, and these are essential for storing packages in the desired location.
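Installing a pinned version is what the versions package is for. If you only want to verify that the installed versions match what a project expects, base R is enough. A minimal sketch, assuming a hypothetical package==version line format borrowed from Python's requirements files; the helper check_pinned_versions() is hypothetical and compares versions as exact strings:

```r
# Hypothetical pinned requirements: "package==version", one per line
req_file <- file.path(tempdir(), "requirements_pinned.txt")
writeLines(c(paste0("stats==", getRversion()), "utils==0.0.1"), req_file)

check_pinned_versions <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  parts <- strsplit(lines, "==", fixed = TRUE)
  pkgs <- vapply(parts, `[`, character(1), 1)
  wanted <- vapply(parts, `[`, character(1), 2)
  # Installed version, or NA when the package is not installed at all
  installed <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(package = pkgs, wanted = wanted, installed = installed,
             ok = !is.na(installed) & installed == wanted,
             row.names = NULL, stringsAsFactors = FALSE)
}

check_pinned_versions(req_file)
```

Rows with ok = FALSE are the packages to reinstall at the pinned version, for example with the versions package or with install.packages() pointed at a suitable repos.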

If you want to simplify the process, you can create a ZIP file of all working packages and restore it at any time. In that case, it is advisable to include the R version in the ZIP file as well.
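A minimal sketch of that idea, assuming a hypothetical folder that holds the working packages. It records the R version next to the packages, archives the folder and restores it elsewhere. Note one swap: it builds a .tar.gz with utils::tar(tar = "internal") instead of a ZIP, because utils::zip() relies on an external zip program that may not be installed:

```r
# Hypothetical folder holding packages that are known to work together
lib_dir <- file.path(tempdir(), "r-packages")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Record the R version and platform alongside the packages
writeLines(c(R.version.string, R.version$platform),
           file.path(lib_dir, "R_VERSION.txt"))

# Archive the whole folder; tar = "internal" needs no external tools
archive <- file.path(tempdir(), "r-packages.tar.gz")
old_wd <- setwd(dirname(lib_dir))
utils::tar(archive, files = basename(lib_dir),
           compression = "gzip", tar = "internal")
setwd(old_wd)

# Later, or on another machine: restore the folder to a new location
restore_dir <- file.path(tempdir(), "restored")
utils::untar(archive, exdir = restore_dir, tar = "internal")
```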

Protip: use library() instead of require(). library() stops with an error when a package cannot be loaded, whereas require() only issues a warning and returns FALSE, so the script carries on and fails later in a less obvious place.
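A short demonstration of the difference, using a package name that is assumed not to exist:

```r
# A package name that (presumably) does not exist
pkg <- "thisPackageDoesNotExist"

# require() only warns and returns FALSE, so the script keeps running
loaded <- suppressWarnings(require(pkg, character.only = TRUE))
print(loaded)  # FALSE

# library() signals an error, which stops the script where the problem is
outcome <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "stopped with an error"
)
print(outcome)
```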

3. Use the corporate themes

Every corporate environment should follow a common theme, with predefined colours, table designs, pixel-perfect diagrams and positions. With the ggplot2 package, you can create a theme that follows your design guidelines.

Furthermore, you can make sure that the same colour always represents the same KPI by pairing the theme with a fixed, named colour scale.
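In ggplot2, a theme controls the non-data elements (fonts, backgrounds, grid lines), while colours mapped to data come from scales. A minimal sketch of the usual approach, assuming hypothetical KPI names, hex codes and made-up values: keep one named colour vector as the single source of truth and always look colours up by name. It uses base graphics so it runs without extra packages:

```r
# Hypothetical corporate KPI palette: one colour per KPI, defined once
kpi_colours <- c(
  Revenue = "#05a6f0",
  Cost    = "#d62728",
  Margin  = "#2ca02c"
)

# Look colours up by KPI name, never by position
kpi_fill <- function(kpi_names) {
  unknown <- setdiff(kpi_names, names(kpi_colours))
  if (length(unknown) > 0) {
    stop("No corporate colour defined for: ", paste(unknown, collapse = ", "))
  }
  unname(kpi_colours[kpi_names])
}

# Two charts with the KPIs in a different order still share colours
values_q1 <- c(Revenue = 120, Cost = 80, Margin = 40)
values_q2 <- c(Margin = 35, Revenue = 110, Cost = 75)

pdf(file.path(tempdir(), "kpi_charts.pdf"))
barplot(values_q1, col = kpi_fill(names(values_q1)), main = "Q1")
barplot(values_q2, col = kpi_fill(names(values_q2)), main = "Q2")
invisible(dev.off())
```

With ggplot2, the same named vector can be passed as scale_fill_manual(values = kpi_colours), which matches colours to the KPI names rather than to their order.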

library(ggplot2)

# Default look: a bar chart of sepal lengths from the built-in iris data
ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_bar(color = "grey", fill = "red") +
  labs(x = "Length of Sepal",
       y = "Count of flowers",
       title = "Number of flowers \nby sepal length",
       caption = "Source: IRIS Dataset \nBase R Package")

# Corporate theme: one function that every report can add to its plots
theme_organisation <- function() {
  # The font must be installed on the machine that renders the plot
  font <- "Times New Roman"
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank(),
    # linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
    panel.border = element_rect(colour = "black", fill = NA,
                                linetype = 3, linewidth = 0.5),
    panel.background = element_rect(fill = "#05a6f0"),
    legend.position = "bottom",
    plot.title = element_text(family = font, size = 20, face = "bold",
                              hjust = 0, vjust = 2),
    axis.text = element_text(family = font, size = 9),
    axis.text.x = element_text(margin = margin(5, b = 10))
  )
}

# Same chart with the corporate theme applied
ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_bar(color = "grey", fill = "red") +
  labs(x = "Length of Sepal",
       y = "Count of flowers",
       title = "Number of flowers \nby sepal length",
       caption = "Source: IRIS Dataset \nBase R Package") +
  theme_organisation()

# Add the company logo on top of the current plot
library(magick)
logo <- image_read("../Useless_R_functions/image/myiriscompany.png")
grid::grid.raster(logo, x = 0.1, y = 0.02, just = c("left", "bottom"),
                  width = grid::unit(1.9, "inches"))

Besides graphs, tables can follow a similar theme. The flextable package offers a great framework for creating tables with rich formats, layouts, cell formatting and plotting capabilities.

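library(flextable)

# Table of the first rows of iris, with a chosen column order
myiris <- flextable(head(iris),
                    col_keys = c("Species", "Petal.Width",
                                 "Sepal.Length", "Sepal.Width"))
# Highlight sepal lengths above 4.5 in red
myiris <- color(myiris, i = ~ Sepal.Length > 4.5, j = ~ Sepal.Length,
                color = "red")
# Group the columns under two header labels, two columns each
myiris <- add_header_row(myiris,
                         values = c("Name and Petals", "Measures on Sepal"),
                         colwidths = c(2, 2))
myiris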

With both packages, you can create corporate reports and export them to different tools or formats (Word, PDF, PowerPoint, HTML and others).

4. Use good coding practices and never forget to document

Many practices improve code readability and reusability. I have grouped them into areas, each of which leads to better code.

Documenting code

Start your code with an annotated description of what it does when run; this will help you when you have to review or change it in the future. Include the author’s name, the date and a change log.
Load all dependencies, packages and files at the top, in accordance with your file structure. Also record the global environment settings and the R version, and state which packages are needed to run your code.
Use setwd() to set the location of your files (script, project or packages), unless standards in your organisation make this obvious.
Use comments to mark off sections of code.
Comment your code with care. Comments should explain the why, not the what. Document each function with a description of all its input arguments and its return value.
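A minimal sketch of such a script header and a documented function. The header fields (angle brackets are placeholders) and the function calculate_sepal_ratio() are hypothetical; the #' comment style with @param and @return tags is the roxygen2 convention, which can later generate help pages if the code becomes a package:

```r
# ------------------------------------------------------------------
# Script:      iris_summary.R (hypothetical)
# Purpose:     Summarise sepal measurements of the iris dataset
# Author:      <name>
# Date:        <date>
# R version:   recorded with sessionInfo() at the end of the run
# Change log:  <date> - first version
# ------------------------------------------------------------------

# ---- Dependencies ------------------------------------------------
# Only base R and the built-in datasets package are needed here.

# ---- Functions ---------------------------------------------------

#' Ratio of sepal length to sepal width
#'
#' @param data A data frame with numeric columns Sepal.Length and
#'   Sepal.Width.
#' @return A numeric vector with one ratio per row of `data`.
calculate_sepal_ratio <- function(data) {
  # Fail early if the input does not have the columns we rely on
  stopifnot(all(c("Sepal.Length", "Sepal.Width") %in% names(data)))
  data$Sepal.Length / data$Sepal.Width
}

# ---- Main --------------------------------------------------------
sepal_ratio <- calculate_sepal_ratio(iris)
summary(sepal_ratio)
```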

Syntax practices

Place spaces around all operators (=, +, -, <-, comparison and logical operators, etc.).
Use <-, not =, for assignment.
When packages export functions with the same name, prefix the function with its package name, e.g. dplyr::filter() versus stats::filter().
To improve readability, indent the code inside curly braces. The formatR package can help you reformat and indent your code automatically.
Factor out common operations rather than repeating them, and keep your code in small chunks. If a single function or loop gets too long, look for ways to break it into smaller pieces.
Keep lines to at most 80 characters, so that code fits comfortably on a printed page at a reasonable size. If you run out of room, consider moving some of the work into a separate function.
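A small sketch of several of these practices together, on the iris dataset: a repeated computation factored into one function, spaces around operators, <- for assignment, an explicit stats:: prefix, and a base-R check for lines longer than 80 characters. The helper names rounded_mean() and find_long_lines() are hypothetical:

```r
# Before: the same operation copied for every column
# mean_sl <- round(mean(iris$Sepal.Length), 2)
# mean_sw <- round(mean(iris$Sepal.Width), 2)
# mean_pl <- round(mean(iris$Petal.Length), 2)

# After: factor the operation out once and apply it to every column
rounded_mean <- function(x, digits = 2) {
  round(mean(x), digits)
}
numeric_columns <- names(iris)[vapply(iris, is.numeric, logical(1))]
column_means <- vapply(iris[numeric_columns], rounded_mean, numeric(1))

# Explicit namespace: calls the stats version even if another
# attached package also exports a median() function
column_medians <- vapply(iris[numeric_columns], stats::median, numeric(1))

# Report the line numbers of a script that exceed the character limit
find_long_lines <- function(path, limit = 80) {
  code <- readLines(path, warn = FALSE)
  which(nchar(code) > limit)
}

script <- file.path(tempdir(), "example_script.R")
writeLines(c("x <- 1", strrep("y", 81)), script)
find_long_lines(script)
```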

Naming convention

There are many naming conventions to choose from, and all are fine as long as you use the chosen one consistently. Here are a few:

alllowercase: e.g. irisdataset
period.separated: e.g. iris.dataset
underscore_separated: e.g. iris_dataset
lowerCamelCase: e.g. addIrisDataset
UpperCamelCase: e.g. AddIrisDataset

Keep names concise and meaningful: use verbs for functions (e.g. add, calculate, reduce) and nouns for variables (e.g. calculatedNumbers, vectorOfValues).

In general, you can mark helper functions with a “.” prefix, and you should also distinguish between local and global variables, data objects and functions.
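A short sketch of this naming advice in lowerCamelCase: a verb for the function, a noun for the variable, and a helper marked with a leading dot. Objects whose names start with “.” are hidden from ls() unless all.names = TRUE, which keeps helpers out of the way when you list your workspace. All names here are hypothetical:

```r
# Helper: the leading dot marks it as internal
.scaleToUnit <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Public function: named with a verb
calculateScaledSepals <- function(data) {
  .scaleToUnit(data$Sepal.Length)
}

# Variable: named with a noun
scaledSepalLengths <- calculateScaledSepals(iris)

ls()                  # the helper is not listed
ls(all.names = TRUE)  # now it is
```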

Also, give your files meaningful names and always save them with the .R extension.

Posit — RStudio tips

Choose your IDE. Consider using RStudio (by Posit). My second favourite for writing R code is Visual Studio Code.
There is no need to save the current workspace if you are writing reproducible code. You should be able to reproduce the workspace by re-running your script.
Keep track of versions of your data, variables and functions, and use the integrated version control support for SVN or Git (GitHub).
R projects are a great way to organise your script files and outputs. Consider using R Markdown to prepare finalised reports of your analysis.
Check memory usage, call the garbage collector (gc()) when needed, and keep the session information (sessionInfo()) in your project.
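A minimal sketch of the last tip, assuming a hypothetical output/ folder in the project (created under tempdir() here). It reports the memory an object uses, frees a large temporary object and runs the garbage collector, then stores the session information as a text file next to your results:

```r
# Hypothetical output folder of the project
output_dir <- file.path(tempdir(), "output")
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

# Memory used by a single object
print(object.size(iris), units = "Kb")

# Remove objects you no longer need, then reclaim memory and report usage
big_copy <- do.call(rbind, replicate(100, iris, simplify = FALSE))
rm(big_copy)
gc()

# Save the R version, platform and attached packages with the results
session_file <- file.path(output_dir, "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), session_file)
```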

The complete code is available on GitHub in this repository.

This article was originally published on Medium: https://medium.com/@tomazkastrun/tips-for-organising-your-r-code-ebbda2309b8

Tagged with: R, structure
Posted in Useless R functions

4 comments on “Tips for organising your R code”

Organizing R Code – Curated SQL says:

January 31, 2023 at 2:10 pm

[…] Tomaz Kastrun tidies up: […]

StatistikinDD says:

February 24, 2023 at 5:29 pm

Thanks for this post, Thomas!
So important to organize code, and to achieve consistency across projects.

Don’t agree with all details, though …
Wouldn’t use setwd(). Breaks code whenever locations change (e. g. archiving projects, working on a laptop on a business trip as opposed to a desktop pc at the office, …). Prefer to use RStudio projects, the here package, and relative file paths.

Wouldn’t automate installing packages. Hadley is strongly opposed to this – the user should actively agree to doing that. May not always be welcome in any given environment.

- tomaztsql says:
  
  February 24, 2023 at 6:27 pm
  
  Thanks for your point of view. I highly appreciate the debate.
  
  I must say, that I absolutely agree that RStudio project is the way to go. Outside of that, I was exploring the organization of the code.
  
  As for installing packages, I like the Python(ian) way of automation of environment preparation and installing packages with particular version can be an advantage. Especially, when reproducing the existing code.
  
  - StatistikinDD says:
    
    February 24, 2023 at 6:30 pm
    
    > As for installing packages, I like the Python(ian) way of automation of environment preparation …
    
    Fair enough. Worth to hint at renv, or maybe even docker.
    