R package naming conventions

Naming things is tricky in general, and especially so in R, because there is no real consensus on naming conventions for packages and functions. Since I am currently preparing an R package for release, I wanted more recent data on how often the different naming styles are used for R packages and their functions. My hypothesis was that users are most likely familiar with the conventions of the most downloaded R packages, so it might make sense to adopt those naming styles.

For my small analysis project, I distinguished between the following naming styles:

  • lowercase (lc)
  • UPPERCASE (UC)
  • lowerCamelCase (lCC)
  • UpperCamelCase (UCC)
  • name_with_underscores (us / snake_case)
  • name.with.dots (dot)
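To make the six categories concrete, here is a minimal sketch of how a name could be assigned to one of them. It is written in Python for illustration and is not the code used for the analysis; the regular expressions, the order in which they are tried, and the extra "other" bucket for names that fit none of the styles are all assumptions.

```python
import re

# Ordered (label, pattern) pairs; the first pattern that matches the
# whole name wins. Patterns and their order are illustrative assumptions.
STYLE_PATTERNS = [
    ("lc",  re.compile(r"[a-z][a-z0-9]*")),                  # lowercase
    ("UC",  re.compile(r"[A-Z][A-Z0-9]*")),                  # UPPERCASE
    ("us",  re.compile(r"[A-Za-z0-9]+(_[A-Za-z0-9]+)+")),    # name_with_underscores
    ("dot", re.compile(r"[A-Za-z0-9]+(\.[A-Za-z0-9]+)+")),   # name.with.dots
    ("lCC", re.compile(r"[a-z][A-Za-z0-9]*")),               # lowerCamelCase
    ("UCC", re.compile(r"[A-Z][A-Za-z0-9]*")),               # UpperCamelCase
]


def naming_style(name: str) -> str:
    """Return the short style label for a package or function name."""
    for label, pattern in STYLE_PATTERNS:
        if pattern.fullmatch(name):
            return label
    # Operators, mixed separators, leading dots, and so on.
    return "other"


examples = ["ggplot2", "MASS", "rowSums", "RColorBrewer", "read_csv", "data.frame"]
styles = {name: naming_style(name) for name in examples}
```

Because lowercase is tested before lowerCamelCase, a one-word name such as `plot` is always reported as lc, even in a package that otherwise uses lowerCamelCase.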

Today (2017-06-04), I pulled the package names and the function names of each package from the rdocumentation API for the 500 most downloaded R packages. After some processing, I built a contingency table of the function and package naming styles and visualized it as the following mosaic plot. If a package uses several function naming styles, the modal (most frequent) style is used. If a package contains both lowercase and lowerCamelCase functions, the lowercase occurrences are added to the lowerCamelCase counts.
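The aggregation step can be sketched as follows, again in Python and only as an illustration of the two rules described above. The function name `modal_function_style`, the toy input, and the tie-breaking behaviour are assumptions; the input is taken to be style labels that have already been assigned to each name.

```python
from collections import Counter


def modal_function_style(styles):
    """Return the most frequent naming style among one package's functions."""
    counts = Counter(styles)
    if not counts:
        return None  # package without any function names
    # Rule from the text: if both lc and lCC occur in a package,
    # the lc occurrences are added to the lCC counts.
    if counts["lc"] and counts["lCC"]:
        counts["lCC"] += counts.pop("lc")
    # Assumption: on a tie, the style seen first wins.
    return counts.most_common(1)[0][0]


# Hypothetical toy input: (package style, styles of its function names).
packages = [
    ("lc", ["us", "us", "us", "lc", "lc", "lCC", "lCC"]),
    ("lc", ["us", "us", "dot"]),
    ("UCC", ["lc", "lc", "dot"]),
]

# Contingency table: number of packages per (package style, function style).
table = Counter((pkg, modal_function_style(fns)) for pkg, fns in packages)
```

In the first toy package, underscores are the single most frequent style (3 names), but after the merge lowerCamelCase has 4 and becomes the modal value. In the third package there is no lowerCamelCase name, so the lowercase counts are left as they are.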

Mosaic plot of R package names

First, we see that around 70% of the package names are in lowercase. A share this large can be read as a de facto guideline for choosing a package name.

The function names are distributed more heterogeneously. The combination of lowercase package names and lowerCamelCase function names is the most frequent. Underscore, dot, and lowercase function names also often occur together with lowercase package names.

Mosaic plot of R package function names

When we look at the 100 most downloaded R packages, we see that the combination of lowercase package names and function names with underscores is the second most frequent one.