R Packages: Organize, Test, Document, and Share Your Code

Author: Hadley Wickham

Comments

by anonymous   2019-01-13

Bdecaf,

Ok, answer version 2.0 - chuckle.

You mentioned that "The question is how Makefiles and the package build workflow are supposed to go together". In that context, my recommendation is you review a set of example R package makefiles:

  • Makefile for Yihui Xie's knitr package for R.
  • Makefile for my R/qtlcharts package.

The knitr package makefile (in my view) provides a good example of how to build vignettes. Review the makefile and the directory structure; that is the template I would recommend you use.

I'd also recommend you look at maker, a Makefile for R package development. On top of this, I would start with Karl Broman's guides (this is what I used myself as a source reference a while back; they are now eclipsed by Hadley's book on packages but still useful, in my view).

  • Minimal make: A minimal tutorial on Make
  • R package Primer.

The other recommendation is to read Rob Hyndman's article, which I referenced previously:

  • Makefiles for R/LaTeX projects

Between them, you should be able to do what you request. Above and beyond that you have the base R package manual you referenced.

I hope the above helps.

T.


Referenced pages:

Minimal make: A minimal tutorial on make (author: Karl Broman)

I would argue that the most important tool for reproducible research is not Sweave or knitr but GNU make.

Consider, for example, all of the files associated with a manuscript. In the simplest case, I would have an R script for each figure plus a LaTeX file for the main text. And then a BibTeX file for the references.

Compiling the final PDF is a bit of work:

  • Run each R script through R to produce the relevant figure.
  • Run latex and then bibtex and then latex a couple more times.

And the R scripts need to be run before latex is, and only if they’ve changed.

A simple example

GNU make makes this easy. In your directory for the manuscript, you create a text file called Makefile that looks something like the following (here using pdflatex).

mypaper.pdf: mypaper.bib mypaper.tex Figs/fig1.pdf Figs/fig2.pdf
    pdflatex mypaper
    bibtex mypaper
    pdflatex mypaper
    pdflatex mypaper

Figs/fig1.pdf: R/fig1.R
    cd R;R CMD BATCH fig1.R

Figs/fig2.pdf: R/fig2.R
    cd R;R CMD BATCH fig2.R

Each batch of lines indicates a file to be created (the target), the files it depends on (the prerequisites), and then a set of commands needed to construct the target from the dependent files. Note that the lines with the commands must start with a tab character (not spaces).

Another great feature: in the example above, you’d only build fig1.pdf when fig1.R changed. And note that the dependencies propagate. If you change fig1.R, then fig1.pdf will change, and so mypaper.pdf will be re-built.
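
The rebuild decision described above can be sketched in plain shell. This is a simplified illustration, not make's actual implementation: needs_rebuild is a hypothetical helper, the dates are arbitrary, and real make checks every prerequisite rather than just one. The -nt test is a widely supported shell extension rather than strict POSIX.

```shell
# Hypothetical helper: exit status 0 means "the target must be rebuilt".
# Rebuild when the target is missing or the prerequisite is newer than it.
needs_rebuild() {
    [ ! -e "$1" ] || [ "$2" -nt "$1" ]
}

demo=$(mktemp -d)
mkdir "$demo/R" "$demo/Figs"
script="$demo/R/fig1.R"
figure="$demo/Figs/fig1.pdf"

touch -t 202001010000 "$script"    # script written in 2020, no figure yet
if needs_rebuild "$figure" "$script"; then first=rebuild; else first=skip; fi

touch -t 202101010000 "$figure"    # figure built in 2021, newer than the script
if needs_rebuild "$figure" "$script"; then second=rebuild; else second=skip; fi

touch -t 202201010000 "$script"    # script edited in 2022, figure is now stale
if needs_rebuild "$figure" "$script"; then third=rebuild; else third=skip; fi

echo "$first $second $third"       # prints: rebuild skip rebuild
```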

One oddity: if you need to change directories to run a command, do the cd on the same line as the related command. The following would not work:

### this doesn't work ###
Figs/fig1.pdf: R/fig1.R
    cd R
    R CMD BATCH fig1.R

You can, however, use \ for a continuation line, like so:

### this works ###
Figs/fig1.pdf: R/fig1.R
    cd R;\
    R CMD BATCH fig1.R

Note that you still need to use the semicolon (;).
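
The reason is that make hands each recipe line to its own shell, so a cd on one line is forgotten before the next line runs. Here is a small shell sketch of that behaviour, using sh -c to stand in for the separate shells (the temporary directory is only for illustration):

```shell
demo=$(mktemp -d)
mkdir "$demo/R"
cd "$demo"

# Two recipe lines = two shells: the cd happens in a shell that then exits.
sh -c 'cd R'
separate=$(sh -c 'basename "$(pwd)"')

# One recipe line (or lines joined with \) = one shell: the cd still applies.
together=$(sh -c 'cd R; basename "$(pwd)"')

echo "separate lines ran in: $separate"
echo "single line ran in:    $together"
```

The second echo reports R, while the first reports the temporary directory. A common variant of the one-line form is cd R && R CMD BATCH fig1.R, which skips the R command if the cd fails.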

Using GNU make

You probably already have GNU make installed on your computer. Type make --version in a terminal/shell to see. (On Windows, go here to download make.)

To use make:

  • Go into the directory for your project.
  • Create the Makefile file.
  • Every time you want to build the project, type make.
  • In the example above, if you want to build fig1.pdf without building mypaper.pdf, just type make Figs/fig1.pdf (the target name includes its directory).

Frills

You can go a long way with just simple make files as above, specifying the target files, their dependencies, and the commands to create them. But there are a lot of frills you can add, to save some typing.

Here are some of the options that I use. (See the make documentation for further details.)

Variables

If you’ll be repeating the same piece of code multiple times, you might want to define a variable.

For example, you might want to run R with the flag --vanilla. You could then define a variable R_OPTS:

R_OPTS=--vanilla

You refer to this variable as $(R_OPTS) (or ${R_OPTS}; either parentheses or curly braces are allowed), so in the R commands you would use something like

cd R;R CMD BATCH $(R_OPTS) fig1.R

An advantage of this is that you just need to type out the options you want once; if you change your mind about the R options you want to use, you just have to change them in the one place.

For example, I actually like to use the following:

R_OPTS=--no-save --no-restore --no-init-file --no-site-file

This is like --vanilla but without --no-environ (which I need because I use the .Renviron file to define R_LIBS, to say that I have R packages defined in an alternative directory).
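
To check how a variable expands without actually running R, one option is make's -n flag, which prints the commands instead of executing them. The sketch below assumes a throwaway directory and writes the Makefile with printf so that the recipe line starts with a real tab; it skips the check if make is not installed.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir R
touch R/fig1.R

# \t becomes the tab that recipe lines require; single quotes keep $(R_OPTS) literal.
printf 'R_OPTS=--no-save --no-restore --no-init-file --no-site-file\n\n' > Makefile
printf 'Figs/fig1.pdf: R/fig1.R\n\tcd R;R CMD BATCH $(R_OPTS) fig1.R\n' >> Makefile

if command -v make >/dev/null 2>&1; then
    expanded=$(make -n Figs/fig1.pdf)   # dry run: print the recipe, run nothing
else
    expanded="(make is not installed)"
fi
echo "$expanded"
```

With make available, the printed command should show the four options in place of $(R_OPTS).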

Automatic variables

There are a bunch of automatic variables that you can use to save yourself a lot of typing. Here are the ones that I use most:

$@    the file name of the target
$<    the name of the first prerequisite (i.e., dependency)
$^    the names of all prerequisites (i.e., dependencies)
$(@D)    the directory part of the target
$(@F)    the file part of the target
$(<D)    the directory part of the first prerequisite (i.e., dependency)
$(<F)    the file part of the first prerequisite (i.e., dependency)

For example, in our simple example, we could simplify the lines

Figs/fig1.pdf: R/fig1.R
    cd R;R CMD BATCH fig1.R

We could instead write

Figs/fig1.pdf: R/fig1.R
    cd $(<D);R CMD BATCH $(<F)

The automatic variable $(<D) will take the value of the directory of the first prerequisite, R in this case. $(<F) will take the value of the file part of the first prerequisite, fig1.R in this case.
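
For file names like these, the D and F variants give the same pieces that the shell's dirname and basename commands would. The sketch below is only an analogy in plain shell (the variable names are made up for illustration); it rebuilds the command that the recipe above expands to.

```shell
target=Figs/fig1.pdf
first_prereq=R/fig1.R

target_dir=$(dirname "$target")          # like $(@D): Figs
target_file=$(basename "$target")        # like $(@F): fig1.pdf
prereq_dir=$(dirname "$first_prereq")    # like $(<D): R
prereq_file=$(basename "$first_prereq")  # like $(<F): fig1.R

# The recipe "cd $(<D);R CMD BATCH $(<F)" therefore becomes:
command_line="cd $prereq_dir;R CMD BATCH $prereq_file"
echo "$command_line"                     # prints: cd R;R CMD BATCH fig1.R
```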

Okay, that’s not really a simplification. There doesn’t seem to be much advantage to this, unless perhaps the directory were an obnoxiously long string and we wanted to avoid having to type it twice. The main advantage comes in the next section.

Pattern rules

If a number of files are to be built in the same way, you may want to use a pattern rule. The key idea is that you can use the symbol % as a wildcard, to be expanded to any string of text.

For example, our two figures are being built in basically the same way. We could simplify the example by including one set of lines covering both fig1.pdf and fig2.pdf:

Figs/%.pdf: R/%.R
    cd $(<D);R CMD BATCH $(<F)

This saves typing and makes the file easier to maintain and extend. If you want to add a third figure, you just add it as another dependency (i.e., prerequisite) for mypaper.pdf.
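
The part of the target name that % matches is called the stem, and make substitutes the same stem into the prerequisite pattern. Here is a rough shell sketch of that matching for the rule Figs/%.pdf: R/%.R (fig3 is a hypothetical third figure, and real make does this matching itself):

```shell
pairs=""
for target in Figs/fig1.pdf Figs/fig2.pdf Figs/fig3.pdf; do
    stem=${target#Figs/}     # drop the fixed prefix of the pattern
    stem=${stem%.pdf}        # drop the fixed suffix; what is left is the stem
    prereq="R/$stem.R"       # put the stem into the prerequisite pattern
    echo "$target is built from $prereq"
    pairs="$pairs$target:$prereq "
done
```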

Our example, with the frills

Adding all of this together, here’s what our example Makefile will look like.

R_OPTS=--vanilla

mypaper.pdf: mypaper.bib mypaper.tex Figs/fig1.pdf Figs/fig2.pdf
    pdflatex mypaper
    bibtex mypaper
    pdflatex mypaper
    pdflatex mypaper

Figs/%.pdf: R/%.R
    cd $(<D);R CMD BATCH $(R_OPTS) $(<F)

The advantage of the added frills: less typing, and it’s easier to extend to include additional figures. The disadvantage: it’s harder for others who are less familiar with GNU Make to understand what it’s doing.
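
To try the pattern rule without R or LaTeX installed, here is a throwaway sketch in which cp stands in for the R command (an assumption made purely so the example runs anywhere). It writes a one-rule Makefile, builds both figures, and then runs make again to show that nothing is rebuilt when the scripts have not changed. The build is skipped if make is not installed.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir R Figs
echo "plot 1" > R/fig1.R
echo "plot 2" > R/fig2.R

# In the printf format, %% is a literal % and \t is the tab a recipe line needs.
printf 'Figs/%%.pdf: R/%%.R\n\tcp $< $@\n' > Makefile

if command -v make >/dev/null 2>&1; then
    make Figs/fig1.pdf Figs/fig2.pdf    # first run: both files are created
    make Figs/fig1.pdf Figs/fig2.pdf    # second run: both are already up to date
    built=$(ls Figs | wc -l | tr -d ' ')
else
    built="skipped"
fi
echo "figures built: $built"
```

Editing R/fig1.R and running the same make command again should rebuild only Figs/fig1.pdf.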

More complicated examples

There are complicated Makefiles all over the place. Poke around GitHub and study them.

Here are some of my own examples:

  • Makefile for my AIL probabilities paper
  • Makefile for my phylo QTL paper
  • Makefile for my pre-CC probabilities paper
  • Makefile for a talk on interactive graphs.
  • Makefile for a talk on QTL mapping for function-valued traits.
  • Makefile for my R/qtlcharts package.

And here are some examples from Mike Bostock:

  • Makefile for us-rivers
  • Makefile for protovis
  • Makefile for topotree

Also look at the Makefile for Yihui Xie’s knitr package for R.

Also of interest is maker, a Makefile for R package development.

Resources

  • GNU make webpage
  • Official manual
  • O’Reilly Managing projects with GNU make book (part of the Open Books project)
  • Software carpentry’s make tutorial
  • Mike Bostock’s “Why Use Make”
  • GNU Make for reproducible data analysis by Zachary Jones
  • Makefiles for R/LaTeX projects by Rob Hyndman

R package primer

a minimal tutorial

A minimal tutorial on how to make an R package.

R packages are the best way to distribute R code and documentation, and, despite the impression that the official manual (Writing R Extensions) might give, they really are quite simple to create.

You should make an R package even for code that you don't plan to distribute. You'll find it is easier to keep track of your own personal R functions if they are in a package. And it's good to write documentation, even if it's just for your future self.

Hadley Wickham wrote a book about R packages (free online; also available in paper form from Amazon). You might just jump straight there.

Hilary Parker wrote a short and clear tutorial on writing R packages. If you want a crash course, you should start there. A lot of people have successfully built R packages from her instructions.

But there is value in having a diversity of resources, so I thought I'd go ahead and write my own minimal tutorial. The following list of topics looks forbidding, but each is short and straightforward (and hopefully clear). If you're put off by the list of topics, and you've not already abandoned me in favor of Hadley's book, then why aren't you reading Hilary's tutorial?

If anyone's still with me, the following pages cover the essentials of making an R package.

  • Why write an R package?
  • The minimal R package
  • Building and installing an R package
  • Making it a proper package
  • Writing documentation with Roxygen2
  • Software licenses
  • Checking an R package

The following are important but not essential.

  • Putting it on GitHub
  • Getting it on CRAN
  • Writing vignettes
  • Writing tests
  • Including datasets
  • Connecting to other packages

The following contains links to other resources:

  • Further resources

If anything here is confusing (or wrong!), or if I've missed important details, please submit an issue, or (even better) fork the GitHub repository for this website, make modifications, and submit a pull request.


The source for this tutorial is on GitHub.

Also see my tutorials on git/github, GNU make, knitr, making a web site with GitHub Pages, data organization, and reproducible research.

by anonymous   2019-01-13

My sense is you have at least two potential options here. The first is what I think you need, but I include both for completeness.

  1. Create your own package and extend the base package
  2. Create your own function that extends the base package function

nb: If you could provide the package and function you wish to extend that would be super helpful as I had to make this slightly generic. I have referenced the original StackOverflow posts that helped me in this situation. In terms of further/deeper reading my recommendation would be to read:

  • R Inferno By: Patrick Burns
    • Covers the nuances of R
    • Read Section 7 - Circle 7 Tripping on Object Orientation
  • R Packages By: Hadley Wickham
    • Chapter 8. Namespaces
    • Hadley does a great job of explaining R namespaces.

Solution Options:

Create Your own Package and extend base package

In this context, my sense is to direct you to take a look at section 1.5.6 of the Writing R Extensions manual.

  • https://cran.r-project.org/doc/manuals/R-exts.pdf

Why? Well, based on your description my sense would be to import the functions from the package, and then write your extension function.

You can do this by importing the classes and methods explicitly, with directives

importClassesFrom(package, ...)
importMethodsFrom(package, ...)

listing the classes and functions with methods respectively. Suppose we had two small packages A and B with B using A. Then they could have NAMESPACE files

export(f1, ng1)
exportMethods("[")
exportClasses(c1)

and

importFrom(A, ng1)
importClassesFrom(A, c1)
importMethodsFrom(A, f1)
export(f4, f5)
exportMethods(f6, "[")
exportClasses(c1, c2)

respectively.

Note that importMethodsFrom will also import any generics defined in the namespace on those methods. It is important if you export S4 methods that the corresponding generics are available. You may, for example, need to import plot from graphics to make visible a function to be converted into its implicit generic. But it is better practice to make use of the generics exported by stats4 as this enables multiple packages to unambiguously set methods on those generics.

Here is the StackOverflow question and answer that helped me previously:

  • Ref: In R, how can I extend generic methods from one package in another?

Create your own function that extends the base package function

See: Overwrite method to extend it, using the original implementation

I hope the above helps.