If all the world were a monorepo

As a software engineer raised on a traditional diet of C, Java, and Lisp, I’ve found myself downright baffled by R. I’m no stranger to mastering new programming languages, but learning R was something else: it felt like studying Finnish after a lifetime of speaking Romance languages.

I’m not alone in this experience. There are piles of discussions online revealing the difficulty of using R, with some users becoming so enraged as to claim that R is “not actually a programming language”. My struggle with R continued even after developing my first package, grf. Once it was feature complete, it took me nearly 2 weeks to navigate the publication process to R’s main package manager, CRAN.

In the years since, my discomfort has given way to fascination. I’ve come to respect R’s bold choices, its clarity of focus, and the R community’s continued confidence to ‘do their own thing’. In what other ecosystem would a top package introduce itself using an eight-variable equation?

Learning R has expanded how I think as a software engineer, precisely because its perspective and community are so different to my own. This post explores one unique aspect of the R ecosystem, reverse dependency checks, and how it changed the way I approach software maintenance.

The Email

With many package managers like npm and PyPI, developers essentially publish and update packages ‘at will’. It’s largely the author’s responsibility to test the package before it’s released. Not so in the R ecosystem. CRAN, R’s central package manager, builds each package before publication, testing against a variety of R versions and operating systems. If one of your package’s unit tests fails on Windows Server 2022 plus a development version of R, you’ll receive an email from the CRAN team explaining why it can’t be published. Until 2021, CRAN even required packages to build against Sun Microsystems Solaris, an operating system for which it's hard to even track down a VM!

After an initial push to release my package grf on CRAN and several smooth version updates, it came time for version 2.0. I sent the package to CRAN for review and received a surprising email in response:

Dear maintainer, package grf_2.0.0.tar.gz has been auto-processed. The auto-check found problems when checking the first order strong reverse dependencies. Please reply-all and explain: Is this expected or do you need to fix anything in your package? If expected, have all maintainers of affected packages been informed well in advance? Are there false positives in our results?

What on earth does the CRAN team mean by “checking the first order strong reverse dependencies”? The package had passed all my tests against the full matrix of R versions and platforms. But… CRAN had also rerun the tests for all packages that depend on mine, even if they don’t belong to me! The email went on to explain the precise package and tests that were failing:

══ Failed tests ══════════════════════════════
── Error (test_cran_smoke_test.R:10:3): a simple workflow works on CRAN ────────
Error: unused argument (orthog.boosting = FALSE)
Backtrace:
 1. └─policytree::multi_causal_forest(X, Y, W) test_cran_smoke_test.R:10:2
 2.   └─base::lapply(...)
 3.     └─policytree:::FUN(X[[i]], ...)
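To make the phrase “first order strong reverse dependencies” concrete, here is a minimal sketch of looking them up with tools::package_dependencies(), which ships with R. In R’s terminology, “strong” dependencies are those listed under Depends, Imports, or LinkingTo, and “first order” means direct rather than transitive. The package database below is a made-up toy shaped like the output of available.packages(), so the example runs offline; “examplepkg” and “unrelatedpkg” are hypothetical names, and the version constraints are invented for illustration.

```r
# Toy package database. All entries are illustrative assumptions,
# shaped like the character matrix from utils::available.packages().
db <- cbind(
  Package   = c("grf", "policytree", "examplepkg", "unrelatedpkg"),
  Depends   = c(NA, "R (>= 3.5.0)", NA, NA),
  Imports   = c("Rcpp", "grf (>= 2.0.0)", NA, "Rcpp"),
  LinkingTo = c("Rcpp", NA, NA, NA),
  Suggests  = c("testthat", "testthat", "grf", NA)
)

# "Strong" dependency fields, spelled out explicitly.
strong <- c("Depends", "Imports", "LinkingTo")

# reverse = TRUE asks "who depends on grf?" rather than "what does grf need?".
# recursive = FALSE keeps the answer to first-order (direct) dependents.
revdeps <- tools::package_dependencies(
  "grf", db = db, which = strong, reverse = TRUE, recursive = FALSE
)

print(revdeps$grf)
```

In this toy database, examplepkg only lists grf under Suggests, so it does not count as a strong reverse dependency. Leaving out the db argument should make the function query the configured repositories instead, which needs network access.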
