Extending the loo Package
I’m excited to be accepted into the GSoC 2025 cohort, working on the loo R package! I’ll be mentored by Aki Vehtari, Jonah Gabry, and Noa Kallioinen. Apparently only 1,272 of 15,240 applicants (who submitted 23,559 proposals) were accepted, an acceptance rate of ~8.35% per applicant (~5.4% per proposal). This was the only application I submitted.
The loo package is widely used to cross-validate Bayesian models and has over three million downloads and several thousand citations to date. The focus of this project is to extend the package’s API to support a wider array of predictive measures.
Gory technical details, extracted from our project proposal, follow. Expect more posts once the project starts in earnest.
PS: Getting a working bibliography and citations was annoying because the projects I found were old and didn’t work. I vibe coded (true vibe coding: pure verification, no latent reasoning) my way from one of those to these shortcodes. Specify a bibliography (stored at data/$NAME.json) by adding bib: $NAME to your post’s YAML.
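For instance, a post’s front matter might look like this (the bibliography name here is just an illustration):

```yaml
title: "Extending the loo Package"
bib: loo-gsoc   # loads the bibliography stored at data/loo-gsoc.json
```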
Technical Details
This project is focused on an overhaul of the existing loo API to admit new CV methods, metrics, and scores. There are a number of pertinent issues, viz. #281, #223, #220, #213, #201, #135, and #106.
First, a brief overview of LOO and of the goals and procedures of the loo package.
Cross-validation (CV) is a common means to estimate a model’s predictive accuracy, e.g. for model selection or stacking (Vehtari et al., 2017). Leave-one-out CV (LOO) is a CV structure where the model is refit with one data point left out, for every point in the data. This process is computationally expensive, though, so approximate LOO can be used instead, computed simply through importance sampling (IS) (Vehtari et al., 2017). However, the importance weights used for IS can have very large or infinite variance (Vehtari et al., 2017), so one can use Pareto smoothed importance sampling (PSIS) (Vehtari et al., 2024) to enjoy more stable LOO estimates and, additionally, to have a simple diagnostic for determining whether the PSIS estimate is likely to have a small error.
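As a concrete illustration, here is a minimal sketch of computing PSIS-LOO with the current loo API, using simulated log-likelihoods in place of a real model fit:

```r
library(loo)

# S-by-N matrix of pointwise log-likelihoods (S posterior draws,
# N observations), e.g. extracted from a Stan fit; simulated here.
set.seed(1)
S <- 1000; N <- 50
log_lik <- matrix(rnorm(S * N, mean = -1), nrow = S, ncol = N)

# Relative effective sample sizes account for MCMC autocorrelation;
# chain_id maps each draw to its chain (4 chains of 250 draws here).
r_eff <- relative_eff(exp(log_lik), chain_id = rep(1:4, each = S / 4))

# PSIS-LOO estimate of the expected log pointwise predictive density.
fit_loo <- loo(log_lik, r_eff = r_eff)
print(fit_loo)

# Pareto k diagnostics: small k values indicate the PSIS estimate is
# likely to be reliable for that observation.
print(pareto_k_table(fit_loo))
```

The log score (elpd) is the only measure this interface currently reports; the project adds the rest.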
Currently, loo returns an object that we will extend to work with more measures, adding metrics such as MAE, RMSE, MSE, R², ACC, balanced ACC, and the Brier score, and scores such as RPS, SRPS, CRPS, SCRPS, and the log score. We will need to create a flexible object which can report multiple metrics and scores. Additionally, we need to create functions to support model comparisons for all of these scores and metrics.
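A rough sketch of what such an interface could look like (every name below is an illustrative placeholder, not the final API):

```r
# Hypothetical extended interface: request several measures at once and
# get back a single loo-style object holding pointwise values and
# summary estimates for each measure.
res <- loo_measures(          # placeholder name, not an existing function
  fit,                        # fitted model (or log-lik and predictive draws)
  measures = c("elpd", "crps", "rmse"),
  y = y_obs                   # observed outcomes, required for most metrics
)

res$estimates  # one row per measure: Estimate and SE
res$pointwise  # N rows, one column of pointwise values per measure
```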
For all measures, we will return a loo object with pointwise measures and estimates; the former allow us to quantify the uncertainty in model comparisons.
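This is the same mechanism loo already uses for the log score: the standard error of a difference between two models is computed from the pointwise differences (a sketch, assuming two fitted loo objects fit_a and fit_b on the same data):

```r
# Pointwise elpd differences between the two models.
diff_pw <- fit_a$pointwise[, "elpd_loo"] - fit_b$pointwise[, "elpd_loo"]

# SE of the total difference, as in Vehtari et al. (2017):
# sqrt(N) times the standard deviation of the pointwise differences.
se_diff <- sqrt(length(diff_pw)) * sd(diff_pw)
```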
We will also allow measures besides the log score to be used for LOO-CV under the same, unified interface as existing options, and create a consistent loo object for all measures. We will further unify the interface by allowing non-log-score measures for the in-sample, test-data, and K-fold-CV use cases. Measures will be stratified into scores and metrics, since scores need draws from the predictive distribution while metrics need predictions from a point estimate or the posterior. The interface will be cleanly split for these two broad cases; the sketch below illustrates the distinction.
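To make the score/metric split concrete, here is a minimal sketch in plain R (assumed shapes, not the package’s implementation): a score such as CRPS consumes predictive draws, while a metric such as RMSE consumes point predictions.

```r
# CRPS (a score): needs an S-by-N matrix of posterior predictive draws.
# Empirical CRPS per observation: E|Y - y| - 0.5 * E|Y - Y'|.
crps_pointwise <- function(y_rep, y) {
  sapply(seq_along(y), function(i) {
    draws <- y_rep[, i]
    mean(abs(draws - y[i])) - 0.5 * mean(abs(outer(draws, draws, "-")))
  })
}

# RMSE (a metric): needs only one point prediction per observation,
# e.g. the posterior predictive mean colMeans(y_rep).
rmse <- function(y_pred, y) sqrt(mean((y_pred - y)^2))
```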
Additionally, we will spin out the PSIS functions to differentiate when PSIS is being used, as opposed to measures computed on in-sample data, independent test data, or K-fold-CV. We will also extend the model comparison functions to carry forward information on which measure is being compared, diagnostic data, and information on how to calculate the standard error (SE) of the differences of the various measures.
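Today, loo_compare() reports only elpd differences; under the planned extension a comparison might carry its measure and diagnostics along with it, roughly like this (the attribute names are illustrative, not a committed design):

```r
# Current behaviour: compare two loo objects on elpd.
comp <- loo_compare(fit_a, fit_b)
print(comp)  # columns: elpd_diff, se_diff

# Sketch of the extended comparison object (illustrative, not final):
# attr(comp, "measure")     #> "crps"       which measure was compared
# attr(comp, "se_method")   #> "pointwise"  how se_diff was computed
# attr(comp, "diagnostics")                 e.g. Pareto k values carried over
```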
Bibliography
Vehtari, Aki et al. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432. doi: 10.1007/s11222-016-9696-4.
Vehtari, Aki et al. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72), 1–58. http://jmlr.org/papers/v25/19-556.html.