Workflow
This page describes the tools we use to generate these data and present them on this site.
GitHub
We use GitHub’s pages feature to serve the web site files.
At present, we use a repository in the Databrary GitHub organization: https://github.com/databrary/analytics. As a result, the analytics site has the following URL: https://databrary.github.io/analytics/.
The site is built locally by Rick Gilmore or Andrea Seisler, then pushed to GitHub.
RStudio
We use RStudio as the integrated development environment for the site. Most of the code is in Quarto and R, with some CSS and JavaScript.
We use a number of R packages in the workflow.
databraryr
The databraryr package provides a set of tools for interacting with the Databrary API and gathering data from the site. This package may be useful to some analysts whether or not they care about Databrary-specific analytics.
Most data and metadata used in these reports can be accessed by the public, but specific data about individual participants requires that the user be authorized and logged in to the site using the databraryr::db_login() function.
The package may be installed via devtools::install_github("NYU-Databrary/databraryr"). (Note that install_github() expects a "user/repo" specification, not a full URL.)
See https://databrary.github.io/databraryr/ for documentation about the package.
The package is under active development. The documentation may not be up-to-date.
Note that users wishing to script access to Databrary 2.0 must apply for permission to do so. See https://databrary.github.io/guide/more-information/api-access.html for details.
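As a sketch of typical use (this assumes the GitHub installation above; get_db_stats() is one of the package's documented entry points, but exact function signatures may change as the package evolves):

```r
library(databraryr)

# Public data and metadata can be retrieved without logging in.
# get_db_stats() returns site-wide summary statistics, e.g., counts
# of volumes and registered users.
stats <- get_db_stats()

# Participant-level data requires an authorized account and a login.
db_login()
```

See the package documentation linked above for the full set of functions and their current arguments.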
Quarto
To generate the site, we use Quarto.
The site is regenerated with the following command:
quarto render src
Configuration files written in YAML control the rendering process. These files, along with the source R Markdown (.Rmd) files used to generate the site, live in the src/ directory.
The rendering command creates a full website in the docs/ directory.
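While editing, Quarto's live-preview mode can be more convenient than repeated full renders. A sketch, assuming Quarto is installed and the commands are run from the repository root:

```shell
# Render the full site into docs/ (the output directory is set in the
# project's YAML configuration)
quarto render src

# Or serve a live-reloading local preview while editing source files
quarto preview src
```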
Package Reproducibility
We use the renv package to track package dependencies.
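In practice this means the lockfile is refreshed whenever packages are added or updated, and a fresh clone can recreate the same library from it. A minimal sketch using renv's standard commands:

```r
# Record the currently installed package versions in renv.lock
renv::snapshot()

# On a fresh clone, reinstall exactly the recorded versions
renv::restore()
```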
Strategy
Some of the data elements in the report change often; others do not. In many cases, we have found it faster and more convenient to download data files from Databrary once and store copies as comma-separated value (CSV) text files in a private (local) directory.
Some of the CSVs contain potentially identifiable data about human participants, so we use a .gitignore file to keep those files out of version control and prevent them from being pushed to GitHub.
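As an illustration (the directory name here is hypothetical; the actual patterns in the repository may differ), a .gitignore entry of this form excludes a local data directory from tracking:

```
# Exclude locally cached CSV downloads containing participant data
csv-private/
```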
These data analyses and visualizations have developed piecemeal over several years. They could undoubtedly be optimized and improved.
The primary developer (Rick Gilmore) has had a ‘git-er-done’ attitude toward the project.
Gilmore takes some solace in the following quotation from the father of literate programming, Donald Knuth:
…the real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.
Roadmap
- Devise visualizations of assets by investigator/institution.
- Create a JSON file of latitude/longitude coordinates for the Databrary home page.
- Report by-user, by-institution summary data.
- Access and report on summary data about “private” volumes.
Clean-up
Log out of Databrary.