About Databrary

Databrary is a powerful tool for storing and sharing video data and documentation with other researchers. With the databraryr package, it becomes even more powerful. Rather than interacting with Databrary through a web browser, users can write their own code to download participant data or even specific files.

I wrote databraryr so that I could better understand how the site works under the hood, and so that I could streamline my own analysis and data sharing workflows.

Let’s get started.

Registering

Access to most of the material on Databrary requires prior registration and authorization from an institution, and authorization requires the institution's formal agreement. To register, you create an account ID (your email address) and a secure password. Then, when you log in with your new credentials, you choose whom to request authorization from: an existing institution (if yours is on the list), a new institution (if yours isn't), or an existing authorized investigator (if you are a student, postdoc, or collaborator).

Installation

Official CRAN release

Install the package from CRAN via install.packages("databraryr").

Development release

Install the devtools package from CRAN: install.packages("devtools") if you have not already done so.
Load devtools into your local environment: library(devtools).
Install the databraryr package via install_github("databrary/databraryr"). Required dependencies will be installed at this time.
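
The three steps above can be combined into one short script. This is a minimal sketch: the wrapper name install_databraryr_dev() is hypothetical and not part of any package, and the call is left commented out because it needs internet access and changes your library.

```r
# Hypothetical convenience wrapper (not part of devtools or databraryr).
# It installs devtools only if it is missing, then installs the
# development version of databraryr from GitHub.
install_databraryr_dev <- function() {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("databrary/databraryr")
}

# Uncomment to run the installation:
# install_databraryr_dev()
```

Calling devtools::install_github() with the namespace prefix avoids the need for library(devtools) in a script.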

v0.6.5

The latest version of the code is v0.6.5. The v0.6.x code uses the httr2 package under the hood, and it runs much faster than v0.5.x.
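
If you are unsure which series you have installed, a quick check along these lines may help. It is only a sketch: uses_httr2_series() is a hypothetical helper name, and the 0.6.0 cutoff is an assumption taken from the statement above that the v0.6.x code uses httr2.

```r
# Hypothetical helper: TRUE if a version string is 0.6.0 or later.
uses_httr2_series <- function(version) {
  package_version(version) >= package_version("0.6.0")
}

# Report on the installed copy, if there is one.
if (requireNamespace("databraryr", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("databraryr"))
  message("databraryr ", installed, " is installed; v0.6.x or later: ",
          uses_httr2_series(installed))
} else {
  message("databraryr is not installed yet.")
}
```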

First steps (while you await authorization)

Even before formal authorization is complete, you can access the public materials on Databrary. This vignette assumes you fall into this category.

Once you’ve installed the package following one of the above routes, it’s a good idea to check that your installation worked by loading it into your local workspace.

library(databraryr)

Then, try this command to pull data about one of Databrary’s founders:

# The default parameter settings return far more detail about a party than we
# need for this example, so we set parents_children_access = FALSE.
party_6 <- databraryr::get_party_by_id(6, parents_children_access = FALSE)

party_6 |>
  as.data.frame()
#>   id sortname prename               orcid                       affiliation
#> 1  6  Gilmore Rick O. 0000-0002-7676-3982 The Pennsylvania State University
#>                            url
#> 1 http://gilmore-lab.github.io

The resulting data frame has columns for the first name (prename), last name (sortname), affiliation, lab or personal website (url), and ORCID iD (orcid), if available.

Databrary assigns each person and institution on the system a unique integer called a ‘party id’. We can write a simple helper function to collect information about a larger group of people.

# Helper function
get_party_as_df <- function(party_id) {
  this_party <- databraryr::get_party_by_id(party_id, 
                                            parents_children_access = FALSE)
  if (!is.null(this_party)) {
    as.data.frame(this_party)
  } else {
    NULL
  }
}

# Parties 5, 6, and 7 are Databrary's founders
purrr::map(5:7, get_party_as_df, .progress = TRUE) |>
  purrr::list_rbind()
#>   id sortname prename                       affiliation
#> 1  5   Adolph   Karen               New York University
#> 2  6  Gilmore Rick O. The Pennsylvania State University
#> 3  7  Millman   David               New York University
#>                                url               orcid
#> 1 http://www.psych.nyu.edu/adolph/                <NA>
#> 2     http://gilmore-lab.github.io 0000-0002-7676-3982
#> 3                             <NA>                <NA>

You should see information about Databrary’s three founders.
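
The helper above returns NULL when a lookup yields nothing, so a missing party should not stop the loop. The sketch below shows the same pattern offline in base R. It rests on some assumptions: fake_get_party() is a hypothetical stand-in for databraryr::get_party_by_id(), party_as_df() is a hypothetical variant of the helper that accepts the lookup function as an argument, and 999 is just an id the stand-in does not know.

```r
# Hypothetical stand-in for databraryr::get_party_by_id(), for offline use.
# It returns a list for the three founder ids and NULL for anything else.
fake_get_party <- function(party_id) {
  known <- list(
    "5" = list(id = 5, sortname = "Adolph", prename = "Karen"),
    "6" = list(id = 6, sortname = "Gilmore", prename = "Rick O."),
    "7" = list(id = 7, sortname = "Millman", prename = "David")
  )
  known[[as.character(party_id)]]
}

# Same logic as get_party_as_df(), with the lookup function passed in.
party_as_df <- function(party_id, fetch = fake_get_party) {
  this_party <- fetch(party_id)
  if (is.null(this_party)) NULL else as.data.frame(this_party)
}

results <- lapply(c(5, 999, 7), party_as_df)

# Drop the failed lookup, then stack the remaining one-row data frames.
found <- Filter(Negate(is.null), results)
founders <- do.call(rbind, found)
founders
```

Passing the lookup function as an argument makes it possible to try the pattern without a network connection; swapping in the real call only requires changing the fetch argument.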

You can also see what’s new on Databrary. The get_db_stats() command returns information about newly authorized people and institutions and newly uploaded datasets. Try this:

databraryr::get_db_stats("stats")
#> # A tibble: 1 × 9
#>   date                investigators affiliates institutions datasets_total
#>   <dttm>                      <int>      <int>        <int>          <int>
#> 1 2024-09-18 12:26:19          1805        622          807           1754
#> # ℹ 4 more variables: datasets_shared <int>, n_files <int>, hours <dbl>,
#> #   TB <dbl>
databraryr::get_db_stats("people")
#> # A tibble: 4 × 5
#>      id sortname prename   affiliation                     time                 
#>   <int> <chr>    <chr>     <chr>                           <chr>                
#> 1 10687 Addyman  Caspar    Stellenbosch University         2024-09-16T15:32:55.…
#> 2 22783 Mielke   Alexander Queen Mary University of London 2024-09-16T14:13:57.…
#> 3 22864 Hojeij   Zeina     Zayed University                2024-09-13T12:14:27.…
#> 4 22658 Fusi     Stefano   Columbia University             2024-09-12T17:53:58.…
databraryr::get_db_stats("institutions")
#> # A tibble: 3 × 5
#>      id sortname                           url                 institution time 
#>   <int> <chr>                              <chr>               <lgl>       <chr>
#> 1 22916 Stellenbosch University            http://www.sun.ac.… TRUE        2024…
#> 2 22908 Zayed University                   https://www.zu.ac.… TRUE        2024…
#> 3 22880 University of Texas at San Antonio https://www.utsa.e… TRUE        2024…
databraryr::get_db_stats("datasets")
#> # A tibble: 9 × 8
#>      id name        body  creation owners       permission publicsharefull time 
#>   <int> <chr>       <chr> <chr>    <list>            <int> <lgl>           <chr>
#> 1  1800 DBN - Nort… "Thi… 2024-09… <named list>          1 FALSE           2024…
#> 2  1800 DBN - Nort… "Thi… 2024-09… <named list>          1 FALSE           2024…
#> 3  1800 DBN - Nort… "Thi… 2024-09… <named list>          1 FALSE           2024…
#> 4  1799 Test        "Aud… 2024-09… <named list>          1 FALSE           2024…
#> 5  1798 IDCxContex… "The… 2024-09… <named list>          1 FALSE           2024…
#> 6  1797 DBN - Isra… "Thi… 2024-09… <named list>          1 FALSE           2024…
#> 7  1797 DBN - Isra… "Thi… 2024-09… <named list>          1 FALSE           2024…
#> 8  1797 DBN - Isra… "Thi… 2024-09… <named list>          1 FALSE           2024…
#> 9  1796 Global BAB… "Hom… 2024-09… <named list>          1 FALSE           2024…

Depending on when and how often you run these commands, there may or may not be new items.
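
In the datasets output above, some ids appear on more than one row. If you want one row per dataset, a base R filter like the following is one option. This is a sketch, not part of databraryr: one_row_per_id() is a hypothetical helper, and the example data frame contains only the ids copied from the output above, with the other columns left out.

```r
# Hypothetical helper: keep the first row for each id.
one_row_per_id <- function(df) {
  df[!duplicated(df$id), , drop = FALSE]
}

# Ids copied from the datasets output above; other columns omitted.
new_datasets <- data.frame(
  id = c(1800, 1800, 1800, 1799, 1798, 1797, 1797, 1797, 1796)
)

one_row_per_id(new_datasets)$id
#> [1] 1800 1799 1798 1797 1796
```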

Next steps

To learn more about how to access data on Databrary using databraryr, see the accessing data vignette.

To see how to log in and log out once you have authorization, see the vignette for authorized users.