Databrary is a powerful tool for storing and sharing video data and
documentation with other researchers. With the databraryr
package, it becomes even more powerful. Rather than interact with
Databrary through a web browser, users can write their own code to
download participant data or even specific files.
I wrote databraryr so that I could better understand how the site works
under the hood, and so that I could streamline my own analysis and
data-sharing workflows.
Let’s get started.
Registering
Access to most of the material on Databrary requires prior registration and formal authorization from an institution. You’ll create an account ID (your email address) and a secure password when you register. Then, when you log in with your new credentials, you’ll choose whom to request authorization from: an existing institution (if yours is on the list), a new institution (if yours isn’t), or an existing authorized investigator (if you are a student, postdoc, or collaborator).
Installation
Development release
- Install the devtools package from CRAN, if you have not already done so: install.packages("devtools")
- Load devtools into your local environment: library(devtools)
- Install the databraryr package via install_github("databrary/databraryr"). Required dependencies will be installed at this time.
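The steps above can be collapsed into a short script. The following is a minimal sketch, not an official installer: install_databraryr_dev() is a hypothetical helper name introduced here for illustration, and it only wraps the install.packages() and install_github() calls already listed.

```r
# Hypothetical helper (not part of databraryr): installs the development
# release from GitHub, installing devtools from CRAN first if it is missing.
install_databraryr_dev <- function(repo = "databrary/databraryr") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github(repo)
}

# Uncomment to run the installation; this step needs network access.
# install_databraryr_dev()
```

Wrapping the calls in a function means nothing is downloaded until you call it, so the script can be sourced safely.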
First steps (while you await authorization)
Even before formal authorization is complete, you can access the public materials on Databrary. For this vignette, we’ll assume you fall into this category.
Once you’ve installed the package following one of the above routes, it’s a good idea to check that your installation worked by loading it into your local workspace.
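One way to run that check is sketched below. It uses only base R: requireNamespace() reports whether databraryr can be loaded, without stopping the session if it cannot.

```r
# requireNamespace() returns TRUE when the package can be loaded, FALSE otherwise.
databraryr_installed <- requireNamespace("databraryr", quietly = TRUE)

if (databraryr_installed) {
  # Attach the package and report which version was found.
  library(databraryr)
  message("databraryr version: ",
          as.character(utils::packageVersion("databraryr")))
} else {
  message("databraryr is not installed yet; repeat the installation steps.")
}
```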
Then, try this command to pull data about Databrary’s founders:
databraryr::list_people()
#>   id sortname prename                       affiliation
#> 1  5   Adolph   Karen               New York University
#> 2  6  Gilmore Rick O. The Pennsylvania State University
#> 3  7  Millman   David               New York University
#>                                url               orcid
#> 1 http://www.psych.nyu.edu/adolph/                <NA>
#> 2     http://gilmore-lab.github.io 0000-0002-7676-3982
#> 3                             <NA>                <NA>
Note that this command returns a data frame (tibble) with columns that
include the first name (prename), last name (sortname), affiliation,
lab or personal website, and ORCID ID if available.
Databrary assigns each person and institution on the system a unique
integer called a ‘party id’. When we run list_people(1:25), we are
asking the system for information about all of the people whose party
ids are between 1 and 25. Let’s try it:
databraryr::list_people(people_list = 1:25)
#>    id        sortname       prename               orcid
#>  1  1           Simon         Dylan 0000-0002-2793-1679
#>  2  3         Steiger          Lisa                <NA>
#>  3  4           Byrne        Andrea                <NA>
#>  4  5          Adolph         Karen                <NA>
#>  5  6         Gilmore       Rick O. 0000-0002-7676-3982
#>  6  7         Millman         David                <NA>
#>  7 11   Tamis-LeMonda     Catherine                <NA>
#>  8 13             Roy Lina Wictoren                <NA>
#>  9 14        Franchak          John                <NA>
#> 10 16       Professor    Suzanne Q.                <NA>
#> 11 17 Jimenez-Robbins        Carmen                <NA>
#> 12 18             Coe           Jon                <NA>
#> 13 19             Foo         Vicky                <NA>
#> 14 20          Gordon         Peter                <NA>
#> 15 24            Chan        Gladys                <NA>
#>                              affiliation                              url
#>  1                                  <NA>                             <NA>
#>  2                             Databrary                             <NA>
#>  3                             Databrary                             <NA>
#>  4                   New York University http://www.psych.nyu.edu/adolph/
#>  5     The Pennsylvania State University     http://gilmore-lab.github.io
#>  6                   New York University                             <NA>
#>  7                   New York University                             <NA>
#>  8                                   NYU                             <NA>
#>  9   University of California, Riverside            http://padlab.ucr.edu
#> 10                             Databrary                             <NA>
#> 11                                  <NA>                             <NA>
#> 12                                  <NA>                             <NA>
#> 13                                  <NA>                             <NA>
#> 14 Teachers College, Columbia University                             <NA>
#> 15                                   NYU                             <NA>
It’s a bit slow, but you should see information about people beginning with Dylan Simon, the developer who designed and built most of the Databrary system, and ending with Gladys Chan, a graphic designer who created the Databrary and Datavyu logos and other graphic identity elements.
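Notice that a request for 25 party ids returned only 15 rows. The sketch below, which uses only base R, shows one way to find the ids that came back empty and to pick out the people who have an ORCID. To keep it runnable offline, it rebuilds three columns by hand from the output printed above; in practice you would assign the result of list_people() to people instead. The vignette does not say why some ids return nothing. Since party ids also cover institutions, that is one plausible reason, but treat it as a guess.

```r
# The ids requested in the call above.
requested_ids <- 1:25

# Hand-copied subset of the printed output (id, sortname, orcid only).
people <- data.frame(
  id = c(1L, 3L, 4L, 5L, 6L, 7L, 11L, 13L, 14L, 16L, 17L, 18L, 19L, 20L, 24L),
  sortname = c("Simon", "Steiger", "Byrne", "Adolph", "Gilmore", "Millman",
               "Tamis-LeMonda", "Roy", "Franchak", "Professor",
               "Jimenez-Robbins", "Coe", "Foo", "Gordon", "Chan"),
  orcid = c("0000-0002-2793-1679", NA, NA, NA, "0000-0002-7676-3982",
            rep(NA_character_, 10)),
  stringsAsFactors = FALSE
)

# Requested ids for which no person was returned.
missing_ids <- setdiff(requested_ids, people$id)
print(missing_ids)

# People whose ORCID is recorded.
with_orcid <- people[!is.na(people$orcid), ]
print(with_orcid)
```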
You can also see what’s new on Databrary. The get_db_stats() command
gives you information about newly authorized people and institutions
and newly uploaded datasets. Try this:
databraryr::get_db_stats("stats")
#> # A tibble: 1 × 9
#>   date                investigators affiliates institutions datasets_total
#>   <dttm>                      <int>      <int>        <int>          <int>
#> 1 2024-04-26 19:22:37          1751        684          787           1684
#> # ℹ 4 more variables: datasets_shared <int>, n_files <int>, hours <dbl>,
#> #   TB <dbl>
databraryr::get_db_stats("people")
#> # A tibble: 5 × 6
#>      id sortname  prename   affiliation                              time  url
#>   <int> <chr>     <chr>     <chr>                                    <chr> <chr>
#> 1 12529 Khan      Azizuddin Indian Institute of Technology Bombay    2024… NA
#> 2 12590 Hochmann  Jean-Remy Centre National de la Recherche Scienti… 2024… http…
#> 3 12604 Martin    Andrew    University of Kent                       2024… NA
#> 4 12488 Scerif    Gaia      University of Oxford                     2024… NA
#> 5 12335 Governale Amy       North Park University                    2024… NA
databraryr::get_db_stats("institutions")
#> # A tibble: 2 × 5
#>      id sortname                              url              institution time
#>   <int> <chr>                                 <chr>            <lgl>       <chr>
#> 1 12624 Indian Institute of Technology Bombay https://www.iit… TRUE        2024…
#> 2 12607 North Park University                 https://www.nor… TRUE        2024…
databraryr::get_db_stats("datasets")
#> # A tibble: 7 × 9
#>      id name  body  creation owners       permission publicsharefull time  doi
#>   <int> <chr> <chr> <chr>    <list>            <int> <lgl>           <chr> <chr>
#> 1  1730 CHEE… "* p… 2024-04… <named list>          1 FALSE           2024… NA
#> 2  1609 Beha… "The… 2023-06… <named list>          1 TRUE            2024… NA
#> 3  1609 Beha… "The… 2023-06… <named list>          1 TRUE            2024… NA
#> 4  1609 Beha… "The… 2023-06… <named list>          1 TRUE            2024… NA
#> 5  1729 Eye-… "Thi… 2024-04… <named list>          1 FALSE           2024… NA
#> 6  1728 DBN … "“Th… 2024-04… <named list>          1 FALSE           2024… NA
#> 7  1037 spre… "sss… 2019-12… <named list>          1 FALSE           2024… 10.1…
Depending on when you run this command and how often, there may or may not be new items.
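In the datasets output above, id 1609 appears three times. If you want one row per dataset, a base R sketch like the following removes the repeats. It rebuilds only the id and publicsharefull columns by hand from the printed output so that it runs offline; with a live result you would apply the same duplicated() call to the tibble returned by get_db_stats("datasets"). The vignette does not state why the repeats occur.

```r
# Hand-copied subset of the printed datasets output (two columns only).
recent_datasets <- data.frame(
  id = c(1730L, 1609L, 1609L, 1609L, 1729L, 1728L, 1037L),
  publicsharefull = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Keep the first row for each dataset id and drop later repeats.
unique_datasets <- recent_datasets[!duplicated(recent_datasets$id), ]
print(unique_datasets)
```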
Next steps
To see more about how to access data on Databrary using databraryr,
visit the accessing data vignette.
To see how to log in and log out once you have authorization, see the vignette for authorized users.