5 Demographic data
This page summarizes the demographic data currently reported by shared Databrary volumes.
Databrary analytics
Databrary maintains a website with information about the system’s users and data. The site can be found at:
https://databrary.github.io/analytics.
The data reported here are updated regularly, typically on a weekly or bi-weekly basis.
Data summarized here reflect the state of the Databrary system on 2023-08-31.
Release levels
By default, Databrary exposes certain variables to the public and hides others from public view. Who can see what files and variables is indicated by a set of labels and icons that apply system-wide called the “Databrary Release Levels”:
The above is taken from https://databrary.org/support/irb/release-levels.html.
By default, all files uploaded to the system are assigned a Private release level.
Databrary spreadsheet
Researchers who are authorized by their institutions to access Databrary may curate their own datasets and add a number of types of data and metadata. These data are entered into, and visualized via a “spreadsheet” type of interface:
The virtue of using this spreadsheet interface is that the current Databrary system permits a comma-separated value (CSV) format file of the spreadsheet to be easily exported. Indeed, the databraryr
package makes use of this feature, as does the analytics pipeline.
In the images below, the blue ‘checked’ variables are those the system presumes researchers will want to enter in the typical or default case. The grey ‘checked’ variables are generated internally by the Databrary system. For example, the ID
variable is a system-wide-unique value for a specific research participant’s data.
(Human) participant data
Most, but not all of Databrary’s current data depict humans. We have a small set of non-human animal data, and have plans to expand that in the future.
For now, when creating a new dataset/volume, the user is prompted to design a spreadsheet that is specific to the needs of their study. The following figure shows one of the “help” images associated with the process of designing a spreadsheet:
Most participant-level variables are shown to Public audiences (globe icon), with the exception of birthdate
and disability
.
Exact birthdates are shown only to Authorized Investigators or Affiliates with full Databrary access and from datasets (volumes) that have been fully shared.
Exact age
If a researcher provides a valid test date and a valid birthdate
, the system automatically calculates an age in days.
Here is information about age data from shared Databrary volumes:
Note: Databrary does not currently support entering age values directly.
Gender
Databrary does not currently use a standard vocabulary to collect information about participant gender
, nor does the system support information about related measures like natal sex.
Nevertheless, there appear to be \(n=142\) volumes that report gender
in some form. Here is a tabular summary of the values for the gender
variable across those volumes:
Race
Some \(n=127\) shared volumes report information about race. Here is a tabular summary of the values for the race
variable:
Note that the list contains some standard categories (OMB), but also a number of unique, non-standard, values.
Ethnicity
Note that the list contains some standard categories (OMB), but also a number of unique, non-standard, values.
Gestational age
The analytics report does not currently report data about the extent to which gestational_age
is reported.
Pregnancy term
\(N=21\) volumes report pregancy term.
A summary table of the reported results shows that these are not standardized:
Birth weight
Only \(n=3\) shared volumes report birth weight. Here is a histogram of shared birth weights converted to pounds and ounces:
The Play & Learning Across a Year project collects birth weight data among other measures, uses Databrary extensively, will eventually share the full dataset with the broader research community.
Disability
Some \(n=86\) volumes report on disability. Here is a table of the reported values:
Language
Some \(N=111\) volumes report participant_language. Here is a table of the reported values:
Country
Only \(n=2\) shared volumes report country. Here is a table summarizing that data:
State
The analytics report does not currently report data about the extent to which state
is reported.
However, the Play & Learning Across a Year (PLAY) Project has collected and will eventually report the U.S. States in which data was collected. This project is also exploring reporting more micro-level geographic data (e.g., county), and where participant (and IRB) permission exists, Census Block Group level data.
Other variables
The analytics report does not currently report data about the extent to which user defined variables like pilot status, exclusion criteria, task names, condition names, group names, context factors, or setting are reported in the Databrary spreadsheet. Most of these user-reported variables require additional metadata to be useful to others. Future work should focus on standardizing how that metadata is reported and incorporated into Databrary.