Best practices

How do I prepare my dataset to be shared in conjunction with the publication of my article?

Inspiration: BK’s scaffolding DB volume. From KEA: “What to share and with who and how and what info to redact etc to keep the permission levels kosher.”

Volumes used in example:

Citation of dataset vs. associated article

  • Should the citation of the dataset match the article that it is associated with?
    • For example…
      • Tamis-LeMonda, C., Adolph, K. & Herzberg, O. (2020). Infant exuberant object play at home: Immense amounts of time-distributed, variable practice. Databrary. Retrieved March 21, 2023 from http://doi.org/10.17910/b7.1118.
      • Herzberg, O., Fletcher, K. K., Schatz, J. L., Adolph, K. E., & Tamis-LeMonda, C. S. (2021). Infant exuberant object play at home: Immense amounts of time-distributed, variable practice. Child Development, 00, 1–15. https://doi.org/10.1111/cdev.13669
    • Thoughts… is this a question of who owns the data/deserves citation for the data?

Description of dataset

  • It is not helpful to just copy and paste the article abstract into the dataset overview. Instead, the overview should describe the context of data collection and the types of data uploaded (e.g., video, pictures, programming code).
    • “This dataset includes videos of real-life falls experienced by older adult residents of long-term care homes.” is more useful than “Researchers can study complex developmental phenomena with all the inherent noise and complexity or simplify behaviors to hone in on the essential aspects of a phenomenon.”

    • Useful description of data set:

      This qualitative study interviewed 22 adult residents of two Boston-area communities, to find out about their difficulties paying water utility bills, the coping strategies they rely on to manage unaffordable water bills, and associated health effects. The dataset includes:

      • 22 interview transcripts (RTF files)
      • Participant attribute table
      • Code book
    • Additional sentence to be added if entire volume is not shared: Requests for sharing will be assessed on a case-by-case basis. Please contact person@university.edu for all inquiries and requests regarding this dataset.

    • Abstract can be added as supplemental information but should not be the only thing in the description.

    • Add year(s) when data was collected
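
The points above can be turned into a rough pre-upload check. This is a minimal sketch, not a Databrary feature: the function name, the keyword list, and the three rules are assumptions chosen for illustration, and a keyword match is only a crude proxy for "mentions the types of data".

```python
import re

# Hypothetical keywords standing in for "types of data uploaded".
FILE_TYPE_WORDS = ("video", "picture", "transcript", "code", "csv", "spreadsheet")


def check_description(description, abstract=""):
    """Return a list of problems found in a dataset description."""
    problems = []
    text = description.strip()
    # Rule 1: the description should not simply be the article abstract.
    if abstract and text == abstract.strip():
        problems.append("description is a copy of the abstract")
    # Rule 2: it should say what kinds of files are included.
    if not any(word in text.lower() for word in FILE_TYPE_WORDS):
        problems.append("no file types mentioned")
    # Rule 3: it should state the year(s) of data collection.
    if not re.search(r"\b(19|20)\d{2}\b", text):
        problems.append("no collection year mentioned")
    return problems


if __name__ == "__main__":
    draft = "This dataset includes videos of infants at home, collected in 2019-2020."
    print(check_description(draft))
```

A check like this only catches omissions; whether the description is actually useful still needs a human read.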

Organizing the data itself

  • Individual sessions folders for each participant’s video data
    • Do not share birth date or test date publicly. Age, gender, race, ethnicity, language, group, tasks, and context (lab/home, state, country) can be shared.
    • If pulling videos from a “parent” volume, should researchers re-download and re-upload the videos? Or just keep them in the parent volume and link to that volume?
  • Exemplar folder with videos that are public. If public permission is not explicitly given, blur participants’ faces.
  • Coding spreadsheet/Datavyu folder with csv files from all participants. Each spreadsheet should carry the same sharing permission level that the corresponding participant granted.
  • Separate folder for coding manual. Sharing permission is public.
  • Separate folder or same folder for coding scripts? Sharing permission is public.
  • Folder for analysis scripts. Sharing permission is public.
  • Folder for processed data. Sharing permission is public.
  • Add a README file explaining how to use the data and the individual files.
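
The folder list above could be staged locally before upload along these lines. This is a sketch under stated assumptions: every folder name, the permission labels, and the `README.txt` filename are hypothetical, and the script only creates local directories. It does not talk to Databrary.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names mapped to the sharing permission noted in the list.
LAYOUT = {
    "sessions": "per participant",
    "exemplars": "public",
    "coding_spreadsheets": "per participant",
    "coding_manual": "public",
    "coding_scripts": "public",
    "analysis_scripts": "public",
    "processed_data": "public",
}


def scaffold(root):
    """Create the staging folders under root and write a README stub."""
    root = Path(root)
    created = []
    for name in LAYOUT:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    # The README records the intended permission for each folder.
    lines = ["How to use this dataset", ""]
    lines += [f"{name}/: sharing permission = {perm}" for name, perm in LAYOUT.items()]
    (root / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for folder in scaffold(tmp):
            print(folder.name)
```

Keeping the intended permission next to each folder name makes it easier to spot a mismatch before anything is uploaded.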

Datavyu files

  • Delete unnecessary or extraneous columns
  • Delete unnecessary or extraneous codes
  • What should be deleted/included in the Datavyu files themselves? Date of birth is already on the Databrary volume. Should it also be in the Datavyu file?
  • SPSS and Datavyu files should NOT contain birth date or test date.
  • Alternative: create an age column for the child that can be shared openly.
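
One way to apply the last two points to an exported spreadsheet is sketched below. It assumes the data has been exported to rows with ISO-formatted dates; the column names `birth_date`, `test_date`, and `age_months` are hypothetical, and age is approximated with an average month length rather than calendar months.

```python
from datetime import date

# Hypothetical names for the columns that must not be shared.
SENSITIVE_COLUMNS = {"birth_date", "test_date"}


def age_in_months(birth, test):
    """Approximate age at test in months (average month = 30.4375 days)."""
    return round((test - birth).days / 30.4375, 1)


def deidentify_rows(rows):
    """Return copies of rows with an age column added and the dates removed."""
    cleaned = []
    for row in rows:
        birth = date.fromisoformat(row["birth_date"])
        test = date.fromisoformat(row["test_date"])
        # Copy everything except the sensitive date columns.
        out = {k: v for k, v in row.items() if k not in SENSITIVE_COLUMNS}
        out["age_months"] = age_in_months(birth, test)
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    sample = [{"id": "P01", "birth_date": "2020-01-15", "test_date": "2021-03-15"}]
    print(deidentify_rows(sample))
```

Computing age before dropping the dates keeps the shareable variable while the identifying dates stay out of the shared file.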

Notes from lab meeting

  • internal name versus paper name
  • part of the process at the end
  • proof stage check list?
  • what if parent volume isn’t shared?
  • child volumes go to the parent and vice versa
  • “This project is a longitudinal study of language, cognitive, social, emotional, and gender development from infants’ birth through various critical transitions to schooling (preschool, kindergarten, 1st grade). Children came from diverse backgrounds (African-American, Dominican, Mexican, Chinese) and lived in the New York City metro area. Videos were recorded during various activities and tasks specific to each observation age (e.g., book reading, mother-child free play, novel toy, familiar toy at the 14-month visit) in children’s homes or in the laboratory at 14, 24, 36, 52, 64, and 76 months of age.”
  • LEGO: The project includes a corpus of 234 hours of video recording (across 2 days, 2 hours per visit) of infants and mothers going about their daily routines. We video coded foundational passes on infant object play, motor play, and language behaviours, thereby producing rich descriptive data on how infants play, how play changes across the course of a day and the second year, and which environmental and social factors support infant play. This project yields several valuable products that are disseminated to and shared with the scientific community, parents, and other stakeholders.

Other best practices ideas

  • when do I need a new volume or the same volume
  • when do I use materials folder

How do I describe data?

  • ensure DATA description is complete
  • e.g., state what file types are included
  • do not just paste abstract
  • avoid acronyms or lab conventions
  • describe displays and tasks in terms that are agnostic to the research question
    • bad - complimentary gesture study
    • good - videos of child and parent playing with 3 toys from 3 camera angles