REDCapTidieR

Making analysts lives easier through the power of tidy tibbles

Richard Hanna

2022-08-18

Who am I?

  • CGT DataOps Data Scientist
  • 5 years at CHOP
  • Engineering background
  • Trying to implement quarto today1

R / Medicine 2022

https://events.linuxfoundation.org/r-medicine/

R / Medicine 2022 - Workshops

Discount Code: RMED22CHOP for 10% off!

R / Medicine 2022 - Speaker Highlights

  • Stephan Kadauke (CHOP CGT DataOps)
    • R/Medicine 101: Intro to R for Clinical Data

    • Should we Teach Data Science to Physicians-in-Training?

  • Joy Payton (CHOP Arcus Education)
    • Using Public Data and Maps for Powerful Data Visualizations
  • Lihai Song (CHOP Data Scientist)
    • Automation of statistics summary and analysis using R Shiny
  • Jaclyn Janis (RStudio/Posit, CHOP Representative)
    • It’s time for nurses to learn R

Agenda

In today’s talk we will:

  • Review what REDCap is 💡
  • Review REDCapR as an extraction tool for the API 🔌
  • Implement REDCapTidieR to make our lives easier 🧹

What you need:

  • Familiarity with R 💻
  • Familiarity with REDCap 🧢

What is REDCap?

  • Free1 database solution for research
  • Secure and accessible from a web browser
  • Can collect “any type of data in any environment”
  • Particularly useful for compliance with 21 CFR Part 11, HIPAA, etc.
  • Requires little to get up and running, but offers complexity as needed

What is REDCap?

Record Status Dashboard

Front-End Data Entry UI

REDCap functions as a large data table, but data distribution can be complex depending on architectural choices.

Repeating instances can create headaches on the backend.

The Super Heroes Dataset

Open source dataset from SuperHeroDB and available on Kaggle. It contains two tables:

  • Super Hero Information (i.e. demographic data)
  • Super Hero Powers (i.e. TRUE/FALSE for specific powers)

On the Shoulders of Giants

Some core REDCapR functions:

  • redcap_read_oneshot
  • redcap_metadata_read
  • redcap_event_instruments
    • New as of v1.1.0

Requirements:

  • Active REDCap project
  • A REDCap API URI1
  • API token2

SuperHeroes Output

# Load applicable libraries:
library(dplyr)
library(REDCapR)

superheroes_db <- redcap_read_oneshot(redcap_uri, token, verbose = FALSE)$data

superheroes_db %>% 
  glimpse()
Rows: 6,700
Columns: 16
$ record_id                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, …
$ redcap_repeat_instrument    <chr> NA, "super_hero_powers", "super_hero_power…
$ redcap_repeat_instance      <dbl> NA, 1, 2, 3, 4, 5, 6, 7, NA, 1, 2, 3, 4, 5…
$ name                        <chr> "A-Bomb", NA, NA, NA, NA, NA, NA, NA, "Abe…
$ gender                      <chr> "Male", NA, NA, NA, NA, NA, NA, NA, "Male"…
$ eye_color                   <chr> "yellow", NA, NA, NA, NA, NA, NA, NA, "blu…
$ race                        <chr> "Human", NA, NA, NA, NA, NA, NA, NA, "Icth…
$ hair_color                  <chr> "No Hair", NA, NA, NA, NA, NA, NA, NA, "No…
$ height                      <dbl> 203, NA, NA, NA, NA, NA, NA, NA, 191, NA, …
$ weight                      <dbl> 441, NA, NA, NA, NA, NA, NA, NA, 65, NA, N…
$ publisher                   <chr> "Marvel Comics", NA, NA, NA, NA, NA, NA, N…
$ skin_color                  <chr> "-", NA, NA, NA, NA, NA, NA, NA, "blue", N…
$ alignment                   <chr> "good", NA, NA, NA, NA, NA, NA, NA, "good"…
$ heroes_information_complete <dbl> 0, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, …
$ power                       <chr> NA, "Accelerated Healing", "Durability", "…
$ super_hero_powers_complete  <dbl> NA, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0…

Remember redcap_repeat_instrument and redcap_repeat_instance, they’re coming back!

SuperHeroes Output

# View first 10 entries of SuperHeroes db tibble
superheroes_db %>% 
  head(10)
record_id redcap_repeat_instrument redcap_repeat_instance name gender eye_color race hair_color height weight publisher skin_color alignment heroes_information_complete power super_hero_powers_complete
0 NA NA A-Bomb Male yellow Human No Hair 203 441 Marvel Comics - good 0 NA NA
0 super_hero_powers 1 NA NA NA NA NA NA NA NA NA NA NA Accelerated Healing 0
0 super_hero_powers 2 NA NA NA NA NA NA NA NA NA NA NA Durability 0
0 super_hero_powers 3 NA NA NA NA NA NA NA NA NA NA NA Longevity 0
0 super_hero_powers 4 NA NA NA NA NA NA NA NA NA NA NA Super Strength 0
0 super_hero_powers 5 NA NA NA NA NA NA NA NA NA NA NA Stamina 0
0 super_hero_powers 6 NA NA NA NA NA NA NA NA NA NA NA Camouflage 0
0 super_hero_powers 7 NA NA NA NA NA NA NA NA NA NA NA Self-Sustenance 0
1 NA NA Abe Sapien Male blue Icthyo Sapien No Hair 191 65 Dark Horse Comics blue good 0 NA NA
1 super_hero_powers 1 NA NA NA NA NA NA NA NA NA NA NA Agility 0

REDCap Repeating Instruments

Record Status Dashboard

Front-End Data Entry UI

SuperHeroes Repeating Output

superheroes_db %>% 
  filter(record_id == 0) %>% 
  select(record_id, contains("redcap_"), name, power)
record_id redcap_repeat_instrument redcap_repeat_instance name power
0 NA NA A-Bomb NA
0 super_hero_powers 1 NA Accelerated Healing
0 super_hero_powers 2 NA Durability
0 super_hero_powers 3 NA Longevity
0 super_hero_powers 4 NA Super Strength
0 super_hero_powers 5 NA Stamina
0 super_hero_powers 6 NA Camouflage
0 super_hero_powers 7 NA Self-Sustenance

record_id, redcap_repeat_instrument, and recap_repeat_instance form a compound key.

A compound key is the combination of 2+ columns necessary to identify a row uniquely in a table

The Problem

  • Empty data introduced as an artifact of repeating instruments
  • Data export is often times large and unwieldy
  • Missing metadata linking field association to instruments
  • Row identification is confusing and inconsistent

Introducing REDCapTidieR

At a glance:

  • Built on top of REDCapR
  • Takes two inputs: REDCap URI and REDCap API token
  • Returns a set of tidy tibbles
    • One for each REDCap instrument

Revisiting Superheroes

library(REDCapTidieR)
superheroes_tidy <- read_redcap_tidy(redcap_uri, token)

superheroes_tidy
# A tibble: 2 × 3
  redcap_form_name   redcap_data          structure   
  <chr>              <list>               <chr>       
1 super_hero_powers  <tibble [5,966 × 4]> repeating   
2 heroes_information <tibble [734 × 12]>  nonrepeating

Revisiting Superheroes

Non-Repeating Hero Information

superheroes_tidy$redcap_data[[2]] %>% 
  head(10)
record_id name gender eye_color race hair_color height weight publisher skin_color alignment form_status_complete
0 A-Bomb Male yellow Human No Hair 203 441 Marvel Comics - good Incomplete
1 Abe Sapien Male blue Icthyo Sapien No Hair 191 65 Dark Horse Comics blue good Incomplete
2 Abin Sur Male blue Ungaran No Hair 185 90 DC Comics red good Incomplete
3 Abomination Male green Human / Radiation No Hair 203 441 Marvel Comics - bad Incomplete
4 Abraxas Male blue Cosmic Entity Black -99 -99 Marvel Comics - bad Incomplete
5 Absorbing Man Male blue Human No Hair 193 122 Marvel Comics - bad Incomplete
6 Adam Monroe Male blue - Blond -99 -99 NBC - Heroes - good Incomplete
7 Adam Strange Male blue Human Blond 185 88 DC Comics - good Incomplete
8 Agent 13 Female blue - Blond 173 61 Marvel Comics - good Incomplete
9 Agent Bob Male brown Human Brown 178 81 Marvel Comics - good Incomplete

Repeating Hero Powers

superheroes_tidy$redcap_data[[1]] %>% 
  head(10)
record_id redcap_repeat_instance power form_status_complete
0 1 Accelerated Healing Incomplete
0 2 Durability Incomplete
0 3 Longevity Incomplete
0 4 Super Strength Incomplete
0 5 Stamina Incomplete
0 6 Camouflage Incomplete
0 7 Self-Sustenance Incomplete
1 1 Agility Incomplete
1 2 Accelerated Healing Incomplete
1 3 Cold Resistance Incomplete

Revisiting Superheroes

Non-Repeating Hero Information

superheroes_tidy$redcap_data[[2]] %>% 
  head(5)
record_id --- form_status_complete
0 ... Incomplete
1 ... Incomplete
2 ... Incomplete
3 ... Incomplete
4 ... Incomplete

Repeating Hero Powers

superheroes_tidy$redcap_data[[1]] %>% 
  head(5)
record_id --- form_status_complete
0 ... Incomplete
0 ... Incomplete
0 ... Incomplete
0 ... Incomplete
0 ... Incomplete

Change in *_form_status_complete to form_status_complete

The Default Output

library(REDCapTidieR)
superheroes_tidy <- read_redcap_tidy(redcap_uri, token)

superheroes_tidy
# A tibble: 2 × 3
  redcap_form_name   redcap_data          structure   
  <chr>              <list>               <chr>       
1 super_hero_powers  <tibble [5,966 × 4]> repeating   
2 heroes_information <tibble [734 × 12]>  nonrepeating

bind_tables Direct to Environment

The function:

# How `bind_tables()` is called:
bind_tables <- function(.data,
                        environment = global_env(),
                        redcap_form_name = NULL,
                        structure = NULL)
  
  # How it looks in practice:
  read_redcap_tidy(redcap_uri, token) %>% 
  bind_tables()

Clear out our envionrment:

rm(list = ls())
ls.str(envir = globalenv())

Empty output, no global environment objects

Reload the superheroes_tidy dataset, pipe to bind_tables, check environment:

superheroes_tidy %>%
  bind_tables()

ls.str(envir = globalenv())
heroes_information : tibble [734 × 12] (S3: tbl_df/tbl/data.frame)
super_hero_powers : tibble [5,966 × 4] (S3: tbl_df/tbl/data.frame)
superheroes_tidy : tibble [2 × 3] (S3: tbl_df/tbl/data.frame)

bind_tables Direct to Environment

Longitudinal REDCap Projects

Classic

Longitudinal,

one arm

Longitudinal,

multi-arm

Nonrepeated record_id record_id +
redcap_event
record_id +
redcap_event +
redcap_arm
Repeated record_id +
redcap_repeat_instance
record_id +
redcap_repeat_instance +
redcap_event
record_id +
redcap_repeat_instance +
redcap_event +
redcap_arm

REDCap Projects with Arms

redcap_long_arms_tidy <- read_redcap_tidy(redcap_uri, token)

redcap_long_arms_tidy
# A tibble: 3 × 3
  redcap_form_name redcap_data      structure   
  <chr>            <list>           <chr>       
1 repeated         <tibble [9 × 7]> repeating   
2 nonrepeated      <tibble [8 × 6]> nonrepeating
3 nonrepeated2     <tibble [3 × 6]> nonrepeating

REDCap Projects with Arms

redcap_long_arms_tidy$redcap_data[[1]]
record_id redcap_repeat_instance redcap_event redcap_arm repeat_1 repeat_2 form_status_complete
1 1 event_1 1 1 2 Incomplete
1 2 event_1 1 3 4 Incomplete
1 3 event_1 1 5 6 Incomplete
1 1 event_2 1 A B Incomplete
1 2 event_2 1 C D Incomplete
3 1 event_1 1 C D Incomplete
3 1 event_2 1 E F Incomplete
3 2 event_2 1 G H Incomplete
4 1 event_3 2 R1 R2 Incomplete
redcap_long_arms_tidy$redcap_data[[3]]
record_id redcap_event redcap_arm nonrepeat_3 nonrepeat_4 form_status_complete
1 event_1 1 3 4 Incomplete
2 event_1 1 5 6 Incomplete
3 event_1 1 NA NA Incomplete

Try it for Yourself!*

Install from public GitHub and view the pkgdown site

*REDCapTidieR is in early alpha, but hopefully for not for long!

Future Work

  • raw_or_label compatibility
  • extract_table and extract_tables functions
  • Release to CRAN

REDCap Metadata

metadata <- redcap_metadata_read(redcap_uri, token)$data

metadata %>%
  kbl(booktabs = T, escape = F, table.attr = "style='width:20%;'") %>%
    # options for HTML output
    kable_styling(bootstrap_options = c("striped", "hover", "bordered"), 
                  position = "center",
                  full_width = F,
                  font_size = 12,
                  fixed_thead = T) %>% 
  column_spec(1, bold = T) %>% 
  scroll_box(width = "100%", height = "300px")
field_name form_name section_header field_type field_label select_choices_or_calculations field_note text_validation_type_or_show_slider_number text_validation_min text_validation_max identifier branching_logic required_field custom_alignment question_number matrix_group_name matrix_ranking field_annotation
record_id nonrepeated NA text Record ID NA NA NA NA NA NA NA NA NA NA NA NA NA
nonrepeat_1 nonrepeated NA text Text Box Input NA NA NA NA NA NA NA NA NA NA NA NA NA
nonrepeat_2 nonrepeated NA text Text Box Input NA NA NA NA NA NA NA NA NA NA NA NA NA
nonrepeat_3 nonrepeated2 NA text Test data NA NA NA NA NA NA NA NA NA NA NA NA NA
nonrepeat_4 nonrepeated2 NA text Test data NA NA NA NA NA NA NA NA NA NA NA NA NA
repeat_1 repeated NA text Text Box Input: NA NA NA NA NA NA NA NA NA NA NA NA NA
repeat_2 repeated NA text Text Box Input: NA NA NA NA NA NA NA NA NA NA NA NA NA
text data_field_types NA text NA NA NA NA NA NA NA NA NA NA NA NA NA NA
note data_field_types NA notes NA NA NA NA NA NA NA NA NA NA NA NA NA NA
calculated data_field_types NA calc NA 1+1 NA NA NA NA NA NA NA NA NA NA NA NA
dropdown_single data_field_types NA dropdown NA choice_1, one | choice_2, two | choice_3, three NA NA NA NA NA NA NA NA NA NA NA NA
radio_single data_field_types NA radio NA choice_1, A | choice_2, B | choice_3, C NA NA NA NA NA NA NA NA NA NA NA NA
checkbox_multiple data_field_types NA checkbox NA 1, 1 | 2, 2 | 3, 3 | 4, 4 | 5, 5 | 6, 6 | 7, 7 | 8, 8 | 9, 9 | 10, 10 | -99, Unknown | -98, Not Given NA NA NA NA NA NA NA NA NA NA NA NA
checkbox_multiple_2 data_field_types NA checkbox NA Aa, Red | B1b, Green | Ccc2, Blue | 3dddd, Yellow | 4eeee5, Purple NA NA NA NA NA NA NA NA NA NA NA NA
yesno data_field_types NA yesno NA NA NA NA NA NA NA NA NA NA NA NA NA NA
truefalse data_field_types NA truefalse NA NA NA NA NA NA NA NA NA NA NA NA NA NA
signature data_field_types NA file NA NA NA signature NA NA NA NA NA NA NA NA NA NA
fileupload data_field_types NA file NA NA NA NA NA NA NA NA NA NA NA NA NA NA
slider data_field_types NA slider NA NA NA NA NA NA NA NA NA RH NA NA NA NA
radio_dtxt_error data_field_types NA radio This is a radio selection meant to be a descriptive text field. NA NA NA NA NA NA NA NA NA NA NA NA NA
descriptive data_field_types begin new section descriptive DESCRIPTIVE NA NA NA NA NA NA NA NA NA NA NA NA NA
text_dmy text_input_validation_types NA text Text DMY NA NA date_dmy NA NA NA NA NA NA NA NA NA NA
text_mdy text_input_validation_types NA text Text MDY NA NA date_mdy NA NA NA NA NA NA NA NA NA NA
text_ymd text_input_validation_types NA text Text YMD NA NA date_ymd NA NA NA NA NA NA NA NA NA NA
text_dmy_hm text_input_validation_types NA text Text DMY HM NA NA datetime_dmy NA NA NA NA NA NA NA NA NA NA
text_mdy_hm text_input_validation_types NA text Text MDY HM NA NA datetime_mdy NA NA NA NA NA NA NA NA NA NA
text_ymd_hm text_input_validation_types NA text Text YMD HM NA NA datetime_ymd NA NA NA NA NA NA NA NA NA NA
text_dmy_hms text_input_validation_types NA text Text DMY HMS NA NA datetime_seconds_dmy NA NA NA NA NA NA NA NA NA NA
text_mdy_hms text_input_validation_types NA text Text MDY HMS NA NA datetime_seconds_mdy NA NA NA NA NA NA NA NA NA NA
text_ymd_hms text_input_validation_types NA text Text YMD HMS NA NA datetime_seconds_ymd NA NA NA NA NA NA NA NA NA NA
text_mrn text_input_validation_types NA text Text MRN NA NA mrn_8d NA NA NA NA NA NA NA NA NA NA
text_phone text_input_validation_types NA text Text Phone NA NA phone NA NA NA NA NA NA NA NA NA NA
text_ssn text_input_validation_types NA text Text SSN NA NA ssn NA NA NA NA NA NA NA NA NA NA
text_hms text_input_validation_types NA text Text HMS NA NA time_hh_mm_ss NA NA NA NA NA NA NA NA NA NA
text_hm text_input_validation_types NA text Text HM NA NA time NA NA NA NA NA NA NA NA NA NA
text_ms text_input_validation_types NA text Text MS NA NA time_mm_ss NA NA NA NA NA NA NA NA NA NA
text_zip text_input_validation_types NA text Text ZIP NA NA zipcode NA NA NA NA NA NA NA NA NA NA
api_test_dtxt api_no_access NA descriptive This instrument to be used to test user API privileges and administer warnings when metadata and db outputs don't align. NA NA NA NA NA NA NA NA NA NA NA NA NA
api_text api_no_access NA text Test Field NA NA NA NA NA NA NA NA NA NA NA NA NA
api_test_dtxt_2 api_no_access_2 NA descriptive This instrument to be used to test user API privileges and administer warnings when metadata and db outputs don't align. NA NA NA NA NA NA NA NA NA NA NA NA NA
api_text_2 api_no_access_2 NA text Test Field NA NA NA NA NA NA NA NA NA NA NA NA NA
api_text_3 api_no_access_2 NA text Test Field NA NA NA NA NA NA NA NA NA NA NA NA NA
api_text_4 api_no_access_2 NA text Test Field NA NA NA NA NA NA NA NA NA NA NA NA NA
survey_yesno survey NA yesno Yes - No field: NA NA NA NA NA NA NA NA NA NA NA NA NA
survey_radio survey NA radio Radio field: 1, Choice 1 | 2, Choice 2 | 3, Choice 3 NA NA NA NA NA NA NA NA NA NA NA NA
survey_checkbox survey NA checkbox Checkbox Field: one, Choice 1 | two, Choice 2 | three, Choice 3 NA NA NA NA NA NA NA NA NA NA NA NA
repeatsurvey_yesno repeat_survey NA yesno Yes - No field: NA NA NA NA NA NA NA NA NA NA NA NA NA
repeatsurvey_radio_v2 repeat_survey NA radio Radio field: 1, Choice 1 | 2, Choice 2 | 3, Choice 3 NA NA NA NA NA NA NA NA NA NA NA NA
repeatsurvey_checkbox_v2 repeat_survey NA checkbox Checkbox Field: one, Choice 1 | two, Choice 2 | three, Choice 3 NA NA NA NA NA NA NA NA NA NA NA NA