Extracting Data from REDCap's API with R

Introduction

The REDCap Project was started at Vanderbilt University in 2004 and has a robust following of research institutions and collaborative teams that rely on it for database creation, manipulation, and storage. One of REDCap's key strengths is that users don't have to be computer science or database engineering experts to get started with database creation. The simple GUI requires very minimal coding (at a basic level, a user can get away with no coding whatsoever). This ease and flexibility make for a very attainable bar of entry for teams of varied backgrounds. REDCap is also free (a value that cannot be overstated), secure, and approved for use by many institutional IRBs. It is highly reputable within the research community, can be accessed through a simple, secure web browser URL, and excels at managing data streams across collaborating teams.

For analysts and researchers who wish to extend REDCap's functionality beyond the web database itself, connecting to the REDCap API is especially useful, and the site provides a few built-in template calls through the "API Playground" tool. Supported languages include PHP, Perl, Python, Ruby, Java, R, and cURL. Nearly every product I've made for pediRES-Q (reports, dashboards, applications via shiny) has been made possible by interfacing with REDCap.

Although the templated version of the REDCap R call uses the curl package, I actually prefer not to use it. Calling REDCap data is extremely simple, and there are much easier and more tailored ways of doing so in R and Python (shown at the end).

REDCapR & redcapAPI

I personally recommend using REDCapR for all of your REDCap extraction needs. Two main functions will get the job done: redcap_read and redcap_read_oneshot. For both you will need the following information from your specific REDCap project (accessible from the API tool), which can be stored ahead of time as sketched just after this list:

  • URL
  • API Token (must be generated and kept secure)
  • Any specific fields or records of interest for the extraction
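
As a minimal setup sketch (the API URL, the environment-variable name, and the field names below are placeholders rather than values from a real project), these pieces can be stored in R objects ahead of the call:

library(REDCapR)

# Placeholder values -- substitute your own project's details
uri <- "https://redcap.example.edu/api/"   # institution-specific API endpoint
token <- Sys.getenv("REDCAP_API_TOKEN")    # keep the token out of the script itself
fields <- c("record_id", "age", "sex")     # hypothetical fields of interest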

Then calling the data is as simple as:

df <- redcap_read_oneshot(redcap_uri = uri,
                          token = token,
                          fields = fields)$data

Obviously, to use this code a token and URL will have to be pre-defined. The fields argument is where a user can specify a vector of variables of specific interest; if none is supplied, the command will attempt to retrieve all variables in the project. Another useful argument is export_data_access_groups. Oftentimes a REDCap project will have numerous data access groups that the project administrator oversees; setting this argument to TRUE exports each record with the data access group label it is tied to. In addition to fields, you can also specify particular records, forms, etc. I have personally found it useful to take advantage of raw_or_label to switch between raw coded outputs and the far more readable label outputs for end products. It's also very worthwhile to note the $data specifier at the end of the call: the call by itself returns a list object with multiple components, but $data is what you will work with 99% of the time.
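
Putting those arguments together, a hedged sketch of a fuller call might look like this (the record IDs are purely illustrative):

df <- redcap_read_oneshot(redcap_uri = uri,
                          token = token,
                          fields = fields,
                          records = c("101", "102"),        # illustrative record IDs
                          raw_or_label = "label",           # readable labels instead of raw codes
                          export_data_access_groups = TRUE)$data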

If extraction time is a concern, REDCapR also offers redcap_read, which has a multitude of additional arguments, namely batch_size and interbatch_delay. Instead of reading the entire database at once, the batch size specifies chunks of records for R to process at a time. This can help reduce demand on the system, though to be honest I've rarely needed it. A sample call would be:

df <- redcap_read(redcap_uri = uri,
                  token = token,
                  batch_size = 10,
                  interbatch_delay = 1,
                  continue_on_error = TRUE)$data

If for some reason REDCapR doesn't work, or you prefer to try a different package, redcapAPI is similarly useful, and I have found it particularly streamlined for retrieving project metadata. For a detailed list of its functionality, I recommend checking out the redcapAPI wiki on GitHub, but for the most part exporting data follows the format shown below:

library(redcapAPI)

rcon <- redcapConnection(url = url, token = token)
df <- exportRecords(rcon,
                    fields = fields,
                    records = records)
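
On the metadata point above, a minimal sketch (assuming the same rcon connection object) for pulling a project's data dictionary could be:

meta <- exportMetaData(rcon)   # returns the project's data dictionary as a data frame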

Incorporating Python via reticulate

For the times when neither REDCapR nor redcapAPI is able to successfully export an entire project, I've found that Python actually plays better with the REDCap API than R does. The good news is that you don't have to be an expert in either Python or reticulate to incorporate both into your data extraction.

The first thing you will want to do is create a separate Python file by saving a script in your RStudio session with a ".py" file extension. From there you can lean on Python to import the PyCap library (redcap) in a manner like the one below:

from redcap import Project, RedcapError

TOKEN = TOKEN   # your API token, kept secure
URL = URL       # your project's API URL
project = Project(URL, TOKEN)

# Store the result under a plain variable name so R can access it as py$rc_df
rc_df = project.export_records(forms=forms,   # a list of instrument names to pull
                               format='df',
                               export_data_access_groups=True)

From here, you can transfer back to your main R script and load the reticulate library. Then execute the Python script using py_run_file and save the output to a data frame:

py_run_file("reticulate_pythong_file.py")

df <- py$rc.df

And that's it! From here you can work with your data as normal. Of course, if you are working in an R Markdown document, you can always specify that a code chunk should be run as Python rather than transferring data between R and Python as shown here. (To do so, you simply write python instead of r in the {r} portion of the chunk header.)
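
For example, a chunk along these lines (a sketch that reuses the project object from the PyCap setup above) would execute as Python directly within the document:

```{python}
# This chunk runs as Python; objects created here are reachable from R via py$
rc_df = project.export_records(format='df')
```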