RMarkdown Workshop

On June 3rd, 2019, I delivered a lecture open to the CHOP and University of Pennsylvania communities. The audience consisted of approximately 30 individuals from a range of backgrounds, including clinicians, statisticians, engineers, and scientists. The goal was to provide an approachable, digestible, interactive introduction to R and RStudio, with a specific focus on RMarkdown.

Me Delivering the Presentation

I have made my thoughts on spreadsheets well known on this site, provided examples of how to reduce reliance on them, and even delivered these ideas on other platforms, so I won’t go into depth again here. Suffice it to say, between my own experience optimizing reporting workflows and seeing the different ways teams can fall victim to inefficiency, I was honored to be given the opportunity both to speak on the topic and to engage with the community so that they might benefit in their own workspaces. As someone who only discovered what R was about a year and a half ago, standing at a podium speaking to MDs, PhDs, and many other peers I look up to was gratifying, humbling, and incredibly rewarding.

To prepare for my lecture, I developed an R Project that I could easily distribute so participants could follow along, and that could serve as reference material to take back to their workspaces. It is publicly available at the GitHub link. My goals over the course of the hour were to:

  • Briefly introduce new users to R, the RStudio IDE, and R Projects
  • Introduce key concepts and foundations of the Markdown language
  • Work through an example RMarkdown document and introduce the concept of parameterization
  • Provide a framework on which users could model their own reporting systems
  • Use data that was applicable, interesting, and approachable to clinicians and statisticians: a heart disease dataset with applications in machine learning

Regarding the last bullet point, my workshop template includes a dataset, freely available on Kaggle.com, on attributes of patients who either did or did not have heart disease. The dataset contains content that members of my community would find interesting, and it also gives those savvier in statistics the opportunity to apply machine learning methods like logistic regression in sample analyses. For those less statistically savvy, there was still much to be gained from learning about parameters in R and how to optimize graphical and statistical reporting.
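To give a flavor of how parameters and logistic regression fit together, here is a minimal sketch. It is not the code from the workshop repo: the data are simulated so the snippet runs on its own, and the column names (`age`, `max_heart_rate`, `disease`) and the `predictors` parameter are hypothetical. In a real RMarkdown document, the `params` list would come from the YAML header rather than being defined by hand.

```r
# Stand-in for the read-only `params` list that RMarkdown builds from the
# YAML header when a document is knit, for example:
#   params:
#     predictors: ["age", "max_heart_rate"]
# The parameter name and its values are hypothetical.
params <- list(predictors = c("age", "max_heart_rate"))

# Simulated data so the sketch is self-contained; this is NOT the Kaggle dataset.
set.seed(42)
n <- 200
heart <- data.frame(
  age = round(runif(n, 30, 75)),
  max_heart_rate = round(rnorm(n, mean = 150, sd = 20))
)
log_odds <- -4 + 0.08 * heart$age - 0.005 * heart$max_heart_rate
heart$disease <- rbinom(n, size = 1, prob = plogis(log_odds))

# Build the model formula from the parameter, then fit a logistic regression.
model_formula <- reformulate(params$predictors, response = "disease")
fit <- glm(model_formula, data = heart, family = binomial)

# Coefficient table (estimates are on the log-odds scale).
summary(fit)$coefficients
```

Changing `predictors` in the header (or when rendering) would rerun the same report with a different model, which is the appeal of parameterized reporting.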

Example Graph from Presentation

I highly encourage you to take the repo for a spin, and feel free to follow along with the audio/video lecture as well. The GitHub repo and video are located at the top left of this page. If you have any comments or questions regarding anything involved in this workshop, please feel free to contact me using the contact link on the main page!

Richard Hanna
Biomedical Engineer and Data Scientist