Creating content with Pandoc and RMarkdown

14 September 2014

I’m used to seeing complex scientific content produced in rather involved formats like XML or LaTeX, but recently I’ve been looking at tools that allow rich content to be authored in more user-friendly formats like Markdown. I first heard of RMarkdown via a presentation by a guy called Shaun Jackman that I saw on Twitter. Produced by the team at RStudio, RMarkdown brings together a bunch of underlying components that let you write content in Markdown with embedded snippets of R code, then evaluate that code and insert the results, tables, charts etc. into HTML, PDF or other destination formats.

Pandoc

The foundation stone of all this is John MacFarlane’s Pandoc, billed as a ‘universal document converter’ capable of converting to and from a vast array of text-based file formats.

I installed that as per the instructions on the site and was immediately able to generate documents from markdown thus:

pandoc hello.md -o hello.html 
pandoc hello.md -o hello.docx 

To make PDFs you need to install a LaTeX engine. I didn’t want to install a big clunky TeX editor on my Mac, so I used BasicTeX. I had to modify my $PATH to make it work.
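For reference, the $PATH change looks roughly like the sketch below. The directory is an assumption: BasicTeX installs of this era put their binaries under /usr/texbin, while newer ones tend to use /Library/TeX/texbin, so check which one exists on your machine.

```shell
# Assumed location of the BasicTeX binaries; adjust to match your system.
TEXBIN="/usr/texbin"

# Prepend the directory to PATH unless it is already listed.
case ":$PATH:" in
  *":$TEXBIN:"*) ;;
  *) PATH="$TEXBIN:$PATH" ;;
esac
export PATH
```

With the engine on the path, `pandoc hello.md -o hello.pdf` should be able to find it. Putting the same lines in your shell profile makes the change stick between sessions.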

Installing R and RMarkdown

Installing R itself is quite straightforward. Once you’ve got it running, R has a package management system for installing libraries. I had a slight glitch there in that it seemed to need to open an X window for me to choose a package mirror manually, and running OS X 10.8 meant I needed to install X windowing software first.
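A possible way to sidestep the mirror chooser is to tell R which mirror to use before it asks. The sketch below appends that setting to a .Rprofile in the current directory, which R reads at startup. The mirror URL is just one choice, and this is not what I actually did at the time.

```shell
# Assumption: cloud.r-project.org as the CRAN mirror; any mirror URL would do.
CRAN_MIRROR="https://cloud.r-project.org"

# Append the setting to a project-level .Rprofile so R never has to prompt.
printf 'options(repos = c(CRAN = "%s"))\n' "$CRAN_MIRROR" >> .Rprofile
```

An R session started from the same directory should then run `install.packages("devtools")` without opening a mirror dialog.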

I first needed to install the ‘devtools’ package, which for some reason turned out to involve downloading the package and installing it locally from a zip.

I could then install RMarkdown itself using the package management system as you’d expect:

devtools::install_github("rstudio/rmarkdown")

Ultimately I did all this because I wanted to be all command-liney and not just install the RStudio application instead.

Using RMarkdown

The starting point for authoring content in RMarkdown is to create a file with a .rmd extension. This will basically contain Markdown with some YAML config at the top. It’s easiest just to illustrate this with an example of a .rmd file and the HTML and PDF that it turns into.

---
title: "RMarkdown - Blog example"
output:
  html_document:
    name: blog_example.html
  pdf_document:
    latex_engine: xelatex
---

Hello, I'm some **markdown**.

## Subtitle

List:

* Item
* Item

etc.

You then convert it into the target format like this, using the R command line environment:

rmarkdown::render("blog_example.rmd", "html_document")

rmarkdown::render("blog_example.rmd", "pdf_document")
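If you would rather stay in the shell than open an R session, the same calls can be handed to Rscript. A minimal sketch follows; render_expr and render_rmd are helper names made up here for illustration, not part of rmarkdown.

```shell
# Build the R expression that renders one file to one output format.
# $1: path to the .rmd file, $2: output format name (e.g. html_document)
render_expr() {
  printf 'rmarkdown::render("%s", "%s")' "$1" "$2"
}

# Render from the shell by passing the expression to Rscript.
render_rmd() {
  Rscript -e "$(render_expr "$1" "$2")"
}

# Only attempt the render when Rscript and the example file are both present.
if command -v Rscript >/dev/null 2>&1 && [ -f blog_example.rmd ]; then
  render_rmd blog_example.rmd html_document
  render_rmd blog_example.rmd pdf_document
fi
```

Keeping the expression builder separate makes it easy to print the command and check it before running anything.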

So far this is just using Pandoc via R.

However, if you include R code in your Markdown then RMarkdown will evaluate it unless told not to (for example with the eval=FALSE chunk option). So this:

mydata = c(7,5,8,3,11,9,10)
mean(mydata)

Becomes this:

<pre class="r"><code>mydata = c(7,5,8,3,11,9,10)
mean(mydata)</code></pre>
<pre><code>## [1] 7.571</code></pre>
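In the .rmd source, R code like this sits inside a chunk fenced with three backticks and `{r}`. The sketch below writes a small file from the shell to show the shape, with a second chunk marked eval=FALSE so that it is displayed but not run. The file name chunk_example.rmd and its title are made up for illustration.

```shell
# Three backticks, built from the octal code for a backtick (\140).
fence=$(printf '\140\140\140')

# chunk_example.rmd is a hypothetical file name used only for this sketch.
{
  printf '%s\n' '---' 'title: "Chunk example"' 'output: html_document' '---' ''
  # Evaluated chunk: the code is run and its result inserted.
  printf '%s{r}\n' "$fence"
  printf '%s\n' 'mydata = c(7,5,8,3,11,9,10)' 'mean(mydata)'
  printf '%s\n\n' "$fence"
  # eval=FALSE chunk: the code is shown in the output but not run.
  printf '%s{r, eval=FALSE}\n' "$fence"
  printf '%s\n' 'mean(mydata)'
  printf '%s\n' "$fence"
} > chunk_example.rmd
```

Rendering that file should show the mean under the first chunk and only the code for the second.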

To do this RMarkdown is using an R component called knitr. One of the most attractive things here is that you can take advantage of R’s excellent plotting capabilities.

# Simulate twelve normally distributed variables, 100 observations each
data <- data.frame(
  Stat11 = rnorm(100, mean = 3, sd = 2),
  Stat21 = rnorm(100, mean = 4, sd = 1),
  Stat31 = rnorm(100, mean = 6, sd = 0.5),
  Stat41 = rnorm(100, mean = 10, sd = 0.5),
  Stat12 = rnorm(100, mean = 4, sd = 2),
  Stat22 = rnorm(100, mean = 4.5, sd = 2),
  Stat32 = rnorm(100, mean = 7, sd = 0.5),
  Stat42 = rnorm(100, mean = 8, sd = 3),
  Stat13 = rnorm(100, mean = 6, sd = 0.5),
  Stat23 = rnorm(100, mean = 5, sd = 3),
  Stat33 = rnorm(100, mean = 8, sd = 0.2),
  Stat43 = rnorm(100, mean = 4, sd = 4)
)

boxplot(data, las = 2)

[Figure: box plot of the twelve simulated variables (plot of chunk unnamed-chunk-2)]

RMarkdown and Jekyll

I’m toying with a few ideas for static site projects built in RMarkdown, but to be useful for a blog like this you really want to be able to write content in the .rmd format and have it processed by Jekyll. And a couple of people have shared ways of doing just that.