An R-based research notebook
I recently set up a fork of the Octopress blogging software to generate posts that contain explanatory text, LaTeX math, and output from R code.
Motivation
For my research I’ve been preparing weekly reports to send out before meetings. For me at least, writing things out really cements concepts and usually helps to filter out the cruft of poorly thought-out ideas. I started by emailing out PDFs that I generated using a combination of knitr and multimarkdown.
The problem with emailing PDFs was that I needed to keep track of all the old reports, they weren’t indexed, and I couldn’t generate things like cross-links between reports.
What I really wanted was a blog. I’ve been really liking Octopress as the platform for this blog.
I like that it creates static pages, which I can upload to my web account at my university and put behind a password so that only my supervisors have access. It also keeps all old reports around, allows links among posts, and generates lists of posts for each category.
The project is set up as a fork of Octopress, so if you’re interested in using it, the project and its installation instructions are at https://github.com/gabysbrain/r-notebook.
Example post
So, what does the output look like? Here the plots are generated by R during execution and automatically linked into the post. The math is handled by MathJax.
Combining the pieces
I wanted to keep using knitr and MathJax as before. They work really well for my purposes.
I recently found out that version 0.7 of knitr supports executing code in languages other than R, which is fantastic! I haven’t had a chance to try it out yet, so I’m not sure how well it works. There’s a demo page for it here: http://yihui.name/knitr/demo/engines/
Adding MathJax to Octopress was a simple matter of adding a link to the MathJax JavaScript file to the Octopress page template.
The major additions are written as 2 plugins: multimarkdown.rb, which adds multimarkdown support to Octopress, and knitr.rb, which runs all the blog posts through knitr to execute the R code and generate the plots and such before the final mmd to HTML conversion.
mmd plugin
The original version is here. The only real change I made was that the extension is now multimarkdown. I found that because Octopress/Jekyll’s extension mapping will match partial extensions, mmd was being detected as a different file type than multimarkdown.
# multimarkdown renderer for jekyll
#
# adapted from: http://git.io/9-RWUg
module Jekyll
  require 'multimarkdown'

  class MultimarkdownConverter < Converter
    safe false
    priority :low

    def matches(ext)
      ext =~ /multimarkdown/i
    end

    def output_ext(ext)
      ".html"
    end

    def convert(content)
      #puts MultiMarkdown.new(knit(content)).to_html
      MultiMarkdown.new(content).to_html
    end
  end
end
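The partial-extension behaviour is easy to reproduce in plain Ruby. The sketch below is not Jekyll's matching code: LOOSE_PATTERN is a hypothetical unanchored pattern standing in for whatever another converter might use, and ANCHORED_PATTERN is just one possible stricter alternative.

```ruby
# Hypothetical patterns, for illustration only (not taken from Jekyll).
LOOSE_PATTERN    = /md/i                          # unanchored: may match anywhere in the extension
ANCHORED_PATTERN = /\A\.(mmd|multimarkdown)\z/i   # anchored: must match the whole extension

# True when the extension contains the loose pattern anywhere.
def loose_match?(ext)
  !(ext =~ LOOSE_PATTERN).nil?
end

# True only when the entire extension is one of the listed names.
def anchored_match?(ext)
  !(ext =~ ANCHORED_PATTERN).nil?
end

[".md", ".mmd", ".multimarkdown"].each do |ext|
  puts "#{ext}: loose=#{loose_match?(ext)} anchored=#{anchored_match?(ext)}"
end
```

An unanchored pattern claims any extension that merely contains it, so a short extension such as mmd can end up handled by a converter meant for something else. Anchoring the pattern is one way around that; renaming the extension, as done here, is another.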
knitr
The knitr plugin consists of 2 files: knitr.rb, which is just a wrapper for knit_markdown.R, which does most of the work.
knitr.rb
Here’s the code for knitr.rb. It uses tempfiles instead of just sending the text directly to knitr so that we can index the cache by blog post name. That way there’s a unique cache directory for each blog post and identical cache section names in different blog posts won’t clobber each other.
require 'tempfile'

module Jekyll
  require_relative 'post_filters'

  # A filter to pass mmd files through knitr
  class KnitrPost < PostFilter
    KNITR_PATH = File.join(File.dirname(__FILE__), "knit_markdown.R")
    # fail early if the R script is missing or cannot be run
    unless File.exist?(KNITR_PATH) and File.executable?(KNITR_PATH)
      raise "knit_markdown.R was not found or is not executable"
    end

    def pre_render(post)
      if post.is_post?
        if post.ext == '.multimarkdown'
          # strip the extension so the name can index the knitr cache
          postname = post.name[0..-post.ext.length-1]
          post.content = knit(postname, post.content)
        end
      end
    end

    # runs everything through knitr
    def knit(name, content)
      #knit_content, status = Open3.capture2(KNITR_PATH, name,
      #:stdin_data=>content)

      # set up the tempfiles to do the translation
      src_file = Tempfile.new('srcfile')
      src_file.write(content)
      src_file.close
      dst_file = Tempfile.new('dstfile')
      dst_file.close

      # execute!
      `#{KNITR_PATH} #{name} #{src_file.path} #{dst_file.path}`

      # read back in the processed file
      dst_file.open
      knit_content = dst_file.read
      dst_file.close

      # remove the files
      src_file.unlink
      dst_file.unlink

      # This is a hack to get the double backslashes in latex math
      # working with liquid templates
      knit_content.gsub(/\\\\$/){"\\\\\\\\"}
    end
  end
end
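The three ideas in the plugin (a per-post cache key, the tempfile round trip, and the backslash doubling) can be tried out in isolation. This is a minimal sketch rather than the plugin itself: the helper names and the example post file name are made up for illustration, and cp stands in for knit_markdown.R so the example runs without R on a Unix-like system.

```ruby
require 'tempfile'

# Strip the extension from a post file name so it can name a cache directory.
def cache_key(filename)
  File.basename(filename, File.extname(filename))
end

# Write content to a temp file, run command with the source and destination
# paths appended as arguments, and return whatever the command wrote.
# command is an array, so no shell is involved and paths are not re-parsed.
def filter_through_tempfiles(command, content)
  src = Tempfile.new('srcfile')
  dst = Tempfile.new('dstfile')
  begin
    src.write(content)
    src.close
    dst.close
    ok = system(*command, src.path, dst.path)
    raise "command failed: #{command.join(' ')}" unless ok
    File.read(dst.path)
  ensure
    # close and delete both temp files, even if the command failed
    src.close!
    dst.close!
  end
end

# Turn two backslashes at the end of a line into four.
def double_trailing_backslashes(text)
  text.gsub(/\\\\$/) { "\\\\\\\\" }
end

# Hypothetical post file name; cp is only a stand-in for the knitr script.
puts cache_key("2012-09-01-my-report.multimarkdown")
puts filter_through_tempfiles(["cp"], "some *markdown* text\n")
puts double_trailing_backslashes("a &= b \\\\\nc &= d")
```

Passing the command as an array avoids shell quoting problems if a post name ever contains spaces, which the backtick form in the plugin does not guard against.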
knit_markdown.R
This is the script that does most of the heavy lifting. Extensions to knitr’s processing are handled through various “hooks.” These are described in the knitr manual.
The assignments from store.prefix through the two opts_chunk$set calls set up the cache and image directories that knitr will use. The query_plot_hook function is an extension to support movies of multiple R plots. In order to get Octopress to highlight R code we need to wrap it in Liquid codeblock tags; the code_hook function does that. The rest just sets all the hooks I want to use and renders the files using knitr.
#!/usr/bin/Rscript
library(knitr)

args <- commandArgs(trailingOnly=TRUE)

# the file name generating this R code
# needed so we can put separate cache and image links
post.name <- args[1]
store.prefix <- if(is.na(post.name)) "" else post.name
cache.path <- paste('cache', store.prefix, "", sep='/')
image.save.path <- paste('source/images/knitr', store.prefix, "", sep='/')
image.load.path <- paste('/images/knitr', store.prefix, "", sep='/')
opts_chunk$set(cache.path=cache.path)
opts_chunk$set(fig.path=image.save.path)

# also get the input and output files
in.file <- if(is.na(args[2])) file("stdin") else args[2]
out.file <- if(is.na(args[3])) stdout() else args[3]

pic.sample <- function() {
  sample(1000,1)
}

# hook to force marked to reload output images
# uses a random query element on the image
# also supports creating animations
query_plot_hook <- function(x, options) {
  # pull out all the relevant plot options
  animate <- options$fig.show == 'animate'
  fig.num <- options$fig.num
  fig.cur <- options$fig.cur
  if(is.null(fig.cur)) fig.cur <- 0

  # Don't print out intermediate plots if we're animating
  if(animate && fig.cur < fig.num) return('')

  base <- opts_knit$get('base.url')
  if (is.null(base)) base <- ''
  # adjust the base for the base path
  filename <- paste(image.load.path, basename(paste(x,collapse='.')), sep='')
  if(options$fig.show == 'animate') {
    # set up the ffmpeg run
    ffmpeg.opts <- options$aniopts
    fig.fname <- paste(sub(paste(fig.num, '$',sep=''), '', x[1]), "%d.png", sep="")
    mov.fname <- paste(sub(paste(fig.num, '$',sep=''), '', x[1]), ".mp4", sep="")
    mov.linkname <- paste(image.load.path, basename(mov.fname), sep='')
    if(is.na(ffmpeg.opts)) ffmpeg.opts <- NULL

    ffmpeg.cmd <- paste("ffmpeg", "-y", "-r", 1/options$interval,
                        "-i", fig.fname, mov.fname)
    system(ffmpeg.cmd, ignore.stdout=TRUE)

    # figure out the options for the movie itself
    mov.opts <- strsplit(options$aniopts, ';')[[1]]
    opt.str <- paste(
      " ",
      if(!is.null(options$out.width)) sprintf('width=%s', options$out.width),
      if(!is.null(options$out.height)) sprintf('height=%s', options$out.height),
      if('controls' %in% mov.opts) 'controls="controls"',
      if('loop' %in% mov.opts) 'loop="loop"')
    sprintf('<video %s><source src="%s?%d" type="video/mp4" />video of chunk %s</video>', opt.str, mov.linkname, pic.sample(), options$label)
  } else {
    # markdown image link; the random query element forces a reload
    sprintf('![plot of chunk %s](%s%s?%d) ',
            options$label, base, filename, pic.sample())
  }
}

# highlight R code on output
code_hook <- function(x, options) {
  print(options)
  # wrap the chunk source in Liquid codeblock tags so Octopress highlights it
  prefix <- sprintf("\n\n{%% codeblock %s lang:r %%}", options$label)
  suffix <- "{% endcodeblock %}\n\n"
  paste(prefix, x, suffix, sep="\n")
}

# hack render_markdown so it doesn't override my custom hook
render_custom <- function() {
  render_markdown(strict=TRUE)
  knit_hooks$set(plot=query_plot_hook,
                 source=code_hook)
}

# need to read everything through stdin and stdout
pat_html()
render_custom()
opts_knit$set(progress=FALSE)
#opts_knit$set(dev='png')
opts_knit$set(out.format='custom')
opts_knit$set(input.dir=getwd())
knit(in.file, out.file)
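To see what directory layout the script produces for a given post, here is the same path logic mirrored in Ruby. This is only an illustrative sketch: knitr_paths and cache_busted are hypothetical helper names that do not exist in the repository, and the post name and image path are made up.

```ruby
# Mirror of the R script's path setup, written in Ruby for illustration.
# Joining with a trailing empty element yields a trailing slash, like
# paste(..., "", sep='/') in R.
def knitr_paths(post_name)
  prefix = post_name.to_s   # a missing post name becomes an empty prefix
  {
    cache:      ['cache', prefix, ''].join('/'),
    image_save: ['source/images/knitr', prefix, ''].join('/'),
    image_load: ['/images/knitr', prefix, ''].join('/')
  }
end

# Append a random query element (1..1000, like sample(1000,1) in R)
# so that a regenerated image is fetched again instead of served from cache.
def cache_busted(url, rng = Random.new)
  "#{url}?#{rng.rand(1..1000)}"
end

paths = knitr_paths("2012-09-01-my-report")
paths.each { |kind, path| puts "#{kind}: #{path}" }
puts cache_busted("#{paths[:image_load]}scatter.png")
```

Images are written under source/images/knitr but linked from /images/knitr, because Octopress serves the contents of source from the site root.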
Conclusion
And that’s about it. The rest of the changes are in the repository of course.
Feel free to fork the repository for your own work and let me know what you think!