How to use this document?

R commands will be presented in a gray box:

print('Hello world!')
## [1] "Hello world!"

The white box following a gray box shows R’s output. Blue boxes contain exercises to check that you have understood the notions:

EXERCISE

This box will contain exercises that you must complete to make sure you have understood all the notions.

Solutions to exercises will be given in a green box (they will only be available after the session).

SOLUTION

This box will contain answers to the previous exercises.

Purple boxes contain additional information (for advanced users), so it is not necessary to read or understand them on first reading. You will probably understand them once you have gained some experience with R.

INFO

This box will contain additional information that you do not need to understand on your first reading.

Orange boxes contain warnings to heed in order to use R well and follow good practices.

WARNING

This box will contain warnings or good practices that you must pay attention to.

1 Introduction

Some computer languages are intended to generate textual documents, such as HTML, which is intended to display web pages, or LaTeX, which is intended to generate printable documents.

Here we introduce Markdown, a lightweight, easy-to-read markup language suited to generating documents without having to learn much syntax. This language can be coupled with R to form the Rmarkdown syntax (Allaire et al. 2022).

INFO

This document is only an overview of what Rmarkdown offers and presents only the main functionalities and options. For full documentation, refer to R Markdown: The Definitive Guide.

2 Overview of an Rmarkdown document

An Rmarkdown document is a text document made of a sequence of Markdown blocks and R code blocks, with an optional header block in YAML at the top of the document (Figure 2.1). The file extension is “.Rmd” and it is a regular text file, so you can create and edit it with any text editor. Nevertheless, RStudio is the recommended one, as it provides many features for working with this kind of file.

Figure 2.1: Overview of an R markdown text document. The block following the HEADER block can also be an R block.

INFO

There are also Sweave documents (“.Rnw”), which combine R and LaTeX instead of Markdown. The principle is identical to Rmarkdown, but the syntax of the code generating the document differs.

3 Header section

The header section is optional (but strongly recommended). It sets general options applied when the document is compiled, such as the output format, the title, author and date of the document, the link to a bibliography file, the insertion of a table of contents, etc. The header is in YAML format and looks like:

---
title: The title of the document
author: Matthieu Jung
date: January 23, 2022
output: pdf_document
params:
  country: France
---

We see here that the header section must be enclosed between two --- lines and can contain the following keys:

Key | Description
--- | ---
title | Indicates the title of the document.
author | Indicates the name of the author(s).
date | Indicates the date of the document. You can also use R code to generate it dynamically, for instance `r Sys.Date()` to get today’s date.
output | Indicates the desired output format, for instance pdf_document to generate a PDF document or html_document to generate an HTML document. Other formats are also available, such as Word, PowerPoint, etc.
params | Indicates a list of key/value pairs that will be available in the params R list. To access these values, use the params$country syntax. This is useful if you want to generalize your report: for instance, you can have a parameter giving the path of the input files, the options of a function, etc.

WARNING

Since : has a special meaning in YAML, if your title (or any other value) includes this character, wrap the value in double quotes ":

title: "Report: second phase"

INFO

Remember that the header section is optional. You can compile a document without setting any header option. In this case all default values will be applied.

3.1 The output key

The output key is special in that it accepts more than just one value, i.e. you can define several output formats for the same document, each with its own set of key/value options. For instance:

---
title: The title of the document
author: Matthieu Jung
date: January 23, 2022
output: 
  pdf_document:
    toc: yes
    toc_depth: 3
    number_sections: yes
  html_document:
    toc: no
    number_sections: yes
---

In this example, a table of contents covering the first three heading levels is displayed if the document is compiled to PDF, but not if it is compiled to HTML (the default value is FALSE, so specifying this option is not necessary; it is here only for the sake of the example), and headings are automatically numbered in both formats.

To see the complete list of available options and their definitions, load the rmarkdown package and run help(pdf_document) or ?html_document.

WARNING

Each type of output format has its own set of options with its own default values. For instance, the toc_depth option, which indicates the depth of headings to include in the table of contents, defaults to 3 for html_document and 2 for pdf_document.
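
The defaults mentioned above can be checked directly from R. The following is a minimal sketch, assuming the rmarkdown package is installed (it is skipped otherwise); it simply reads the default arguments of the two format functions with formals(). The variable names html_defaults and pdf_defaults are chosen for this illustration only.

```r
# Inspect the default arguments of two output format functions.
# Assumption: the rmarkdown package is installed; otherwise nothing is run.
html_defaults <- NULL
pdf_defaults <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  html_defaults <- formals(rmarkdown::html_document)
  pdf_defaults <- formals(rmarkdown::pdf_document)
  # Default depth of the table of contents for each format
  print(html_defaults$toc_depth)
  print(pdf_defaults$toc_depth)
  # Whether a table of contents is included by default in HTML
  print(html_defaults$toc)
}
```

The same approach works for any other option listed in the help pages of these functions.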

3.2 Bibliography

Including a bibliography requires a supplementary file containing the bibliographic references in a specific format. Many tools such as Zotero or EndNote, and websites (PubMed, journal websites, HAL, etc.), let you export references in a suitable format.

You have several options for the format of your bibliography file:

Format | File extension
--- | ---
BibLaTeX | .bib
BibTeX | .bibtex
CSL JSON | .json
CSL YAML | .yaml
RIS | .ris

INFO

See https://pandoc.org/MANUAL.html#citations for more information about the different available formats.

Here is an example in BibTeX format:

@Manual{rmarkdown,
    title = {rmarkdown: Dynamic Documents for R},
    author = {JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone},
    year = {2022},
    note = {R package version 2.12},
    url = {https://github.com/rstudio/rmarkdown},
}

If your bibliography file bibliography.bib is in the same directory as your document, you can add the bibliography key to the header with the name of the file. Otherwise, you should give the absolute path of the file, or its path relative to where the document will be compiled. In your document, you must also add a section called References to indicate where you want the bibliography to appear:

---
title: The title of the document
author: Matthieu Jung
date: January 23, 2022
bibliography: bibliography.bib
---
  
# References

To cite an entry, use @key or [@key] (the latter puts the citation in parentheses), e.g. @rmarkdown is rendered as Allaire et al. (2022), and [@rmarkdown] generates (Allaire et al. 2022).

WARNING

By default, only references that are cited in the document will appear in its bibliography. So you can have a bibliography file with all of your favorite papers and cite only those that you need for a given document.

If you want to list all references included in your bibliography file, you can use the option nocite: '@*' in the header.

INFO

In R, the citation() function gives you the citations to use when referencing a package in your publications. You can also get the references in BibTeX format with the toBibtex() function. For instance, with the rmarkdown package:

citation('rmarkdown')
## 
## To cite the 'rmarkdown' package in publications, please use:
## 
##   JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi
##   and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and
##   Winston Chang and Richard Iannone (2022). rmarkdown: Dynamic
##   Documents for R. R package version 2.12. URL
##   https://rmarkdown.rstudio.com.
## 
##   Yihui Xie and J.J. Allaire and Garrett Grolemund (2018). R Markdown:
##   The Definitive Guide. Chapman and Hall/CRC. ISBN 9781138359338. URL
##   https://bookdown.org/yihui/rmarkdown.
## 
##   Yihui Xie and Christophe Dervieux and Emily Riederer (2020). R
##   Markdown Cookbook. Chapman and Hall/CRC. ISBN 9780367563837. URL
##   https://bookdown.org/yihui/rmarkdown-cookbook.
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
toBibtex(citation('rmarkdown'))
## @Manual{,
##   title = {rmarkdown: Dynamic Documents for R},
##   author = {JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone},
##   year = {2022},
##   note = {R package version 2.12},
##   url = {https://github.com/rstudio/rmarkdown},
## }
## 
## @Book{,
##   title = {R Markdown: The Definitive Guide},
##   author = {Yihui Xie and J.J. Allaire and Garrett Grolemund},
##   publisher = {Chapman and Hall/CRC},
##   address = {Boca Raton, Florida},
##   year = {2018},
##   note = {ISBN 9781138359338},
##   url = {https://bookdown.org/yihui/rmarkdown},
## }
## 
## @Book{,
##   title = {R Markdown Cookbook},
##   author = {Yihui Xie and Christophe Dervieux and Emily Riederer},
##   publisher = {Chapman and Hall/CRC},
##   address = {Boca Raton, Florida},
##   year = {2020},
##   note = {ISBN 9780367563837},
##   url = {https://bookdown.org/yihui/rmarkdown-cookbook},
## }
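
As the output above shows, toBibtex() leaves the entry key empty (`@Manual{,`), so the entry cannot be cited as is. A possible way to build a usable bibliography file is sketched below, using only base R. It cites R itself through citation() with no argument; the key rcore and the file name packages.bib are arbitrary choices made for this illustration.

```r
# Get the BibTeX lines for the citation of R itself as a plain character vector.
bib <- unclass(toBibtex(citation()))

# The first line reads "@Manual{," (empty key): insert a key so that the
# entry can be cited with @rcore. "rcore" is an arbitrary, hypothetical key.
bib[1] <- sub("^(@[A-Za-z]+)\\{[^,]*,", "\\1{rcore,", bib[1])

# Write the entry to a bibliography file (here in the temporary directory).
bib_file <- file.path(tempdir(), "packages.bib")
writeLines(bib, bib_file)

# Show the first line of the file that was written.
print(readLines(bib_file)[1])
```

The resulting file can then be given to the bibliography key of the header, and the entry cited with @rcore.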

It is also possible to give several bibliography files to the bibliography key, e.g. to separate citations by domain (bioinformatics, biology, mathematics, my own papers, etc.). To do that, put the list of your files in square brackets [ ]:

---
bibliography: [jung.bib, biology.bib, bioinfo.bib]
---

4 Markdown basics

The text in an Rmarkdown document is written with the Markdown syntax. There are many flavors of Markdown, invented by different people, but rmarkdown uses Pandoc’s Markdown. For a full description of the available syntax, have a look at https://pandoc.org/MANUAL.html#pandocs-markdown.

We present here a summary of the frequently used commands.

4.1 Headings

Headings are preceded by the # character; the number of # characters indicates the level of the heading, from 1 to 6.

# H1

## H2

### H3

#### H4

##### H5

###### H6

INFO

If you have activated the number_sections option in the header, all sections are automatically prefixed by section numbers (as in this document). To prevent that for a given heading, add {-} at the end of the heading:

# H1 {-}

INFO

Alternatively, for H1 and H2 levels, you can use the underline style:

Alt-H1
======

Alt-H2
------

4.2 Line breaks & paragraphs

Markdown uses two line breaks to indicate a new paragraph. A line that ends with two spaces indicates a new line inside a paragraph.

This is a paragraph.

This is another one.  
And a new line inside a paragraph.

4.3 Horizontal rules

Horizontal rules are inserted when a paragraph contains only three (or more) underscores _, asterisks * or hyphens -.

This paragraph is displayed before the first horizontal rule.

___

This one between two.

***
  
This one too.

---

And this one is the last one.

4.4 Emphasis

Text between single asterisks * is displayed in italic, between double asterisks ** in bold, and between triple asterisks *** in bold and italic. Text between two tildes ~~ is struck through, text between single tildes ~ is a subscript, and text between single circumflexes ^ is a superscript.

This paragraph contains text in *italic*, **bold** and ***bold and italic***.
Moreover, two ~ indicate ~~strike-through text~~.  
You can also write the molecule H~2~O or the power 10^2^.

4.6 Lists

Unordered lists are introduced with -, ordered lists with a number followed by a dot, such as 1. (a closing parenthesis, as in 1), also works in Pandoc’s Markdown), and check lists with - [ ].

Ordered list:
  
1. First item
2. Second item
  - First sub-item
3. Third item
  1.1. First sub-item

Unordered list:
  
- First item
- Second item
- Third item:
  1. Sub-first item
  2. Sub-second item
  
Checked list:
  
- [ ] First item
- [x] Second item
- [ ] Third item

4.7 Code and blockquote

In-line code is wrapped between two backquotes `, and a code block between two lines made up of only three backquotes ```. Each line of a blockquote must be prefixed by the > character.

I insert commands inside a paragraph `if then else`.

```
if (r>0) {
  print('r is positive')
} else {
  print('r is negative or zero')
}
```

> This is a blockquote.
> On two lines.

4.8 Tables

When you create a table, columns must be separated by a pipe character | and the header must be separated from the body by a line of - characters. You can use the : character in that line to indicate whether the content must be left-, center- or right-aligned. Spaces around separators are optional and only there for human readability. For instance:

Column | Column
------ | ------
Cell   | Cell 


Letter|Digit|Character
---|---|---
a|4|$
 |365|(
b| |^  


Column | Column | Column
:----- | :----: | -----:
Left   | Center | Right
align  | align  | align

5 R chunks

The syntax for writing R code that will be executed is similar to the syntax for code blocks, except that {r} is added after the three opening backquotes. For instance:

```{r}
a <- 1:10
cat(a)
```

The length of the variable a is `r length(a)`.

Such a block is called a chunk. You can also execute a small piece of code inside text using the `r ` syntax. Note that the latter is mostly used to insert the content of a variable inside a paragraph, not to perform a task.

Keep in mind that all variables declared inside a chunk are visible in all subsequent chunks. So you can split your code into as many chunks as you need, with chunks possibly separated by Markdown text.
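
This behaviour can be tried out without creating a file. The sketch below assumes the knitr package is installed (it is skipped otherwise) and passes a small two-chunk document to knitr::knit() through its text argument; the document content and the variable names rmd_lines and knitted are made up for this illustration.

```r
# A tiny document: two chunks separated by Markdown text, then inline code.
rmd_lines <- c(
  "```{r}",
  "a <- 1:10",
  "```",
  "",
  "Some Markdown text between two chunks.",
  "",
  "```{r}",
  "b <- sum(a)",
  "```",
  "",
  "The sum of a is `r b`."
)

knitted <- NULL
# Assumption: knitr is installed; otherwise the document is not knitted.
if (requireNamespace("knitr", quietly = TRUE)) {
  # Knit in a fresh environment so the current session is left untouched.
  knitted <- knitr::knit(text = rmd_lines, quiet = TRUE, envir = new.env())
  cat(knitted)
}
```

The second chunk can use a because both chunks are evaluated in the same R session, and the inline code can use b for the same reason.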

INFO

Rmarkdown also supports many other programming languages. For instance, you can declare a Python chunk (with the ```{python} syntax); the code inside the chunk must then be written in Python and will be run by a Python interpreter. See https://bookdown.org/yihui/rmarkdown/language-engines.html for more information.

Warning. Objects are not shared between different programming languages. Variables declared in an R chunk are only available in subsequent R chunks; the same holds for Python and Julia, which, like R, share a single session across all their chunks. All other languages do not share a session between chunks, so objects declared in a chunk are only available within that chunk.

5.1 Chunk options

For each chunk, you can set options that control how it is run and displayed. A complete list of options can be found at https://yihui.org/knitr/options/; we present below only the most frequently used:

Option | Default | Description
--- | --- | ---
eval | TRUE | If FALSE, the code in the chunk will not be run.
echo | TRUE | If FALSE, do not display the code of the chunk in the final document.
results | 'markup' | If 'hide', do not display the code’s results in the final document. If 'hold', display all output pieces at the end of the chunk. If 'asis', pass results through without reformatting them (useful if results are raw HTML, etc.).
error | TRUE | If TRUE, error messages are shown in the document and compilation continues. If FALSE, compilation stops at the first error (rmarkdown sets this option to FALSE by default).
message | TRUE | If FALSE, do not display any messages generated by the code.
warning | TRUE | If FALSE, do not display any warning messages generated by the code.
cache | FALSE | If TRUE, the results are cached for future reuse, until the code of the chunk is altered. Warning: if you change earlier chunks that have an impact on a cached chunk, the cached chunk will not be recomputed.

Moreover, a chunk can have a name, used to reference it in the document or to call it again later. Options must be separated by commas , and their values must be written in R (i.e. it is also possible to provide R variables defined in a previous chunk or in the header section). For instance:

---
params:
  echo: no
---

```{r name1, echo=params$echo, results='asis'}
a <- 1:10
cat(a)
```

It is possible to set default options for all subsequent chunks as follows:

```{r}
knitr::opts_chunk$set(echo=FALSE)
```

WARNING

Chunk names must be unique. An error is raised when you compile a document in which at least two chunks have the same name… Be careful when you copy/paste.
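
Duplicate names can be spotted before compiling. The following is a rough sketch in base R, not a feature of knitr: find_duplicate_labels is a hypothetical helper that only looks at R chunk headers and only recognizes a name written as the first item after r (names given through a label option are ignored).

```r
# Hypothetical helper: return chunk names that appear more than once.
find_duplicate_labels <- function(lines) {
  # Keep only R chunk headers, e.g. ```{r name1, echo=FALSE}
  headers <- grep("^```\\{r[ ,}]", lines, value = TRUE)
  # Extract what is written between "{r" and "}"
  inner <- sub("^```\\{r[ ,]*([^}]*)\\}.*$", "\\1", headers)
  # The first comma-separated item is a name unless it is an option (has "=")
  first <- trimws(sub(",.*$", "", inner))
  labels <- first[nzchar(first) & !grepl("=", first)]
  unique(labels[duplicated(labels)])
}

# Example on a made-up document where the name "setup" is used twice.
doc <- c(
  "```{r setup}", "x <- 1", "```",
  "```{r plot, echo=FALSE}", "plot(x)", "```",
  "```{r setup, eval=FALSE}", "x <- 2", "```"
)
print(find_duplicate_labels(doc))
```

For a real file, the lines could be obtained with readLines() on the path of the document.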

5.2 Insert figures

Previously we have seen how to insert a picture with Markdown. But what about graphics generated in the document with R?

Each graphic generated inside a chunk is automatically included in the final document (subject to the appropriate chunk options), at the position of the chunk.

```{r}
library(ggplot2)
ggplot(data.frame(x=1:10, y=1:10), aes(x=x, y=y)) + geom_line()
```

To manipulate figures, some supplementary chunk options are available:

Option | Default | Description
--- | --- | ---
fig.align | 'default' | How to align graphics in the final document. One of 'left', 'right', or 'center'.
fig.cap | NULL | A character string to be used as a figure caption.
fig.height, fig.width | 7 | The height and width to use in R for plots created by the chunk (in inches).
out.height, out.width | NULL | The height and width to scale plots to in the final output.

5.3 Insert tables

As for figures, it is also possible to automatically generate tables from data.frame or matrix R objects. For this, you can use the kable function of the knitr package (which comes with the rmarkdown package).

INFO

There are a variety of other R packages that help you generate tables from your data. Another frequently used package is xtable; see https://bookdown.org/yihui/rmarkdown-cookbook/table-other.html for a more exhaustive list.


You can extend what kable offers with the kableExtra package.

For instance:

```{r}
knitr::kable(head(mtcars[, 1:4]))
```
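
The kable call above accepts further arguments. As a small sketch, assuming the knitr package is installed (it is skipped otherwise), the code below rounds the values, adds a caption and asks for a pipe table, so that the generated Markdown can be inspected in the console. The caption text and the variable name tab are chosen for this illustration.

```r
tab <- NULL
# Assumption: knitr is installed; otherwise no table is generated.
if (requireNamespace("knitr", quietly = TRUE)) {
  tab <- knitr::kable(
    head(mtcars[, 1:4], 3),       # only the first rows and columns
    format = "pipe",              # Markdown pipe table
    digits = 1,                   # round numeric columns to one decimal
    caption = "First rows of the mtcars dataset"
  )
  print(tab)
}
```

Inside a chunk, the format argument can usually be left out, since kable picks a format suited to the output document.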

WARNING

Don’t try to print your whole dataset in the document…

6 How to compile a document

To compile an Rmarkdown document from an R session, use the render function of the rmarkdown package:

rmarkdown::render('my_document.Rmd')

or, if you are in RStudio, you can use the Knit button just above the file.
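
The render function also lets you choose the output format and override the values of the params key at compilation time, through its output_format and params arguments. The sketch below assumes that the rmarkdown package and pandoc are available (rendering is skipped otherwise); the document content, the temporary file and the value Germany are made up for this illustration.

```r
# Write a minimal document with one parameter to a temporary file.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: Demo report",
  "output: html_document",
  "params:",
  "  country: France",
  "---",
  "",
  "Report for `r params$country`."
), rmd_file)

out_file <- NULL
# Assumption: rmarkdown and pandoc are installed; otherwise nothing is rendered.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(
    rmd_file,
    output_format = "html_document",     # one of the formats of the output key
    params = list(country = "Germany"),  # overrides the value set in the header
    quiet = TRUE
  )
  print(out_file)  # path of the generated file
}
```

Calling render several times with different params values is one way to produce one report per country from the same document.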

INFO

During compilation, rmarkdown uses the knitr package (which is why chunk options come from that package) to generate a Markdown file, then uses the pandoc software to convert this Markdown document into PDF, HTML, etc. (Figure 6.1).

Figure 6.1: Pipeline carried out by rmarkdown when a document is compiled. From https://rmarkdown.rstudio.com/lesson-2.html.

7 Best practices

  • End each report with the sessionInfo() command, which gives you the versions of all R packages used to generate it.
  • Set all parameters that can be adjusted in the header section.
  • Use the cache option only during the development of your document; deactivate it for your final compilation.
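
For the first point, a minimal sketch of what can go in the last chunk of a report is given below. It uses base R only; the variable name info is chosen for this illustration.

```r
# Collect information about the current R session.
info <- sessionInfo()

# Print the full summary: R version, platform, attached and loaded packages.
print(info)

# Individual elements can also be extracted, e.g. the R version string.
print(info$R.version$version.string)
```

Printing the whole object is usually enough; extracting single elements is handy if you want to quote the R version in the text with inline code.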

8 To go further

You can visit many websites or read books for inspiration. To learn more about rmarkdown and associated packages, you can have a look at:

There is also a cheat sheet, which is very useful because it summarizes all functions:

9 Exercise

EXERCISE

Write and generate a document compiling the two previous exercise sessions.

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2022. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.