Reproducible Publications w/ Python and Quarto


Tom Mock

 thomasmock.quarto.pub/python

2022-12-03





RStudio PBC is now Posit PBC

  • Many assumed that RStudio only built tools for R, when we have pursued a multilingual vision for open-source data science for several years.
  • Posit stands for asking and answering the complex, valuable, sometimes vague questions that drive deeper insights.
  • Renaming helps communicate our public benefit corp mission to be the open source data science company for the next 100 years.

What is Quarto?

https://quarto.org

Quarto is an open-source scientific and technical publishing system that builds on standard markdown with features essential for scientific communication.

  • Computations: Python, R, Julia, Observable JS
  • Markdown: Pandoc flavored markdown with many enhancements
  • Output: Documents, presentations, websites, books, blogs

Literate programming system in the tradition of Org-Mode, Weave.jl, R Markdown, iPyPublish, Jupyter Book, etc.

Origins

  • Open Source project sponsored by Posit, PBC (formerly known as RStudio, PBC)
  • 10 years of experience with R Markdown, a similar system that was R-specific, convinced us that the core ideas were sound
  • The number of languages and runtimes used for scientific discourse is broad
  • Quarto is a ground-up re-imagining of R Markdown that is fundamentally multi-language and multi-engine
  • Quarto gets inspiration from both R Markdown and Jupyter, and provides a plain-text option or the use of native Jupyter notebooks

Goal: Computational documents

  • Documents that include source code for their production
  • Notebook AND plain-text flavors
  • Programmatic automation and reproducibility

Goal: Scientific Markdown

Goal: Single Source Publishing

Simple Example

---
title: "matplotlib demo"
format:
  html:
    code-fold: true
jupyter: python3
---

For a demonstration of a line plot on a polar 
axis, see @fig-polar.
```{python}
#| label: fig-polar
#| fig-cap: "A line plot on a polar axis"

import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(
  subplot_kw = {'projection': 'polar'} 
)
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()
```
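
The `#|` lines at the top of the cell above are cell options written as comments. As a rough illustration of how such options could be read, here is a minimal Python sketch. It assumes simple `key: value` pairs only, and `parse_cell_options` is a hypothetical helper, not part of Quarto.

```python
def parse_cell_options(cell_source: str) -> dict:
    """Collect leading `#| key: value` lines from a code cell (hypothetical helper)."""
    options = {}
    for line in cell_source.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#|"):
            break  # options are only read from the top of the cell
        key, _, value = stripped[2:].partition(":")
        options[key.strip()] = value.strip().strip('"')
    return options


cell = '''#| label: fig-polar
#| fig-cap: "A line plot on a polar axis"

import numpy as np
'''

print(parse_cell_options(cell))
# {'label': 'fig-polar', 'fig-cap': 'A line plot on a polar axis'}
```

The `label` value is what the `@fig-polar` cross reference in the text points at.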

Simple Example, multi-format


Can be rendered to dozens of output formats with Quarto (via Pandoc):

$ quarto render hello.qmd --to html
$ quarto render hello.qmd --to pdf
$ quarto render hello.qmd --to docx
$ quarto render hello.qmd --to epub
$ quarto render hello.qmd --to pptx
$ quarto render hello.qmd --to revealjs
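
The same commands can be scripted. The sketch below is one possible way to loop over formats from Python; the helper names `render_commands` and `render_all` are assumptions for illustration, and the commands are only printed unless `dry_run=False` and the `quarto` CLI is found on the PATH.

```python
import shutil
import subprocess

FORMATS = ["html", "pdf", "docx", "epub", "pptx", "revealjs"]


def render_commands(source: str, formats: list) -> list:
    """Build one `quarto render` argument list per output format."""
    return [["quarto", "render", source, "--to", fmt] for fmt in formats]


def render_all(source: str, formats: list, dry_run: bool = True) -> None:
    """Print each command, or run it when dry_run is False and quarto is installed."""
    for cmd in render_commands(source, formats):
        if dry_run or shutil.which("quarto") is None:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


commands = render_commands("hello.qmd", FORMATS)
render_all("hello.qmd", FORMATS)  # dry run: prints the six commands
```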

| Feature | R Markdown | Quarto |
|---|---|---|
| Cross References | | |
| Websites & Blogs | | |
| Books | | |
| Interactivity | Shiny Documents | Quarto Interactive Documents |
| Paged HTML | pagedown | Coming soon! |
| Journal Articles | rticles | Out and more coming! |
| Dashboards | flexdashboard | Coming soon! |

So what is Quarto?

Quarto® is an open-source scientific and technical publishing system built on Pandoc.

  • quarto is a language-agnostic command line interface (CLI)
thomasmock$ quarto --help
Usage:   quarto
Version: 1.2.269

Commands:
  render  [input] [args...] - Render input file(s) to various document types.            
  preview [file] [args...]  - Render and preview a document or website project.          
  publish [provider] [path] - Publish a document or project.

Basic Workflow

Rendering (execute and write to disk):

# plain text qmd
$ quarto render python.qmd
$ quarto render python.qmd --to pdf

# ipynb notebook
$ quarto render python.ipynb
$ quarto render python.ipynb --execute

Preview (execute, write to disk, and maintain a LIVE preview of content):

# plain text qmd
$ quarto preview python.qmd
$ quarto preview python.qmd --to pdf

# ipynb notebook
$ quarto preview python.ipynb
$ quarto preview python.ipynb --execute

IPython

To execute R code, Quarto uses knitr as the engine. For Python, Quarto executes code natively through Jupyter kernels such as IPython.

  • The indicated or default Python Jupyter kernel is bound automatically when {python} executable cells are present. You can set a specific kernel via the YAML:
---
title: "My doc"
date: today
jupyter: python3
---
  • IPython executes Python code and transforms it to plain text, graphics, markdown, HTML, etc.

  • For interactive sessions, Quarto keeps the Jupyter Kernel resident as a daemon to mitigate startup times.

Stored/frozen computation and reproducibility

  1. Jupyter natively handles this by storing the source code, the output, and the cached results of the computation in a single document (.ipynb, which is JSON).
  2. Jupyter Cache provides transient caching of cell outputs for a document (if any cell in the document changes, all of the cells are re-executed).
  3. Quarto’s Freeze feature uses a multi-file approach:
     • Source code input (plain text .qmd and/or .ipynb)
     • Complete output file (a format such as .html or .pdf)
     • Frozen computation stored separately by directory and file as .json, which allows computational outputs to be saved permanently and reused across the entire project.
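
The "any cell changes, everything re-executes" rule can be pictured as a single fingerprint over all code cells. The sketch below illustrates that idea only; it is not how Jupyter Cache or Quarto actually implement caching, and `fingerprint` and `needs_reexecution` are hypothetical names.

```python
import hashlib
from typing import Optional


def fingerprint(cells: list) -> str:
    """Hash all code cells together, so editing any one cell changes the digest."""
    digest = hashlib.sha256()
    for source in cells:
        digest.update(source.encode("utf-8"))
        digest.update(b"\x00")  # separator so cell boundaries affect the hash
    return digest.hexdigest()


def needs_reexecution(cells: list, stored_fingerprint: Optional[str]) -> bool:
    """True when there is no stored fingerprint or the cells no longer match it."""
    return stored_fingerprint != fingerprint(cells)


cells = ["import math", "x = math.sqrt(2)"]
stored = fingerprint(cells)  # saved alongside the cached outputs

edited = ["import math", "x = math.sqrt(3)"]
print(needs_reexecution(cells, stored))   # False: outputs can be reused
print(needs_reexecution(edited, stored))  # True: one cell changed, rerun all
```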

A .qmd is a plain text file

Metadata (YAML)

format: html
jupyter: python3

format: html
engine: knitr

Code

```{python}
from siuba import _, group_by, summarize
from siuba.data import mtcars
(mtcars
  >> group_by(_.cyl)
  >> summarize(avg_mpg = _.mpg.mean()))
```
```{r}
library(dplyr)
mtcars |> 
  group_by(cyl) |> 
  summarize(mean = mean(mpg))
```

Text

# Heading 1
This is a sentence with some **bold text**, some *italic text* and an 
![image](image.png){fig-alt="Alt text for this image"}.
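
Because a .qmd is plain text, its three parts can be pulled apart with ordinary string tools. The sketch below is a deliberately simplified illustration using regular expressions (Quarto itself relies on Pandoc for parsing); `split_qmd` is a hypothetical helper and the sample document is made up.

```python
import re

FENCE = "`" * 3  # built indirectly so the example can sit inside a fenced block

qmd_text = "\n".join([
    "---",
    'title: "My doc"',
    "format: html",
    "jupyter: python3",
    "---",
    "",
    "# Heading 1",
    "",
    "Some **bold text**.",
    "",
    FENCE + "{python}",
    'print("hello")',
    FENCE,
    "",
])


def split_qmd(text: str) -> dict:
    """Split a .qmd string into YAML metadata, code cells, and markdown text."""
    metadata, body = "", text
    front = re.match(r"^---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if front:
        metadata = front.group(1)
        body = text[front.end():]

    # An executable cell: a fence followed by {language}, up to the closing fence
    cell_pattern = re.compile(
        r"^" + FENCE + r"\{(\w+)\}\n(.*?)^" + FENCE + r"[ \t]*$",
        flags=re.DOTALL | re.MULTILINE,
    )
    cells = [(m.group(1), m.group(2)) for m in cell_pattern.finditer(body)]
    markdown = cell_pattern.sub("", body).strip()
    return {"metadata": metadata, "cells": cells, "markdown": markdown}


parts = split_qmd(qmd_text)
print(parts["cells"])  # [('python', 'print("hello")\n')]
```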

But Quarto doesn’t have to be plain-text

Rendering pipeline

Plain text workflow (.qmd uses Jupyter kernel to execute cells):

Notebook workflow (defaults to using existing stored computation):

What to do with my existing .ipynb?

You can keep using them! You get to choose whether to use the stored computation OR re-execute the document from top to bottom.


# --execute flag is optional - forces re-execution
quarto render my-favorite.ipynb --to html --execute
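
Stored computation is visible in the notebook file itself: an .ipynb is JSON, and each code cell carries an `outputs` list. The sketch below counts which code cells already hold outputs; the notebook content is made up for illustration and `stored_output_summary` is a hypothetical helper.

```python
import json

# A tiny, made-up notebook in nbformat 4 layout
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["1 + 1"],
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "data": {"text/plain": ["2"]}, "metadata": {}}]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "source": ["print('not run yet')"], "outputs": []},
    ],
})


def stored_output_summary(raw: str) -> dict:
    """Count code cells and how many of them already carry stored outputs."""
    nb = json.loads(raw)
    code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
    with_outputs = [c for c in code_cells if c.get("outputs")]
    return {"code_cells": len(code_cells), "with_outputs": len(with_outputs)}


print(stored_output_summary(notebook_json))
# {'code_cells': 2, 'with_outputs': 1}
```

An empty `outputs` list does not always mean a cell was never run, since some code produces no output, so treat the count as a hint rather than proof.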


Quarto can help convert back and forth between plain text .qmd and .ipynb - kind of like jupytext but specific to Quarto:

quarto convert --help

Usage:   quarto convert <input>
Description:
    Convert documents to alternate representations.

Convert notebook to markdown:                quarto convert doc.ipynb                
Convert markdown to notebook:                quarto convert doc.qmd                  
Convert notebook to markdown, write to file: quarto convert doc.ipynb --output doc.qmd

nbdev + Quarto = super powers

A tweet by Jeremy Howard, FYI nbdev will be moving to Quarto and Fastdoc probably too

A tweet by Hamel Husain, 'I'm going to be announcing an epic new version of nbdev in this talk! The next version of nbdev is going to be built on top of Quarto'

fast.ai - nbdev+Quarto: A new secret weapon for productivity

Comfort of your own workspace

A screenshot of a Quarto document rendered inside JupyterLab

A screenshot of a Quarto document rendered inside VSCode

A screenshot of a Quarto document rendered inside RStudio

Auto-completion in RStudio + VSCode


Both RStudio and VSCode with the Quarto extension have rich auto-completion

YAML

A gif of auto-completion and search for YAML options inside RStudio

Chunk option

A gif of auto-completion of an R chunk inside RStudio

Quarto, unified document layout

quarto render boston-terrier.qmd --to html
quarto render boston-terrier.qmd --to pdf

A screenshot of an HTML article about Boston Terriers, the document has an image in the right-hand margin, a floating table of contents, and different sections split up by headers

HTML

A screenshot of a PDF article about Boston Terriers, the document has an image in the right-hand margin, a floating table of contents, and different sections split up by headers

PDF

Quarto, unified syntax across markdown and code

Add two images on disk to a two column layout.

::: {layout-ncol=2}
![Surus](surus.png)

![Hanno](hanno.png)
:::

Two plots from code, layout in two columns.

```{python}
#| layout-ncol: 2
#| fig-cap: ["Scatter", "Boxplot"]

from plotnine import ggplot, geom_point, geom_boxplot, aes, stat_smooth, facet_wrap, theme
from plotnine.data import mtcars

# plot 1 in column 1
plot1 = (ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
   + geom_point() + stat_smooth(method='lm')
   + facet_wrap('~gear')).draw(show=True)

# plot 2 in column 2
plot2 = (ggplot(mtcars, aes('cyl', 'mpg', color='factor(cyl)'))
   + geom_boxplot()).draw(show=True)
```

Scatter

Boxplot

Built-in vs custom

One goal of Quarto is to provide a markdown-centric format-agnostic syntax as shown in previous slides.

  • Quarto bundles Bootstrap CSS and themes, and respects SASS variables for robust styling of HTML content (HTML documents, websites, books, slides, etc).
  • Quarto includes LaTeX templates for specific journals as well as good defaults for PDF outputs in general.
  • Quarto respects docx and pptx templates, again allowing for robust styling.
  • You shouldn’t HAVE to escape out to writing raw LaTeX, HTML, Jinja templates, etc.
  • In the vast majority of situations, you can rely purely on Markdown syntax
  • BUT you can always include raw content such as LaTeX, CSS, HTML, JavaScript to further customize and optimize for a specific format.

Extending Quarto with extensions

Shortcodes

  • Replace inline “short codes” with output.
{{< fa thumbs-up >}} 
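
To make "replace inline short codes with output" concrete, here is a small Python sketch of the substitution idea. It is an illustration only and not how Quarto implements shortcodes; the `expand_shortcodes` helper and the `[icon:...]` placeholder output are assumptions.

```python
import re

# Matches {{< name args >}} and captures the name and the remaining arguments
SHORTCODE = re.compile(r"\{\{<\s*(\w+)\s*(.*?)\s*>\}\}")


def expand_shortcodes(text: str, handlers: dict) -> str:
    """Replace each known shortcode with the string its handler returns."""
    def replace(match):
        name, args = match.group(1), match.group(2)
        handler = handlers.get(name)
        # Unknown shortcodes are left untouched rather than guessed at
        return handler(args) if handler else match.group(0)
    return SHORTCODE.sub(replace, text)


# Hypothetical handler: a real icon shortcode would emit format-specific markup
handlers = {"fa": lambda args: f"[icon:{args}]"}

print(expand_shortcodes("Nice {{< fa thumbs-up >}} work", handlers))
# Nice [icon:thumbs-up] work
```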


Filters

  • Affect rendering of specific items

A screenshot of a code chunk

Formats

  • Add entirely custom new formats
---
title: "Cool Company 2022 Presentation"
format: coolco-revealjs
---

Interactivity, Jupyter Widgets

import plotly.express as px
import plotly.io as pio
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", 
                 color="species", 
                 marginal_y="violin", marginal_x="box", 
                 trendline="ols", template="simple_white")
fig.show()

Interactivity, Observable

Quarto also includes native support for Observable JS, a set of enhancements to vanilla JavaScript created by Mike Bostock (also the author of D3).

Interactivity, on the fly Observable “widgets”

Because Quarto includes Observable, you can create new “widgets” or let the reader modify portions of the document on the fly.



```{ojs}
viewof temp = Inputs.range([0, 100], {step: 1, value: 34, label: htl.html`Temp &#x2103;`})
```

Converting temperature from &#x2103; to &#x2109; <br>  
Celsius = ${d3.format(".0f")(temp)}&#x2103; and Fahrenheit = ${d3.format(".1f")(temp * 9/5 + 32)}&#x2109;.
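
For readers more at home in Python, the arithmetic inside the interpolated expressions above is the standard Celsius to Fahrenheit conversion. The sketch below reproduces it with the slider's default value of 34; it is a static equivalent for illustration, not an interactive widget, and `celsius_to_fahrenheit` is a hypothetical helper name.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Same formula as the Observable expression: temp * 9/5 + 32."""
    return celsius * 9 / 5 + 32


temp = 34  # default value of the slider in the ojs cell
summary = (
    f"Celsius = {temp:.0f}℃ and "
    f"Fahrenheit = {celsius_to_fahrenheit(temp):.1f}℉."
)
print(summary)
# Celsius = 34℃ and Fahrenheit = 93.2℉.
```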

Quarto Publish

quarto publish --help

  Usage:   quarto publish [provider] [path]
  Version: 1.2.269                          
                                           
  Description:
    Publish a document or project. Available providers include:
                                                               
     - Quarto Pub (quarto-pub)                                 
     - GitHub Pages (gh-pages)                                 
     - Posit Connect (connect)                               
     - Netlify (netlify)                                       

Screenshot of the quartopub.com website

Quarto, crafted with love and care

Development of Quarto is sponsored by Posit, PBC (formerly known as RStudio, PBC). The same core team works on both Quarto and R Markdown:

Here is the full contributors list. Quarto is open source, and we welcome contributions in our GitHub repository as well: https://github.com/quarto-dev/quarto-cli.

Quarto

  • Batteries included, shared syntax across output types and languages
  • Single source publishing across document types, with raw customization allowed
  • Choose your own editor for plain text .qmd or Jupyter notebooks
  • Quarto projects + freeze for managing stored computation

Follow @quarto_pub #QuartoPub or me @thomas_mock on Twitter/Fosstodon.org to stay up to date!

Web resources

Quarto resources

General Quarto

Why the name “Quarto”?