Reproducible Publications w/ Python and Quarto

thomasmock.quarto.pub/python-umbrella

Tom Mock

2023-07-11





RStudio PBC is now Posit PBC

  • Many assumed that RStudio only built tools for R, when we have pursued a multilingual vision for open-source data science for several years.
  • Posit stands for asking and answering the complex, valuable, sometimes vague questions that drive deeper insights.
  • Renaming helps communicate our public benefit corp mission to be the open source data science company for the next 100 years.

What is Quarto?

https://quarto.org

Quarto is an open-source scientific and technical publishing system that builds on standard markdown with features essential for scientific communication.

  • Computations: Python, R, Julia, Observable JS
  • Markdown: Pandoc flavored markdown with many enhancements
  • Output: Documents, presentations, websites, books, blogs

Literate programming system in the tradition of Org-Mode, Weave.jl, R Markdown, iPyPublish, Jupyter Book, etc.

Origins

  • Open Source project sponsored by Posit, PBC (formerly known as RStudio, PBC)
  • 10 years of experience with R Markdown, a similar system that was R-specific, convinced us that the core ideas were sound
  • The number of languages and runtimes used for scientific discourse is broad
  • Quarto is a ground-up re-imagining of R Markdown that is fundamentally multi-language and multi-engine
  • Quarto gets inspiration from both R Markdown and Jupyter, and provides a plain-text option or the use of native Jupyter notebooks

Goal: Computational documents

  • Documents that include source code for their production
  • Notebook AND plain-text flavors
  • Programmatic automation and reproducibility

Goal: Scientific Markdown

Goal: Single Source Publishing

Simple Example

---
title: "matplotlib demo"
format:
  html:
    code-fold: true
jupyter: python3
---

For a demonstration of a line plot on a polar 
axis, see @fig-polar.
```{python}
#| label: fig-polar
#| fig-cap: "A line plot on a polar axis"

import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(
  subplot_kw = {'projection': 'polar'} 
)
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()
```

Simple Example, multi-format


Can be rendered to dozens of output formats with Quarto (via Pandoc):

$ quarto render hello.qmd --to html
$ quarto render hello.qmd --to pdf
$ quarto render hello.qmd --to docx
$ quarto render hello.qmd --to epub
$ quarto render hello.qmd --to pptx
$ quarto render hello.qmd --to revealjs
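
The same set of renders can be scripted. This is a minimal sketch, assuming `quarto` is on the PATH and `hello.qmd` exists; `render_commands` and `render_all` are hypothetical helper names, not part of Quarto. The first function only builds argument lists, so nothing runs until `render_all` is called.

```python
import subprocess

# Output formats taken from the commands listed above.
FORMATS = ["html", "pdf", "docx", "epub", "pptx", "revealjs"]


def render_commands(path, formats=FORMATS):
    """Build one `quarto render` argument list per output format."""
    return [["quarto", "render", path, "--to", fmt] for fmt in formats]


def render_all(path, formats=FORMATS):
    """Run each command in turn; raises CalledProcessError if a render fails."""
    for cmd in render_commands(path, formats):
        subprocess.run(cmd, check=True)


# Print the commands instead of running them, so the sketch is safe to try.
for cmd in render_commands("hello.qmd"):
    print(" ".join(cmd))
```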

Feature          | R Markdown      | Quarto
Cross References |                 |
Websites & Blogs |                 |
Books            |                 |
Interactivity    | Shiny Documents | Quarto Interactive Documents
Paged HTML       | pagedown        | Coming soon!
Journal Articles | rticles         | Out and more coming!
Dashboards       | flexdashboard   | Coming soon!

So what is Quarto?

Quarto® is an open-source scientific and technical publishing system built on Pandoc.

  • quarto is a language-agnostic command line interface (CLI)
thomasmock$ quarto --help
Usage:   quarto
Version 1.4.193

Commands:
  render  [input] [args...] - Render input file(s) to various document types.            
  preview [file] [args...]  - Render and preview a document or website project.          
  publish [provider] [path] - Publish a document or project.

Basic Workflow

Rendering (execute and write to disk):

# plain text qmd
$ quarto render python.qmd
$ quarto render python.qmd --to pdf

# ipynb notebook
$ quarto render python.ipynb
$ quarto render python.ipynb --execute

Preview (execute, write to disk, and maintain a LIVE preview of content):

# plain text qmd
$ quarto preview python.qmd
$ quarto preview python.qmd --to pdf

# ipynb notebook
$ quarto preview python.ipynb
$ quarto preview python.ipynb --execute

IPython

For execution of R, Quarto uses knitr as the engine, but for Python Quarto natively executes Python with Jupyter kernels such as IPython.

  • The indicated or default Python Jupyter kernel is bound automatically when {python} executable cells are present. You can set a specific kernel via the YAML:
---
title: "My doc"
date: today
jupyter: python3
---
  • IPython executes Python code and transforms it to plain text, graphics, markdown, HTML, etc.

  • For interactive sessions, Quarto keeps the Jupyter Kernel resident as a daemon to mitigate startup times.

Stored/frozen computation and reproducibility

  1. Jupyter natively approaches this by storing the source code, the output, and the cached computation in a single document (.ipynb, which is JSON).
  2. Jupyter Cache provides transient caching of cell outputs for a doc (if any cells in the doc change, then all of the cells will be re-executed).
  3. Quarto’s Freeze feature uses a multi-file approach:
  • Source code input (plain text .qmd and/or .ipynb)
  • Complete output file (some format like .html or .pdf)
  • Frozen computation stored separately by directory and file as .json, which allows permanently saving and re-using computational outputs across an entire project.
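
The "any cell changes, everything re-executes" behavior can be pictured as a single fingerprint over the whole document. The sketch below is an illustration only, not the actual implementation in Jupyter Cache or Quarto; `doc_fingerprint` and `needs_rerun` are hypothetical names.

```python
import hashlib


def doc_fingerprint(cell_sources):
    """Hash all code cells together so any edit changes the fingerprint."""
    digest = hashlib.sha256()
    for source in cell_sources:
        digest.update(source.encode("utf-8"))
        digest.update(b"\x00")  # separator so cell boundaries matter
    return digest.hexdigest()


def needs_rerun(cell_sources, cached_fingerprint):
    """True when the cached outputs no longer match the current source."""
    return doc_fingerprint(cell_sources) != cached_fingerprint


cells = ["import numpy as np", "r = np.arange(0, 2, 0.01)"]
cached = doc_fingerprint(cells)

print(needs_rerun(cells, cached))  # nothing edited yet
cells[1] = "r = np.arange(0, 4, 0.01)"
print(needs_rerun(cells, cached))  # one cell edited, whole doc is stale
```

A document-level fingerprint keeps the bookkeeping simple, at the cost of redoing work in cells that did not change.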

A .qmd is a plain text file

Metadata (YAML)

# Jupyter engine
format: html
jupyter: python3

# knitr engine
format: html
engine: knitr

Code

```{python}
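import polars as pl
# assumes `mtcars` is already loaded as a polars DataFrame
(mtcars
  .groupby("cyl")  # spelled group_by in newer polars releases
  .agg([(pl.col("mpg").mean())]))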
```
```{r}
library(dplyr)
mtcars |> 
  group_by(cyl) |> 
  summarize(mean(mpg))
```

Text

# Heading 1
This is a sentence with some **bold text**, some *italic text* and an 
![image](image.png){fig-alt="Alt text for this image"}.
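
Because a .qmd is plain text, ordinary string tools are enough to pull the three parts apart. This is a minimal sketch using only the standard library; `split_qmd` and `chunk_languages` are hypothetical helpers, and the sample document is made up for illustration. It is not how Quarto itself parses files.

```python
import re

FENCE = "`" * 3  # built at runtime to avoid nesting code fences in this example

QMD = "\n".join([
    "---",
    'title: "My doc"',
    "format: html",
    "jupyter: python3",
    "---",
    "",
    "# Heading 1",
    "",
    "This is a sentence with some **bold text**.",
    "",
    FENCE + "{python}",
    "print('hello')",
    FENCE,
    "",
])


def split_qmd(text):
    """Return (yaml_header, body) for a document that starts with a --- block."""
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, flags=re.DOTALL)
    if match is None:
        return "", text  # no front matter: the whole file is body
    return match.group(1), match.group(2)


def chunk_languages(body):
    """List the language of each executable cell, in document order."""
    return re.findall(r"^`{3}\{(\w+)\}", body, flags=re.MULTILINE)


header, body = split_qmd(QMD)
print(header.splitlines())
print(chunk_languages(body))
```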

But Quarto doesn’t have to be plain-text

Rendering pipeline

Plain text workflow (.qmd uses Jupyter kernel to execute cells):

Notebook workflow (defaults to using existing stored computation):

What to do with my existing .ipynb?

You can keep using them! You get to choose whether to use the stored computation OR re-execute the document from top to bottom.


# --execute flag is optional - forces re-execution
quarto render my-favorite.ipynb --to html --execute
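
One way to see what "stored computation" means for a given notebook is to look for saved outputs in its JSON. This is a minimal sketch using only the standard library; `stored_output_counts` is a hypothetical helper, and the in-memory `example` notebook stands in for a real file such as my-favorite.ipynb.

```python
import json


def stored_output_counts(notebook):
    """Count code cells and how many of them carry stored outputs."""
    code_cells = [c for c in notebook.get("cells", []) if c.get("cell_type") == "code"]
    with_outputs = [c for c in code_cells if c.get("outputs")]
    return len(code_cells), len(with_outputs)


# A tiny notebook in the nbformat 4 layout, made up for illustration.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["1 + 1"],
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "data": {"text/plain": ["2"]}, "metadata": {}}]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "source": ["x = 3"], "outputs": []},
    ],
}

# Round-trip through JSON to mirror reading an .ipynb from disk.
notebook = json.loads(json.dumps(example))
total, executed = stored_output_counts(notebook)
print(f"{executed} of {total} code cells have stored outputs")
```

If few or no cells have stored outputs, rendering without `--execute` will have little computation to reuse.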


Quarto can help convert back and forth between plain text .qmd and .ipynb - kind of like jupytext but specific to Quarto:

quarto convert --help

Usage:   quarto convert <input>
Description:
    Convert documents to alternate representations.

Convert notebook to markdown:                quarto convert doc.ipynb                
Convert markdown to notebook:                quarto convert doc.qmd                  
Convert notebook to markdown, write to file: quarto convert doc.ipynb --output doc.qmd

Comfort of your own workspace

A screenshot of a Quarto document rendered inside JupyterLab

A screenshot of a Quarto document rendered inside VSCode

A screenshot of a Quarto document rendered inside RStudio

Auto-completion in RStudio + VSCode


Both RStudio and VSCode with the Quarto extension have rich auto-completion

YAML

A gif of auto-completion and search for YAML options inside RStudio

Chunk option

A gif of auto-completion of an R chunk inside RStudio

Quarto Extensions and Visual/Live Editor

A screenshot of a Quarto document rendered inside JupyterLab

A screenshot of a Quarto document rendered inside VSCode

A screenshot of a Quarto document rendered inside RStudio

Quarto, unified document layout

quarto render boston-terrier.qmd --to html
quarto render boston-terrier.qmd --to pdf

A screenshot of an HTML article about Boston Terriers, the document has an image in the right-hand margin, a floating table of contents, and different sections split up by headers

HTML

A screenshot of a PDF article about Boston Terriers, the document has an image in the right-hand margin, a floating table of contents, and different sections split up by headers

PDF

Quarto, unified syntax across markdown and code

Add two images on disk to a two column layout.

::: {layout-ncol=2}
![Surus](surus.png)

![Hanno](hanno.png)
:::

Two plots from code, layout in two columns.

```{python}
#| layout-ncol: 2
#| fig-cap: ["Scatter", "Boxplot"]

from plotnine import ggplot, geom_point, geom_boxplot, aes, stat_smooth, facet_wrap
from plotnine.data import mtcars

# plot 1 in column 1
plot1 = (ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
         + geom_point() + stat_smooth(method='lm')
         + facet_wrap('~gear')).draw(show=True)

# plot 2 in column 2
plot2 = (ggplot(mtcars, aes('cyl', 'mpg', color='factor(cyl)'))
         + geom_boxplot()).draw(show=True)
```

Scatter

Boxplot

Built-in vs custom

One goal of Quarto is to provide a markdown-centric format-agnostic syntax as shown in previous slides.

  • Quarto bundles Bootstrap CSS and themes, and respects SASS variables for robust styling of HTML content (HTML documents, websites, books, slides, etc).
  • Quarto includes LaTeX templates for specific journals as well as good defaults for PDF outputs in general.
  • Quarto respects docx and pptx templates, again allowing for robust styling.
  • You shouldn’t HAVE to escape out to writing raw LaTeX, HTML, Jinja templates, etc.
  • In the vast majority of situations, you can rely purely on Markdown syntax.
  • BUT you can always include raw content such as LaTeX, CSS, HTML, JavaScript to further customize and optimize for a specific format.

Extending Quarto with extensions

Shortcodes

  • Replace inline “short codes” with output.
{{< fa thumbs-up >}} 


Filters

  • Affect rendering of specific items

A screenshot of a code chunk

Formats

  • Add entirely custom new formats
---
title: "Cool Company 2022 Presentation"
format: coolco-revealjs
---

Interactivity, Jupyter Widgets

import plotly.express as px
import plotly.io as pio
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", 
                 color="species", 
                 marginal_y="violin", marginal_x="box", 
                 trendline="ols", template="simple_white")
fig.show()

Interactivity, Observable

Quarto also includes native support for Observable JS, a set of enhancements to vanilla JavaScript created by Mike Bostock (also the author of D3).

Interactivity, on the fly Observable “widgets”

Because Quarto includes Observable, you can create new “widgets” or let the reader modify portions of the doc on the fly.


```{ojs}
viewof temp = Inputs.range([0, 100], {step: 1, value: 34, label: htl.html`Temp &#x2103;`})
```

Converting temperature from &#x2103; to &#x2109; <br>  
Celsius = ${d3.format(".0f")(temp)}&#x2103; and Fahrenheit = ${d3.format(".1f")(temp * 9/5 + 32)}&#x2109;.
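
For comparison, the same arithmetic as a static Python function. This is a minimal sketch; `celsius_to_fahrenheit` is a hypothetical name, and unlike the Observable version it does not update when a slider moves.

```python
def celsius_to_fahrenheit(celsius):
    """Same arithmetic as the Observable expression: temp * 9/5 + 32."""
    return celsius * 9 / 5 + 32


temp = 34  # the slider's default value in the example above
print(f"{temp:.0f} C = {celsius_to_fahrenheit(temp):.1f} F")
```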

Parameters - one source, many outputs

  • Showing results for a specific geographic location.
  • Running a report that covers a specific time period.
  • Running a single analysis multiple times for different assumptions.

Jupyter Engine

```{python}
#| tags: [parameters]

alpha = 0.1
ratio = 0.1
```

The parameters are available in the top level environment:

```{python}
str(alpha)
```

knitr Engine

---
title: "My Document"
params:
  alpha: 0.1
  ratio: 0.1
---

The parameters are available in the params list:

```{r}
str(params$alpha)
```

Rendering with parameters

To render using different parameters you can pass them on the command line using the -P flag:

quarto render notebook.ipynb -P alpha:0.2 -P ratio:0.3

Alternatively, if you have many parameters you can create a YAML file that defines the parameter values you want to render with, then call quarto render from the command line with the --execute-params flag:

quarto render notebook.ipynb --execute-params params.yml
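
To cover the "one analysis, many assumptions" case, the `-P` arguments can be generated in a loop. This is a minimal sketch; `param_args` and `render_command` are hypothetical helpers, and the parameter grid is made up. The commands are only printed here, not executed.

```python
import itertools


def param_args(params):
    """Turn {'alpha': 0.2} into ['-P', 'alpha:0.2'] for `quarto render`."""
    args = []
    for name, value in params.items():
        args += ["-P", f"{name}:{value}"]
    return args


def render_command(path, params):
    """Argument list for one parameterized render (not executed here)."""
    return ["quarto", "render", path] + param_args(params)


# Hypothetical grid of assumptions to sweep over.
alphas = [0.1, 0.2]
ratios = [0.1, 0.3]

for alpha, ratio in itertools.product(alphas, ratios):
    cmd = render_command("notebook.ipynb", {"alpha": alpha, "ratio": ratio})
    print(" ".join(cmd))
```

Each run would write to the same default output file, so in practice you would likely also give every combination its own output name.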

Quarto Publish

quarto publish --help

  Usage:   quarto publish [provider] [path]
  Version 1.4.193                          
                                           
  Description:
    Publish a document or project. Available providers include:
                                                               
     - Quarto Pub (quarto-pub)                                 
     - GitHub Pages (gh-pages)                                 
     - Posit Connect (connect)                               
     - Netlify (netlify)                                       

Screenshot of the quartopub.com website

Quarto, crafted with love and care

Development of Quarto is sponsored by Posit, PBC (formerly known as RStudio, PBC). The same core team works on both Quarto and R Markdown:

Here is the full contributors list. Quarto is open source and we welcome contributions in our GitHub repository as well: https://github.com/quarto-dev/quarto-cli.

Quarto

  • Batteries included, shared syntax across output types and languages
  • Single source publishing across document types, with raw customization allowed
  • Choose your own editor for plain text .qmd or Jupyter notebooks
  • Quarto projects + freeze for managing stored computation

Follow @quarto_pub #QuartoPub on Twitter/Fosstodon to stay up to date!

Web resources

Quarto resources

General Quarto

Why the name “Quarto”?