Beautiful tables with gt and gtExtras

Tom Mock

Why do we care about tables?

Why do we care about graphs?

Both Graphs and tables are tools for communication

Better Graphs/Tables are better communication

A grammar of graphics

  • ggplot2 is an application of the grammar of graphics for R

  • A default dataset and set of mappings from variables to aesthetics

  • One or more layers of geometric objects

  • One scale for each aesthetic mapping used

  • A coordinate system

  • The facet specification

A grammar of graphics1

Easy enough to rapidly prototype graphics at the “speed of thought”

Powerful enough for final “publication”

Tables with R

Construct a wide variety of useful tables with a cohesive set of table parts. These include the table header, the stub, the column labels and spanner column labels, the table body and the table footer.

Easy enough to rapidly prototype

Powerful enough for final “publication”

What about merging graphs AND tables?

🌶️ Hot take - Horizontal bar charts and simple charts with facets are already tables.

body_mass plot
Adelie
male 4,043.49
female 3,368.84
Chinstrap
male 3,938.97
female 3,527.21
Gentoo
male 5,484.84
female 4,679.74

Bar plot code

ggplot(grp_df, aes(x = body_mass, y = sex)) +
  geom_col(fill = "purple") +
  geom_text(
    aes(
      label = scales::label_number(big.mark = ",", accuracy = .1)(body_mass)
    ),
    position = position_nudge(x = -100), hjust = 1,
    color = "white", fontface = "bold"
  ) +
  facet_wrap(~species, ncol = 1) +
  bbplot::bbc_style() +
  theme(
    panel.grid.major.x = element_line(color = "grey"),
    panel.grid.major.y = element_blank()
  )
gt(grp_tab_df, groupname_col = "species", rowname_col = "sex") |> 
  gtExtras::gt_duplicate_column(body_mass, dupe_name = "plot") |> 
  gtExtras::gt_plt_bar_pct(plot) |> 
  gt::cols_width(plot ~ px(150)) |> 
  gt::fmt_number(body_mass) |> 
  gtExtras::gt_theme_nytimes()

Hot take, continued

Distribution, code

library(ggridges)
ggplot(
  na.omit(penguins),
  aes(x = body_mass_g, y = sex)
) +
  geom_density_ridges() +
  facet_wrap(~species, ncol = 1)
pen_qu <- penguins |>
  na.omit() |>
  group_by(species, sex) |>
  summarise(
    quantile = list(
      quantile(body_mass_g,
        c(0.25, 0.5, 0.75),
        q = c(0.25, 0.5, 0.75)
      )
    ),
    plot = list(body_mass_g),
    .groups = "drop"
  ) |>
  unnest_wider(quantile) |>
  arrange(species, desc(sex))
gt(pen_qu,
  rowname_col = "sex",
  groupname_col = "species"
) |>
  gtExtras::gt_plt_dist(
    plot, 
    fig_dim = c(6,40)) |>
  fmt_number(
    where(is.numeric), 
    decimals = 1
    )

gtExtras

  • Themes: 7 themes that style almost every element of a gt table, built off of data journalism-styled tables

  • Utilities: Helper functions for aligning/padding numbers, adding fontawesome icons, images, highlighting, dividers, styling by group, creating two tables or two column layouts, extracting ordered data from a gt table internals, or generating a random dataset for reprex

  • Plotting: 12 plotting functions for inline sparklines, win-loss charts, distributions (density/histogram), percentiles, dot + bar, bar charts, confidence intervals, or summarizing an entire dataframe!

  • Colors: 3 functions, a palette for “Hulk” style scale (purple/green), coloring rows with good defaults, or adding a “color box” along with the cell value

Themes

Structuring data

car_summary <- mtcars %>%
  dplyr::group_by(cyl) %>%
  
  dplyr::summarize(
    mean = mean(mpg),
    sd = sd(mpg),
    # must end up with list of data for each row in the input dataframe
    mpg_data = list(mpg),
    .groups = "drop"
  )

car_summary
# A tibble: 3 × 4
    cyl  mean    sd mpg_data  
  <dbl> <dbl> <dbl> <list>    
1     4  26.7  4.51 <dbl [11]>
2     6  19.7  1.45 <dbl [7]> 
3     8  15.1  2.56 <dbl [14]>

Opinionated plots

gtExtras plotting provides an opinionated API that predominantly supports one way of doing things, rather than providing maximum flexibility to accommodate different workflows.

One way == one-liner

The “one way” is almost always list() data with focused function arguments.

Ultimate goal: Simple, Inline, Effective

  • Graph for quick trends
  • Tabular values for exact comparison

Sparklines

car_summary %>%
  arrange(desc(cyl)) %>% 
  gt() %>%
  gtExtras::gt_plt_sparkline(mpg_data) %>%
  fmt_number(columns = mean:sd, decimals = 1)

Dot + Bar

dot_bar_tab <- mtcars %>%
  head() %>%
  dplyr::mutate(cars = sapply(strsplit(rownames(.)," "), `[`, 1)) %>%
  dplyr::select(cars, mpg, disp) %>%
  gt() %>%
  gt_plt_dot(column = disp, category_column = cars, 
              palette = "ggthemes::fivethirtyeight") %>%
  cols_width(cars ~ px(125))

Point

point_tab <- dplyr::tibble(x = c(seq(1.2e6, 2e6, length.out = 5))) %>%
  gt::gt() %>%
  gt_duplicate_column(x,dupe_name = "point_plot") %>%
  gt_plt_point(point_plot, accuracy = .1, width = 25) %>%
  gt::fmt_number(x, suffixing = TRUE, decimals = 1)

Percentile

dot_plt <- dplyr::tibble(x = c(seq(10, 90, length.out = 5))) %>%
  gt() %>%
  gt_duplicate_column(x, dupe_name = "dot_plot") %>%
  gt_plt_percentile(dot_plot)

Confidence interval

# gtExtras can calculate basic conf int
# using confint() function
ci_table <- generate_df(
  n = 50, n_grps = 3,
  mean = c(10, 15, 20), sd = c(10, 10, 10),
  with_seed = 37
) %>%
  dplyr::group_by(grp) %>%
  dplyr::summarise(
    n = dplyr::n(),
    avg = mean(values),
    sd = sd(values),
    list_data = list(values)
  ) %>%
  gt::gt() %>%
  gt_plt_conf_int(list_data, ci = 0.9)

Confidence interval, user defined

# You can also provide your own values
# based on your own algorithm/calculations
pre_calc_ci_tab <- dplyr::tibble(
  mean = c(12, 10), ci1 = c(8, 5), ci2 = c(16, 15),
  ci_plot = c(12, 10)
) %>%
  gt::gt() %>%
  gt_plt_conf_int(
    column = ci_plot, 
    ci_columns = c(ci1, ci2),
    palette = c("red", "lightgrey", "black", "red")
    )

Creating custom gt functions

A short primer on tidyeval

Tidy evaluation is a special type of non-standard evaluation used throughout the tidyverse. - Programming with dplyr

This powers the ability to do:

mtcars %>% 
  # Knows where and how to find `cyl` as a column, not an object
  group_by(cyl) %>% 
  # knows how to find `mpg` as a column, not an object
  summarize(n = n(), mean = mean(mpg))

You can get most tidyeval things working with just two new concepts:

  • Embrace your variable with {{ var }}, also known as ‘curly-curly’
  • Pass the dots with ... for many arguments

tidyeval in practice

library(dplyr)
car_summary <- function(var){
  mtcars %>% 
    group_by({{var}}) %>% 
    summarize(mean = mean(mpg),
              n = n())
}
car_summary(vs)
# A tibble: 2 × 3
     vs  mean     n
  <dbl> <dbl> <int>
1     0  16.6    18
2     1  24.6    14
car_summary(vs, am)
Error in car_summary(vs, am): unused argument (am)

tidyeval in practice

car_summary_dots <- function(...){
  mtcars %>% 
    # add the dots
    group_by(...) %>% 
    summarize(mean = mean(mpg),
              n = n(), .groups = "drop")
}
car_summary_dots(vs, am, cyl)
# A tibble: 7 × 5
     vs    am   cyl  mean     n
  <dbl> <dbl> <dbl> <dbl> <int>
1     0     0     8  15.0    12
2     0     1     4  26       1
3     0     1     6  20.6     3
4     0     1     8  15.4     2
5     1     0     4  22.9     3
6     1     0     6  19.1     4
7     1     1     4  28.4     7

tidyeval with gt

library(gt)
gt_add_divider <- function(gt_object, columns, ..., include_labels = TRUE) {
  stopifnot("Table must be of class 'gt_tbl'" = "gt_tbl" %in% class(gt_object))
  gt_object %>%
    tab_style(
      # dots include passed named arguments to the internal function
      style = cell_borders(sides = "right", ...),
      locations = if (isTRUE(include_labels)) {
        # columns to affect
        list(cells_body(columns = {{ columns }}),
          cells_column_labels(columns = {{ columns }}))
      } else {
        cells_body(columns = {{ columns }})
      }
    )
}

tidyeval with gt in practice

basic_table <- head(mtcars, 6) %>% 
  gt()
basic_table
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

tidyeval with gt in practice

basic_table %>% 
  # %>% data passed to 1st argument
  gt_add_divider(cyl)
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

gt - more arguments

basic_table %>% 
  # optional arguments accepted by name via `...`
  gt_add_divider(cyl, weight = px(2), color = "red")
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

gt - moar arguments

basic_table %>% 
  ### include_labels as an existing argument
  gt_add_divider(
    c(cyl,mpg), weight = px(3), 
    color = "orange", include_labels = FALSE
    )
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

tidyeval with gt

library(gt)
gt_add_divider <- function(gt_object, columns, ..., include_labels = TRUE) {
  stopifnot("Table must be of class 'gt_tbl'" = "gt_tbl" %in% class(gt_object))
  gt_object %>%
    tab_style(
      # dots include passed named arguments to the internal function
      style = cell_borders(sides = "right", ...),
      locations = if (isTRUE(include_labels)) {
        # columns to affect
        list(cells_body(columns = {{ columns }}),
          cells_column_labels(columns = {{ columns }}))
      } else {
        cells_body(columns = {{ columns }})
      }
    )
}

gt themes!

gt_theme_nytimes <- function(gt_object, ...){

  stopifnot("'gt_object' must be a 'gt_tbl', have you accidentally passed raw data?" = "gt_tbl" %in% class(gt_object))

  gt_object %>%
    tab_options(
      heading.align = "left",
      column_labels.border.top.style = "none",
      table.border.top.style = "none",
      column_labels.border.bottom.style = "none",
      column_labels.border.bottom.width = 1,
      column_labels.border.bottom.color = "#334422",
      table_body.border.top.style = "none",
      table_body.border.bottom.color = "white",
      heading.border.bottom.style = "none",
      data_row.padding = px(7),
      column_labels.font.size = px(12),
      ...
    ) %>%
    tab_style(
      style = cell_text(
        color = "darkgrey",
        font = google_font("Source Sans Pro"),
        transform = "uppercase"
      ),
      locations = cells_column_labels(everything())
    )  %>%
    tab_style(
      style = cell_text(font = google_font("Libre Franklin"),
                        weight = 800),
      locations = cells_title(groups = "title")
    ) %>%
    tab_style(
      style = cell_text(
        font = google_font("Source Sans Pro"),
        weight =  400
      ),
      locations = cells_body()
    )

}

gt themes

gt_theme_basic <- function(gt_object, ...){

  gt_object %>%
    tab_options(
      heading.align = "left",
      data_row.padding = px(7),
      column_labels.font.size = px(12),
      ...
    ) %>%
    tab_style(
      style = cell_text(
        color = "darkgrey",
        font = google_font("Source Sans Pro"),
        transform = "uppercase"
      ),
      locations = cells_column_labels(everything())
    )
}

gt + Plots

library(ggplot2)

plot_object <-
  ggplot(
    data = gtcars,
    aes(x = hp, y = trq, size = msrp)
  ) +
  geom_point(color = "blue") +
  theme(legend.position = "none")
plot_object

gt + Plots

dplyr::tibble(
  text = "Here is a ggplot:",
  ggplot = NA
) %>%
  gt() %>%
  text_transform(
    locations = cells_body(columns = ggplot),
    fn = function(x) {
      plot_object %>%
        ggplot_image(height = px(200))
    }
  )
text ggplot
Here is a ggplot:

gtExtras + Plots

3 step process:

  • list() data by row/group
  • For specific column, create a tiny ggplot2 graph and save to disk
  • Read image back in, and embed image into HTML

Step infinity: protect against user-input/error handling

gtExtras + create graph

plot_out <- ggplot(data = NULL, aes(x = vals, y = factor("1"))) +
  geom_col(width = 0.1, color = palette[1], fill = palette[1]) +
  geom_vline(
    xintercept = target_vals, color = palette[2], size = 1.5,
    alpha = 0.7
  ) +
  geom_vline(xintercept = 0, color = "black", size = 1) +
  theme_void() +
  coord_cartesian(xlim = c(0, max_val)) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +
  scale_y_discrete(expand = expansion(mult = c(0.1, 0.1))) +
  theme_void() +
  theme(
    legend.position = "none",
    plot.margin = margin(0, 0, 0, 0, "pt"),
    plot.background = element_blank(),
    panel.background = element_blank()
  )

gtExtras + save/read

out_name <- file.path(tempfile(
  pattern = "file", tmpdir = tempdir(),
  fileext = ".svg"
))

ggsave(out_name,
  plot = plot_out, dpi = 25.4, height = 5, width = width,
  units = "mm", device = "svg"
)

img_plot <- readLines(out_name) %>%
  paste0(collapse = "") %>%
  gt::html()

gtExtras + gt

gt_plt_bullet <- function(gt_object, column = NULL, target = NULL, width = 65,
                          palette = c("grey", "red")) {

  # extract the values from specified columns
  all_vals <- gt_index(gt_object, {{ column }})

  max_val <- max(all_vals, na.rm = TRUE)
  length_val <- length(all_vals)

  target_vals <- gt_index(gt_object, {{ target }})

  col_bare <- gt_index(gt_object, {{ column }}, as_vector = FALSE) %>%
    dplyr::select({{ column }}) %>%
    names()

  tab_out <- gt_object %>%
    text_transform(
      locations = cells_body({{ column }}),
      fn = function(x) {
        bar_fx <- function(vals, target_vals) {
          ...plotting_code...
          ...write_read_code...
          img_plot
        }
        tab_built <- mapply(bar_fx, all_vals, target_vals)
        tab_built
      }
    ) %>%
    gt::cols_align(align = "left", columns = {{ column }}) %>%
    gt::cols_hide({{ target }})

  tab_out
}

You can write your own gt “extra” functions!