Cross-Tabulation with Percentages, Weights, and Grouping

cross_tab() produces a cross-tabulation of x by y, with optional stratification using a grouping variable (by). It supports weighted frequencies, row or column percentages, and association statistics (Chi-squared test, Cramer's V).

Usage

cross_tab(
  d = parent.frame(),
  x,
  y = NULL,
  by = NULL,
  weights = NULL,
  rescale_weights = FALSE,
  digits = 1,
  rowprct = FALSE,
  row_total = TRUE,
  column_total = TRUE,
  n = TRUE,
  drop = TRUE,
  include_stats = TRUE,
  combine = FALSE,
  ...
)

Arguments

d: A data.frame, or a vector when using vector input. If d is a data frame, it must contain all variables used in x, y, by, and weights.
x: Variable for table rows. Can be unquoted (tidy) or quoted (standard). Must match a column name if d is a data frame.
y: Optional variable for table columns. Same rules as x. If NULL, computes a one-way frequency table.
by: Optional grouping variable (or interaction of variables). Used to produce stratified crosstabs. Must refer to columns in d, or be a vector of the same length as x.
weights: Optional numeric vector of weights. Must have the same length as x.
rescale_weights: Logical. If TRUE, rescales weights so that the total weighted count matches the unweighted count.
digits: Integer. Number of decimal places shown in percentages. Default is 1.
rowprct: Logical. If TRUE, computes percentages by row; otherwise by column.
row_total: Logical. If TRUE, adds row totals (default TRUE).
column_total: Logical. If TRUE, adds column totals (default TRUE).
n: Logical. If TRUE, displays effective counts N as an extra row or column (default TRUE).
drop: Logical. If TRUE, drops empty rows or columns (default TRUE).
include_stats: Logical. If TRUE, includes Chi-squared test and Cramer's V when possible (default TRUE).
combine: Logical. If TRUE, combines all stratified tables into one tibble with a by column.
...: Additional arguments passed to print.spicy(), such as show_all = TRUE.
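
To make the effect of rowprct concrete, the two kinds of percentages can be approximated in base R with table() and prop.table(). This is only a sketch on the built-in mtcars data, not the internals of cross_tab(); the object names counts, col_pct, and row_pct are illustrative.

```r
# Base R sketch: column vs. row percentages for cyl by gear.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# margin = 2 divides each cell by its column total (compare rowprct = FALSE)
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# margin = 1 divides each cell by its row total (compare rowprct = TRUE)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

col_pct
row_pct
```

With column percentages each column sums to 100, so the cells describe the distribution of cyl within each gear; with row percentages each row sums to 100 instead.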

Value

A tibble of class spicy, or a list of such tibbles if combine = FALSE and by is used.

Details

The function is flexible:

Accepts both standard (quoted) and tidy (unquoted) variable input
Performs stratified tabulations using a grouping variable (by)
Optionally combines group-level tables into a single tibble with combine = TRUE
Works with both the base R pipe (|>) and the magrittr pipe (%>%)

All variables (x, y, by, weights) must be present in the data frame d (unless vector input is used).
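
The effect of rescale_weights can be illustrated with a few lines of base R. The sketch below assumes the rescaling rule is w * n / sum(w), which is the simplest rule that makes the weighted total equal the number of observations; the source does not state the exact formula, so treat it as an assumption.

```r
# Sketch of weight rescaling, assuming the rule w * n / sum(w).
w <- mtcars$mpg                       # same weights as in the Examples
w_rescaled <- w * length(w) / sum(w)  # total now equals the number of rows

sum(w)           # total weighted count before rescaling
sum(w_rescaled)  # equals nrow(mtcars) after rescaling

# Weighted counts per gear, before and after rescaling
tapply(w, mtcars$gear, sum)
tapply(w_rescaled, mtcars$gear, sum)

# Weighted column percentages do not change when weights are rescaled
weighted <- xtabs(w ~ cyl + gear, data = mtcars)
rescaled <- xtabs(w_rescaled ~ cyl + gear, data = mtcars)
same_pct <- all.equal(
  as.vector(prop.table(weighted, 2)),
  as.vector(prop.table(rescaled, 2))
)
same_pct
```

Under this assumption only the counts (and therefore the Chi-squared statistic) change with rescaling; the percentages stay the same.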

Warnings and Errors

If weights is non-numeric, an error is thrown.
If weights does not match the number of observations, an error is thrown.
If rescale_weights = TRUE but no weights are provided, a warning is issued.
If all values in by are NA, an error is thrown.
If by has only one unique level, a warning is issued.
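
The checks listed above could be sketched as a small validation helper. The function name check_cross_tab_inputs() and all message texts are hypothetical and chosen for illustration; this is not the package's own code, only one way such checks might be written.

```r
# Hypothetical helper mirroring the documented checks; not the package's code.
check_cross_tab_inputs <- function(x, by = NULL, weights = NULL,
                                   rescale_weights = FALSE) {
  if (!is.null(weights)) {
    if (!is.numeric(weights)) {
      stop("`weights` must be numeric.", call. = FALSE)
    }
    if (length(weights) != length(x)) {
      stop("`weights` must have the same length as `x`.", call. = FALSE)
    }
  } else if (rescale_weights) {
    warning("`rescale_weights = TRUE` has no effect without `weights`.",
            call. = FALSE)
  }
  if (!is.null(by)) {
    if (all(is.na(by))) {
      stop("All values in `by` are NA.", call. = FALSE)
    }
    if (length(unique(by[!is.na(by)])) == 1L) {
      warning("`by` has only one unique level.", call. = FALSE)
    }
  }
  invisible(TRUE)
}

# Usage: capture the error message instead of stopping the script
msg <- tryCatch(
  check_cross_tab_inputs(mtcars$cyl, weights = as.character(mtcars$mpg)),
  error = function(e) conditionMessage(e)
)
msg
```

Wrapping a call in tryCatch() like this is also a convenient way to handle the real function's errors and warnings in scripts that should keep running.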

Examples

data(mtcars)
mtcars$gear <- factor(mtcars$gear)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs, labels = c("V", "S"))
mtcars$am <- factor(mtcars$am, labels = c("auto", "manual"))

# Basic usage
cross_tab(mtcars, cyl, gear)
#> Crosstable: cyl x gear (%)
#> ─────────────────────────────────────────
#>  Values           3     4     5 Row_Total
#> ─────────────────────────────────────────
#>  4              6.7  66.7  40.0      34.4
#>  6             13.3  33.3  20.0      21.9
#>  8             80.0   0.0  40.0      43.8
#>  Column_Total 100.0 100.0 100.0     100.0
#>  N             15.0  12.0   5.0      32.0
#> ─────────────────────────────────────────
#> Chi-2 = 18 (df = 4), p = 0.00121, Cramer's V = 0.53

# Vector input (columns extracted with $)
cross_tab(mtcars$cyl, mtcars$gear)
#> Crosstable: cyl x gear (%)
#> ─────────────────────────────────────────
#>  Values           3     4     5 Row_Total
#> ─────────────────────────────────────────
#>  4              6.7  66.7  40.0      34.4
#>  6             13.3  33.3  20.0      21.9
#>  8             80.0   0.0  40.0      43.8
#>  Column_Total 100.0 100.0 100.0     100.0
#>  N             15.0  12.0   5.0      32.0
#> ─────────────────────────────────────────
#> Chi-2 = 18 (df = 4), p = 0.00121, Cramer's V = 0.53

# Pipe-friendly syntax
mtcars |> cross_tab(cyl, gear, by = am)
#> $auto
#> Crosstable: cyl x gear | am = auto (%)
#> ───────────────────────────────────
#>  Values           3     4 Row_Total
#> ───────────────────────────────────
#>  4              6.7  50.0      15.8
#>  6             13.3  50.0      21.1
#>  8             80.0   0.0      63.2
#>  Column_Total 100.0 100.0     100.0
#>  N             15.0   4.0      19.0
#> ───────────────────────────────────
#> Chi-2 = 9 (df = 2), p = 0.0113, Cramer's V = 0.69
#> 
#> $manual
#> Crosstable: cyl x gear | am = manual (%)
#> ───────────────────────────────────
#>  Values           4     5 Row_Total
#> ───────────────────────────────────
#>  4             75.0  40.0      61.5
#>  6             25.0  20.0      23.1
#>  8              0.0  40.0      15.4
#>  Column_Total 100.0 100.0     100.0
#>  N              8.0   5.0      13.0
#> ───────────────────────────────────
#> Chi-2 = 3.8 (df = 2), p = 0.146, Cramer's V = 0.54
#> 

# With row percentages
cross_tab(mtcars, cyl, gear, by = am, rowprct = TRUE)
#> $auto
#> Crosstable: cyl x gear | am = auto (%)
#> ─────────────────────────────────────
#>  Values           3    4 Row_Total  N
#> ─────────────────────────────────────
#>  4             33.3 66.7     100.0  3
#>  6             50.0 50.0     100.0  4
#>  8            100.0  0.0     100.0 12
#>  Column_Total  78.9 21.1     100.0 19
#> ─────────────────────────────────────
#> Chi-2 = 9 (df = 2), p = 0.0113, Cramer's V = 0.69
#> 
#> $manual
#> Crosstable: cyl x gear | am = manual (%)
#> ─────────────────────────────────────
#>  Values          4     5 Row_Total  N
#> ─────────────────────────────────────
#>  4            75.0  25.0     100.0  8
#>  6            66.7  33.3     100.0  3
#>  8             0.0 100.0     100.0  2
#>  Column_Total 61.5  38.5     100.0 13
#> ─────────────────────────────────────
#> Chi-2 = 3.8 (df = 2), p = 0.146, Cramer's V = 0.54
#> 

# Using weights
cross_tab(mtcars, cyl, gear, weights = mpg)
#> Crosstable: cyl x gear (%)
#> ─────────────────────────────────────────
#>  Values           3     4     5 Row_Total
#> ─────────────────────────────────────────
#>  4              8.9  73.2  52.8      45.6
#>  6             16.3  26.8  18.4      21.5
#>  8             74.8   0.0  28.8      32.9
#>  Column_Total 100.0 100.0 100.0     100.0
#>  N            241.6 294.4 106.9     642.9
#> ─────────────────────────────────────────
#> Chi-2 = 355.1 (df = 4), p < 0.001, Cramer's V = 0.53

# With rescaled weights
cross_tab(mtcars, cyl, gear, weights = mpg, rescale_weights = TRUE)
#> Crosstable: cyl x gear (%)
#> ─────────────────────────────────────────
#>  Values           3     4     5 Row_Total
#> ─────────────────────────────────────────
#>  4              8.9  73.2  52.8      45.6
#>  6             16.3  26.8  18.4      21.5
#>  8             74.8   0.0  28.8      32.9
#>  Column_Total 100.0 100.0 100.0     100.0
#>  N             12.0  14.7   5.3      32.0
#> ─────────────────────────────────────────
#> Chi-2 = 17.7 (df = 4), p = 0.00143, Cramer's V = 0.53

# Grouped by a single variable
cross_tab(mtcars, cyl, gear, by = am)
#> $auto
#> Crosstable: cyl x gear | am = auto (%)
#> ───────────────────────────────────
#>  Values           3     4 Row_Total
#> ───────────────────────────────────
#>  4              6.7  50.0      15.8
#>  6             13.3  50.0      21.1
#>  8             80.0   0.0      63.2
#>  Column_Total 100.0 100.0     100.0
#>  N             15.0   4.0      19.0
#> ───────────────────────────────────
#> Chi-2 = 9 (df = 2), p = 0.0113, Cramer's V = 0.69
#> 
#> $manual
#> Crosstable: cyl x gear | am = manual (%)
#> ───────────────────────────────────
#>  Values           4     5 Row_Total
#> ───────────────────────────────────
#>  4             75.0  40.0      61.5
#>  6             25.0  20.0      23.1
#>  8              0.0  40.0      15.4
#>  Column_Total 100.0 100.0     100.0
#>  N              8.0   5.0      13.0
#> ───────────────────────────────────
#> Chi-2 = 3.8 (df = 2), p = 0.146, Cramer's V = 0.54
#> 

# Grouped by interaction of two variables
cross_tab(mtcars, cyl, gear, by = interaction(am, vs), combine = TRUE)
#> Crosstable: cyl x gear by interaction(am, vs)
#> ─────────────────────────────────────────────────────────────
#>  Values           3     4     5 Row_Total interaction(am, vs)
#> ─────────────────────────────────────────────────────────────
#>  8            100.0  <NA>  <NA>     100.0              auto.V
#>  Column_Total 100.0  <NA>  <NA>     100.0              auto.V
#>  N             12.0  <NA>  <NA>      12.0              auto.V
#>  4             <NA>   0.0  25.0      16.7            manual.V
#>  6             <NA> 100.0  25.0      50.0            manual.V
#>  8             <NA>   0.0  50.0      33.3            manual.V
#>  Column_Total  <NA> 100.0 100.0     100.0            manual.V
#>  N             <NA>   2.0   4.0       6.0            manual.V
#>  4             33.3  50.0  <NA>      42.9              auto.S
#>  6             66.7  50.0  <NA>      57.1              auto.S
#>  Column_Total 100.0 100.0  <NA>     100.0              auto.S
#>  N              3.0   4.0  <NA>       7.0              auto.S
#>  4             <NA> 100.0 100.0     100.0            manual.S
#>  Column_Total  <NA> 100.0 100.0     100.0            manual.S
#>  N             <NA>   6.0   1.0       7.0            manual.S
#> ─────────────────────────────────────────────────────────────
#> [interaction(am, vs) = auto.V] Chi-squared test not applicable (table too small).
#> [interaction(am, vs) = manual.V] Chi-2 = 3 (df = 2), p = 0.223, Cramer's V = 0.71
#> [interaction(am, vs) = auto.S] Chi-2 = 0.2 (df = 1), p = 0.659, Cramer's V = 0.17
#> [interaction(am, vs) = manual.S] Chi-squared test not applicable (table too small).

# Combined output for grouped data
cross_tab(mtcars, cyl, gear, by = am, combine = TRUE)
#> Crosstable: cyl x gear by am
#> ────────────────────────────────────────────────
#>  Values           3     4     5 Row_Total     am
#> ────────────────────────────────────────────────
#>  4              6.7  50.0  <NA>      15.8   auto
#>  6             13.3  50.0  <NA>      21.1   auto
#>  8             80.0   0.0  <NA>      63.2   auto
#>  Column_Total 100.0 100.0  <NA>     100.0   auto
#>  N             15.0   4.0  <NA>      19.0   auto
#>  4             <NA>  75.0  40.0      61.5 manual
#>  6             <NA>  25.0  20.0      23.1 manual
#>  8             <NA>   0.0  40.0      15.4 manual
#>  Column_Total  <NA> 100.0 100.0     100.0 manual
#>  N             <NA>   8.0   5.0      13.0 manual
#> ────────────────────────────────────────────────
#> [am = auto] Chi-2 = 9 (df = 2), p = 0.0113, Cramer's V = 0.69
#> [am = manual] Chi-2 = 3.8 (df = 2), p = 0.146, Cramer's V = 0.54

# Without totals or sample size
cross_tab(mtcars, cyl, gear, row_total = FALSE, column_total = FALSE, n = FALSE)
#> Crosstable: cyl x gear (%)
#> ──────────────────────
#>  Values    3    4    5
#> ──────────────────────
#>  4       6.7 66.7 40.0
#>  6      13.3 33.3 20.0
#>  8      80.0  0.0 40.0
#> ──────────────────────
#> Chi-2 = 18 (df = 4), p = 0.00121, Cramer's V = 0.53
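
The statistics line printed under each table can be cross-checked in base R. The sketch below uses chisq.test() from the stats package and the usual formula for Cramer's V, sqrt(chi2 / (n * (min(rows, cols) - 1))); whether cross_tab() computes its statistics in exactly this way is an assumption.

```r
# Base R cross-check of the statistics line for cyl by gear.
counts <- table(mtcars$cyl, mtcars$gear)

# chisq.test() warns about small expected counts here, so silence it
test <- suppressWarnings(chisq.test(counts))

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n <- sum(counts)
k <- min(dim(counts)) - 1
cramers_v <- sqrt(unname(test$statistic) / (n * k))

round(unname(test$statistic), 1)  # Chi-2
unname(test$parameter)            # degrees of freedom
signif(test$p.value, 3)           # p-value
round(cramers_v, 2)               # Cramer's V
```

For the unweighted cyl by gear table these values agree with the Chi-2 = 18 (df = 4), p = 0.00121, Cramer's V = 0.53 line shown in the basic usage example.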