cross_tab() produces a cross-tabulation of x by y, with optional stratification using a grouping variable (by). It supports weighted frequencies, row or column percentages, and association statistics (Chi-squared test, Cramer's V).
Usage
cross_tab(
  d = parent.frame(),
  x,
  y = NULL,
  by = NULL,
  weights = NULL,
  rescale_weights = FALSE,
  digits = 1,
  rowprct = FALSE,
  row_total = TRUE,
  column_total = TRUE,
  n = TRUE,
  drop = TRUE,
  include_stats = TRUE,
  combine = FALSE,
  ...
)
Arguments
- d
  A data.frame, or a vector (when using vector input). Must contain all variables used in x, y, by, and weights.
- x
  Variable for table rows. Can be unquoted (tidy) or quoted (standard). Must match a column name if d is a data frame.
- y
  Optional variable for table columns. Same rules as x. If NULL, computes a one-way frequency table.
- by
  Optional grouping variable (or interaction of variables). Used to produce stratified crosstabs. Must refer to columns in d, or be a vector of the same length as x.
- weights
  Optional numeric vector of weights. Must match the length of x.
- rescale_weights
  Logical. If TRUE, rescales weights so that the total weighted count matches the unweighted count.
- digits
  Integer. Number of decimal places shown in percentages. Default is 1.
- rowprct
  Logical. If TRUE, computes percentages by row; otherwise by column.
- row_total
  Logical. If TRUE, adds row totals (default TRUE).
- column_total
  Logical. If TRUE, adds column totals (default TRUE).
- n
  Logical. If TRUE, displays effective counts N as an extra row or column (default TRUE).
- drop
  Logical. If TRUE, drops empty rows or columns (default TRUE).
- include_stats
  Logical. If TRUE, includes Chi-squared test and Cramer's V when possible (default TRUE).
- combine
  Logical. If TRUE, combines all stratified tables into one tibble with a by column.
- ...
  Additional arguments passed to print.spicy(), such as show_all = TRUE.
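The effect of rescale_weights can be illustrated with base R alone. The sketch below assumes the rescaling is a simple proportional one (each weight multiplied by n / sum(weights)); that is an assumption about the internals, not taken from the package source, although it agrees with the N row printed in the rescaled-weights example.

```r
# Base R sketch of proportional weight rescaling; not the package's own code.
w <- mtcars$mpg

# Assumed rule: scale weights so they sum to the number of observations.
w_rescaled <- w * length(w) / sum(w)

# Weighted cell counts for cyl x gear using the rescaled weights.
weighted <- xtabs(w_rescaled ~ cyl + gear, data = mtcars)

sum(w_rescaled)               # total weighted count equals nrow(mtcars)
round(colSums(weighted), 1)   # weighted N per gear level
```

Because every weight is multiplied by the same constant, the percentages are unchanged; only the effective counts and the Chi-squared statistic are affected.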
Details
The function is flexible:
- Accepts both standard (quoted) and tidy (unquoted) variable input
- Performs stratified tabulations using a grouping variable (by)
- Optionally combines group-level tables into a single tibble with combine = TRUE
- Pipe-friendly with both base R (|>) and magrittr (%>%)
All variables (x, y, by, weights) must be present in the data frame d (unless vector input is used).
Warnings and Errors
- If weights is non-numeric, an error is thrown.
- If weights does not match the number of observations, an error is thrown.
- If rescale_weights = TRUE but no weights are provided, a warning is issued.
- If all values in by are NA, an error is thrown.
- If by has only one unique level (or all NA), a warning is issued.
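The checks listed above can be pictured as a small validation step. The helper below, check_inputs(), is hypothetical and written only for illustration; its name, messages, and the order of the checks (all-NA by is tested before the single-level case) are assumptions, not the package's implementation.

```r
# Hypothetical helper mirroring the documented checks; not part of the package.
check_inputs <- function(x, weights = NULL, by = NULL, rescale_weights = FALSE) {
  if (!is.null(weights)) {
    if (!is.numeric(weights)) {
      stop("weights must be numeric.")
    }
    if (length(weights) != length(x)) {
      stop("weights must have the same length as x.")
    }
  } else if (rescale_weights) {
    warning("rescale_weights = TRUE has no effect without weights.")
  }
  if (!is.null(by)) {
    if (all(is.na(by))) {
      stop("All values in by are NA.")
    }
    if (length(unique(by[!is.na(by)])) == 1) {
      warning("by has only one unique level.")
    }
  }
  invisible(TRUE)
}

# A weights vector of the wrong length is reported as an error.
res <- tryCatch(
  check_inputs(x = mtcars$cyl, weights = mtcars$mpg[1:10]),
  error = function(e) conditionMessage(e)
)
res
```

Wrapping a call in tryCatch(), as in the last lines, is one way to handle these conditions in scripts without stopping execution.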
Examples
data(mtcars)
mtcars$gear <- factor(mtcars$gear)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs, labels = c("V", "S"))
mtcars$am <- factor(mtcars$am, labels = c("auto", "manual"))
# Basic usage
cross_tab(mtcars, cyl, gear)
#> Crosstable: cyl x gear (%)
#> ─────────────────────────────────────────
#> Values 3 4 5 Row_Total
#> ─────────────────────────────────────────
#> 4 6.7 66.7 40.0 34.4
#> 6 13.3 33.3 20.0 21.9
#> 8 80.0 0.0 40.0 43.8
#> Column_Total 100.0 100.0 100.0 100.0
#> N 15.0 12.0 5.0 32.0
#> ─────────────────────────────────────────
#> Chi-2 = 18 (df = 4), p = 0.00121, Cramer's V = 0.53
# Using extracted variables
cross_tab(mtcars$cyl, mtcars$gear)
#> Crosstable: cyl x gear (%)
#> ─────────────────────────────────────────
#> Values 3 4 5 Row_Total
#> ─────────────────────────────────────────
#> 4 6.7 66.7 40.0 34.4
#> 6 13.3 33.3 20.0 21.9
#> 8 80.0 0.0 40.0 43.8
#> Column_Total 100.0 100.0 100.0 100.0
#> N 15.0 12.0 5.0 32.0
#> ─────────────────────────────────────────
#> Chi-2 = 18 (df = 4), p = 0.00121, Cramer's V = 0.53
# Pipe-friendly syntax
mtcars |> cross_tab(cyl, gear, by = am)
#> $auto
#> Crosstable: cyl x gear | am = auto (%)
#> ───────────────────────────────────
#> Values 3 4 Row_Total
#> ───────────────────────────────────
#> 4 6.7 50.0 15.8
#> 6 13.3 50.0 21.1
#> 8 80.0 0.0 63.2
#> Column_Total 100.0 100.0 100.0
#> N 15.0 4.0 19.0
#> ───────────────────────────────────
#> Chi-2 = 9 (df = 2), p = 0.0113, Cramer's V = 0.69
#>
#> $manual
#> Crosstable: cyl x gear | am = manual (%)
#> ───────────────────────────────────
#> Values 4 5 Row_Total
#> ───────────────────────────────────
#> 4 75.0 40.0 61.5
#> 6 25.0 20.0 23.1
#> 8 0.0 40.0 15.4
#> Column_Total 100.0 100.0 100.0
#> N 8.0 5.0 13.0
#> ───────────────────────────────────
#> Chi-2 = 3.8 (df = 2), p = 0.146, Cramer's V = 0.54
#>
# With row percentages
cross_tab(mtcars, cyl, gear, by = am, rowprct = TRUE)
#> $auto
#> Crosstable: cyl x gear | am = auto (%)
#> ─────────────────────────────────────
#> Values 3 4 Row_Total N
#> ─────────────────────────────────────
#> 4 33.3 66.7 100.0 3
#> 6 50.0 50.0 100.0 4
#> 8 100.0 0.0 100.0 12
#> Column_Total 78.9 21.1 100.0 19
#> ─────────────────────────────────────
#> Chi-2 = 9 (df = 2), p = 0.0113, Cramer's V = 0.69
#>
#> $manual
#> Crosstable: cyl x gear | am = manual (%)
#> ─────────────────────────────────────
#> Values 4 5 Row_Total N
#> ─────────────────────────────────────
#> 4 75.0 25.0 100.0 8
#> 6 66.7 33.3 100.0 3
#> 8 0.0 100.0 100.0 2
#> Column_Total 61.5 38.5 100.0 13
#> ─────────────────────────────────────
#> Chi-2 = 3.8 (df = 2), p = 0.146, Cramer's V = 0.54
#>
# Using weights
cross_tab(mtcars, cyl, gear, weights = mpg)
#> Crosstable: cyl x gear (%)
#> ─────────────────────────────────────────
#> Values 3 4 5 Row_Total
#> ─────────────────────────────────────────
#> 4 8.9 73.2 52.8 45.6
#> 6 16.3 26.8 18.4 21.5
#> 8 74.8 0.0 28.8 32.9
#> Column_Total 100.0 100.0 100.0 100.0
#> N 241.6 294.4 106.9 642.9
#> ─────────────────────────────────────────
#> Chi-2 = 355.1 (df = 4), p < 0.001, Cramer's V = 0.53
# With rescaled weights
cross_tab(mtcars, cyl, gear, weights = mpg, rescale_weights = TRUE)
#> Crosstable: cyl x gear (%)
#> ─────────────────────────────────────────
#> Values 3 4 5 Row_Total
#> ─────────────────────────────────────────
#> 4 8.9 73.2 52.8 45.6
#> 6 16.3 26.8 18.4 21.5
#> 8 74.8 0.0 28.8 32.9
#> Column_Total 100.0 100.0 100.0 100.0
#> N 12.0 14.7 5.3 32.0
#> ─────────────────────────────────────────
#> Chi-2 = 17.7 (df = 4), p = 0.00143, Cramer's V = 0.53
# Grouped by a single variable
cross_tab(mtcars, cyl, gear, by = am)
#> $auto
#> Crosstable: cyl x gear | am = auto (%)
#> ───────────────────────────────────
#> Values 3 4 Row_Total
#> ───────────────────────────────────
#> 4 6.7 50.0 15.8
#> 6 13.3 50.0 21.1
#> 8 80.0 0.0 63.2
#> Column_Total 100.0 100.0 100.0
#> N 15.0 4.0 19.0
#> ───────────────────────────────────
#> Chi-2 = 9 (df = 2), p = 0.0113, Cramer's V = 0.69
#>
#> $manual
#> Crosstable: cyl x gear | am = manual (%)
#> ───────────────────────────────────
#> Values 4 5 Row_Total
#> ───────────────────────────────────
#> 4 75.0 40.0 61.5
#> 6 25.0 20.0 23.1
#> 8 0.0 40.0 15.4
#> Column_Total 100.0 100.0 100.0
#> N 8.0 5.0 13.0
#> ───────────────────────────────────
#> Chi-2 = 3.8 (df = 2), p = 0.146, Cramer's V = 0.54
#>
# Grouped by interaction of two variables
cross_tab(mtcars, cyl, gear, by = interaction(am, vs), combine = TRUE)
#> Crosstable: cyl x gear by interaction(am, vs)
#> ─────────────────────────────────────────────────────────────
#> Values 3 4 5 Row_Total interaction(am, vs)
#> ─────────────────────────────────────────────────────────────
#> 8 100.0 <NA> <NA> 100.0 auto.V
#> Column_Total 100.0 <NA> <NA> 100.0 auto.V
#> N 12.0 <NA> <NA> 12.0 auto.V
#> 4 <NA> 0.0 25.0 16.7 manual.V
#> 6 <NA> 100.0 25.0 50.0 manual.V
#> 8 <NA> 0.0 50.0 33.3 manual.V
#> Column_Total <NA> 100.0 100.0 100.0 manual.V
#> N <NA> 2.0 4.0 6.0 manual.V
#> 4 33.3 50.0 <NA> 42.9 auto.S
#> 6 66.7 50.0 <NA> 57.1 auto.S
#> Column_Total 100.0 100.0 <NA> 100.0 auto.S
#> N 3.0 4.0 <NA> 7.0 auto.S
#> 4 <NA> 100.0 100.0 100.0 manual.S
#> Column_Total <NA> 100.0 100.0 100.0 manual.S
#> N <NA> 6.0 1.0 7.0 manual.S
#> ─────────────────────────────────────────────────────────────
#> [interaction(am, vs) = auto.V] Chi-squared test not applicable (table too small).
#> [interaction(am, vs) = manual.V] Chi-2 = 3 (df = 2), p = 0.223, Cramer's V = 0.71
#> [interaction(am, vs) = auto.S] Chi-2 = 0.2 (df = 1), p = 0.659, Cramer's V = 0.17
#> [interaction(am, vs) = manual.S] Chi-squared test not applicable (table too small).
# Combined output for grouped data
cross_tab(mtcars, cyl, gear, by = am, combine = TRUE)
#> Crosstable: cyl x gear by am
#> ────────────────────────────────────────────────
#> Values 3 4 5 Row_Total am
#> ────────────────────────────────────────────────
#> 4 6.7 50.0 <NA> 15.8 auto
#> 6 13.3 50.0 <NA> 21.1 auto
#> 8 80.0 0.0 <NA> 63.2 auto
#> Column_Total 100.0 100.0 <NA> 100.0 auto
#> N 15.0 4.0 <NA> 19.0 auto
#> 4 <NA> 75.0 40.0 61.5 manual
#> 6 <NA> 25.0 20.0 23.1 manual
#> 8 <NA> 0.0 40.0 15.4 manual
#> Column_Total <NA> 100.0 100.0 100.0 manual
#> N <NA> 8.0 5.0 13.0 manual
#> ────────────────────────────────────────────────
#> [am = auto] Chi-2 = 9 (df = 2), p = 0.0113, Cramer's V = 0.69
#> [am = manual] Chi-2 = 3.8 (df = 2), p = 0.146, Cramer's V = 0.54
# Without totals or sample size
cross_tab(mtcars, cyl, gear, row_total = FALSE, column_total = FALSE, n = FALSE)
#> Crosstable: cyl x gear (%)
#> ──────────────────────
#> Values 3 4 5
#> ──────────────────────
#> 4 6.7 66.7 40.0
#> 6 13.3 33.3 20.0
#> 8 80.0 0.0 40.0
#> ──────────────────────
#> Chi-2 = 18 (df = 4), p = 0.00121, Cramer's V = 0.53
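The percentages and statistics in the basic example can be cross-checked with base R. This is a minimal sketch using table(), prop.table(), and chisq.test(); it assumes Cramer's V follows the usual definition sqrt(chi2 / (n * (min(rows, cols) - 1))), which is not confirmed from the package source but agrees with the printed value.

```r
# Base R cross-check of cross_tab(mtcars, cyl, gear); not the package's own code.
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Column percentages (the default, rowprct = FALSE): each column sums to 100.
col_pct <- round(100 * prop.table(tab, margin = 2), 1)

# Row percentages (rowprct = TRUE): each row sums to 100.
row_pct <- round(100 * prop.table(tab, margin = 1), 1)

# Pearson Chi-squared test; small expected counts trigger a warning, silenced here.
chi <- suppressWarnings(chisq.test(tab))

# Cramer's V under the usual definition (an assumption, see above).
n <- sum(tab)
k <- min(dim(tab)) - 1
cramers_v <- sqrt(unname(chi$statistic) / (n * k))

print(col_pct)
round(cramers_v, 2)
```

The margin argument of prop.table() plays the role of rowprct here: margin = 2 normalises within columns and margin = 1 within rows.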