Categorical summary tables in R • spicy

Basic usage

For grouped tables, provide a data frame, one or more selected variables, and a grouping variable:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education
)
#> Categorical table by education
#> 
#>  Variable          │ Lower secondary n  Lower secondary %  Upper secondary n 
#> ───────────────────┼─────────────────────────────────────────────────────────
#>  smoking           │                                                         
#>    No              │        179               69.6                415        
#>    Yes             │         78               30.4                112        
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity │                                                         
#>    No              │        177               67.8                310        
#>    Yes             │         84               32.2                229        
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  dentist_12m       │                                                         
#>    No              │        113               43.3                174        
#>    Yes             │        148               56.7                365        
#> 
#>  Variable          │ Upper secondary %  Tertiary n  Tertiary %  Total n 
#> ───────────────────┼────────────────────────────────────────────────────
#>  smoking           │                                                    
#>    No              │       78.7            332         84.9       926   
#>    Yes             │       21.3             59         15.1       249   
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity │                                                    
#>    No              │       57.5            163         40.8       650   
#>    Yes             │       42.5            237         59.2       550   
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  dentist_12m       │                                                    
#>    No              │       32.3             67         16.8       354   
#>    Yes             │       67.7            333         83.2       846   
#> 
#>  Variable          │ Total %    p    Cramer's V 
#> ───────────────────┼────────────────────────────
#>  smoking           │          <.001     .14     
#>    No              │  78.8                      
#>    Yes             │  21.2                      
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity │          <.001     .21     
#>    No              │  54.2                      
#>    Yes             │  45.8                      
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  dentist_12m       │          <.001     .22     
#>    No              │  29.5                      
#>    Yes             │  70.5

The default format, output = "default", prints a styled ASCII table to the console. Use output = "data.frame" to get a plain data frame with unformatted numeric values, suitable for further processing.

One-way tables

Omit by to build a frequency-style table for the selected variables:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  output = "default"
)
#> Categorical table
#> 
#>  Variable            │   n      %    
#> ─────────────────────┼───────────────
#>  smoking             │               
#>    No                │  926    78.8  
#>    Yes               │  249    21.2  
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity   │               
#>    No                │  650    54.2  
#>    Yes               │  550    45.8

Output formats

table_categorical() supports several output formats. The table below summarizes the options:

Format	Description
`"default"`	Styled ASCII table in the console (default)
`"data.frame"`	Wide data frame, one row per modality
`"long"`	Long data frame, one row per modality x group
`"gt"`	Formatted gt table
`"tinytable"`	Formatted tinytable
`"flextable"`	Formatted flextable
`"excel"`	Excel file (requires `excel_path`)
`"clipboard"`	Copy to clipboard
`"word"`	Word document (requires `word_path`)

gt output

The "gt" format produces a table with APA-style borders, a column spanner for each group, and decimal-aligned numeric columns:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education,
  output = "gt"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Cramer's V
	n	%	n	%	n	%	n	%
smoking									<.001	.14
No	179	69.6	415	78.7	332	84.9	926	78.8
Yes	78	30.4	112	21.3	59	15.1	249	21.2
physical_activity									<.001	.21
No	177	67.8	310	57.5	163	40.8	650	54.2
Yes	84	32.2	229	42.5	237	59.2	550	45.8
dentist_12m									<.001	.22
No	113	43.3	174	32.3	67	16.8	354	29.5
Yes	148	56.7	365	67.7	333	83.2	846	70.5

tinytable output

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  output = "tinytable"
)

Variable	Female		Male		Total		p	Phi
	n	%	n	%	n	%
smoking							.713	.01
No	475	78.4	451	79.3	926	78.8
Yes	131	21.6	118	20.7	249	21.2
physical_activity							.832	.01
No	334	53.9	316	54.5	650	54.2
Yes	286	46.1	264	45.5	550	45.8

Data frame output

Use output = "data.frame" for a wide data frame with unformatted numeric values (one row per modality), or output = "long" for a long format (one row per modality x group):

table_categorical(
  sochealth,
  select = smoking,
  by = education,
  output = "data.frame"
)
#>   Variable Level Lower secondary n Lower secondary % Upper secondary n
#> 1  smoking    No               179              69.6               415
#> 2  smoking   Yes                78              30.4               112
#>   Upper secondary % Tertiary n Tertiary % Total n Total %            p
#> 1              78.7        332       84.9     926    78.8 2.012877e-05
#> 2              21.3         59       15.1     249    21.2 2.012877e-05
#>   Cramer's V
#> 1  0.1356677
#> 2  0.1356677
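As a cross-check, the Cramer's V and p columns above can be reproduced in base R from the printed counts. This sketch is not spicy's internal code. It assumes the p-value comes from a Pearson chi-squared test of the 2x3 table, and it uses the textbook formula V = sqrt(chi2 / (N * (min(r, c) - 1))).

```r
# Counts of smoking (rows) by education (columns), copied from the output above
tab <- matrix(
  c(179, 78,   # Lower secondary: No, Yes
    415, 112,  # Upper secondary: No, Yes
    332, 59),  # Tertiary: No, Yes
  nrow = 2,
  dimnames = list(
    smoking   = c("No", "Yes"),
    education = c("Lower secondary", "Upper secondary", "Tertiary")
  )
)

# Pearson chi-squared test of independence (df = 2 for a 2x3 table;
# the continuity correction only applies to 2x2 tables)
test <- chisq.test(tab)

# Cramer's V rescales chi-squared to the 0-1 range
n <- sum(tab)
k <- min(dim(tab)) - 1
cramer_v <- sqrt(unname(test$statistic) / (n * k))

cramer_v       # about 0.1357
test$p.value   # about 2.01e-05
```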

Custom labels

By default, table_categorical() uses variable names as row headers. Use the labels argument to provide human-readable labels. Two forms are accepted (matching table_continuous() and table_continuous_lm()):

- A named character vector keyed by column name in data (the recommended form). Only listed columns are relabelled; the others fall back to their column name.
- A positional character vector of the same length as select (the legacy form from spicy < 0.11.0, kept for backward compatibility).
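The fallback rule for the two forms can be written as a small base R function. resolve_labels() is a hypothetical helper for illustration only. It is not part of spicy, and it models only the behaviour described above.

```r
# Hypothetical helper: map each selected column to its display label
resolve_labels <- function(select, labels = NULL) {
  if (is.null(labels)) return(setNames(select, select))
  if (!is.null(names(labels))) {
    # Named form: look up listed columns; unlisted ones keep their name
    out <- select
    hit <- select %in% names(labels)
    out[hit] <- unname(labels[select[hit]])
    return(setNames(out, select))
  }
  # Positional (legacy) form: exactly one label per selected variable
  stopifnot(length(labels) == length(select))
  setNames(labels, select)
}

# smoking is relabelled; the other two columns keep their names
resolve_labels(
  c("smoking", "physical_activity", "dentist_12m"),
  c(smoking = "Smoking status")
)
```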

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  labels = c(
    smoking           = "Smoking status",
    physical_activity = "Regular physical activity"
  ),
  output = "gt"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Cramer's V
	n	%	n	%	n	%	n	%
Smoking status									<.001	.14
No	179	69.6	415	78.7	332	84.9	926	78.8
Yes	78	30.4	112	21.3	59	15.1	249	21.2
Regular physical activity									<.001	.21
No	177	67.8	310	57.5	163	40.8	650	54.2
Yes	84	32.2	229	42.5	237	59.2	550	45.8

Association measures and confidence intervals

table_categorical() picks the association measure for each row variable from the variable types (assoc_measure = "auto", the default):

- 2x2 table (binary row variable vs. binary by variable) -> phi
- both variables are ordered factors -> Kendall's tau_b
- otherwise -> Cramer's V

When the chosen measures differ across rows, the column header collapses to "Effect size" and an APA-style "Note." line below the table records which measure was used for each variable.
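The documented "auto" rule can be written as a short decision function. choose_measure() is hypothetical and mirrors only the three cases listed above. spicy's actual implementation may handle edge cases, such as unused factor levels, differently.

```r
# Hypothetical helper mirroring the documented "auto" rule
choose_measure <- function(x, by) {
  x  <- as.factor(x)   # as.factor() leaves ordered factors ordered
  by <- as.factor(by)
  if (nlevels(x) == 2 && nlevels(by) == 2) {
    "phi"        # 2x2 table
  } else if (is.ordered(x) && is.ordered(by)) {
    "tau_b"      # ordinal by ordinal
  } else {
    "cramer_v"   # any other combination
  }
}

# Toy vectors with the same level structure as the sochealth variables
smoking   <- factor(c("No", "Yes"))
sex       <- factor(c("Female", "Male"))
education <- factor(c("Lower secondary", "Upper secondary", "Tertiary"),
                    levels = c("Lower secondary", "Upper secondary", "Tertiary"),
                    ordered = TRUE)
srh <- factor(c("Poor", "Fair", "Good", "Very good"),
              levels = c("Poor", "Fair", "Good", "Very good"),
              ordered = TRUE)

choose_measure(smoking, sex)        # "phi"
choose_measure(srh, education)      # "tau_b"
choose_measure(smoking, education)  # "cramer_v"
```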

Override with a single string for uniform application, or with a named vector to mix measures per row:

# Uniform: same measure for every row variable
table_categorical(
  sochealth,
  select = smoking,
  by = education,
  assoc_measure = "lambda",
  output = "tinytable"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Lambda
	n	%	n	%	n	%	n	%
smoking									<.001	.00
No	179	69.6	415	78.7	332	84.9	926	78.8
Yes	78	30.4	112	21.3	59	15.1	249	21.2
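Lambda is .00 here even though p < .001. Goodman and Kruskal's lambda measures the proportional reduction in error when predicting the modal category of one variable from the other. "No" is the most common smoking status in every education group, so knowing education never changes the best guess. The base R sketch below computes both the asymmetric form (smoking predicted from education) and the symmetric form from the printed counts. Which variant spicy reports is not stated here, but both are zero for this table.

```r
# Counts of smoking (rows) by education (columns), copied from the output above
tab <- matrix(
  c(179, 78, 415, 112, 332, 59),
  nrow = 2,
  dimnames = list(smoking   = c("No", "Yes"),
                  education = c("Lower secondary", "Upper secondary", "Tertiary"))
)
n <- sum(tab)
row_max_total <- max(rowSums(tab))  # baseline: always guess the modal smoking status
col_max_total <- max(colSums(tab))

# Asymmetric lambda: predict smoking (rows) from education (columns)
lambda_rows <- (sum(apply(tab, 2, max)) - row_max_total) / (n - row_max_total)

# Symmetric lambda: pools errors from both prediction directions
lambda_sym <- (sum(apply(tab, 2, max)) + sum(apply(tab, 1, max)) -
                 row_max_total - col_max_total) /
  (2 * n - row_max_total - col_max_total)

c(asymmetric = lambda_rows, symmetric = lambda_sym)  # both 0
```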

# Per-row: pick the right measure for each variable.
# `smoking` x `education` is 2x3 (binary x ordered) -> Cramer's V;
# `self_rated_health` x `education` is ordered x ordered -> Tau-b.
# The mixed result collapses the header to "Effect size" and adds an
# APA `Note.` line documenting the per-row measure.
table_categorical(
  sochealth,
  select = c(smoking, self_rated_health),
  by = education,
  assoc_measure = c(
    smoking           = "cramer_v",
    self_rated_health = "tau_b"
  ),
  output = "tinytable"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Effect size
	n	%	n	%	n	%	n	%
smoking									<.001	.14
No	179	69.6	415	78.7	332	84.9	926	78.8
Yes	78	30.4	112	21.3	59	15.1	249	21.2
self_rated_health									<.001	.20
Poor	28	10.8	28	5.3	5	1.3	61	5.2
Fair	86	33.1	118	22.4	62	15.8	266	22.5
Good	102	39.2	263	49.9	193	49.1	558	47.3
Very good	44	16.9	118	22.4	133	33.8	295	25.0

Add confidence intervals with assoc_ci = TRUE. In rendered formats (gt, tinytable, flextable, word), the CI is shown inline:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  assoc_ci = TRUE,
  output = "gt"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Cramer's V
	n	%	n	%	n	%	n	%
smoking									<.001	.14 [.08, .19]
No	179	69.6	415	78.7	332	84.9	926	78.8
Yes	78	30.4	112	21.3	59	15.1	249	21.2
physical_activity									<.001	.21 [.15, .26]
No	177	67.8	310	57.5	163	40.8	650	54.2
Yes	84	32.2	229	42.5	237	59.2	550	45.8

In data formats ("data.frame", "long", "excel", "clipboard"), separate CI lower and CI upper columns are added:

table_categorical(
  sochealth,
  select = smoking,
  by = education,
  assoc_ci = TRUE,
  output = "data.frame"
)
#>   Variable Level Lower secondary n Lower secondary % Upper secondary n
#> 1  smoking    No               179              69.6               415
#> 2  smoking   Yes                78              30.4               112
#>   Upper secondary % Tertiary n Tertiary % Total n Total %            p
#> 1              78.7        332       84.9     926    78.8 2.012877e-05
#> 2              21.3         59       15.1     249    21.2 2.012877e-05
#>   Cramer's V   CI lower  CI upper
#> 1  0.1356677 0.07909264 0.1913716
#> 2  0.1356677 0.07909264 0.1913716

Weighted tables

Pass survey weights with the weights argument. Use rescale = TRUE so the total weighted N matches the unweighted N:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  weights = "weight",
  rescale = TRUE,
  output = "gt"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Cramer's V
	n	%	n	%	n	%	n	%
smoking									<.001	.13
No	176	69.0	419	78.5	325	84.4	920.9	78.4
Yes	79	31.0	115	21.5	60	15.6	254.1	21.6
physical_activity									<.001	.19
No	174	67.2	315	57.7	166	41.9	654.8	54.6
Yes	85	32.8	231	42.3	229	58.1	545.2	45.4
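A common way to rescale weights is to multiply them by n / sum(w) so that they sum to the sample size. Weighted counts are then sums of weights within each category. The sketch below assumes spicy uses this normalisation; the text above only guarantees that the weighted total matches the unweighted N. The weights are made up for illustration.

```r
# Made-up weights and responses for five respondents
w       <- c(0.5, 1.2, 0.8, 2.0, 1.5)
smoking <- c("No", "Yes", "No", "No", "Yes")

# Rescale so the weights sum to the number of observations
w_rescaled <- w * length(w) / sum(w)
sum(w_rescaled)   # 5

# Weighted count per category = sum of the weights in that category
weighted_n <- tapply(w_rescaled, smoking, sum)
weighted_n        # No 2.75, Yes 2.25
```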

Handling missing values

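By default, missing values are dropped separately for each variable (drop_na = TRUE). This is why smoking totals 1,175 respondents in the tables above while physical_activity totals 1,200. Set drop_na = FALSE to display missing values as a "(Missing)" category: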

table_categorical(
  sochealth,
  select = income_group,
  by = education,
  drop_na = FALSE,
  output = "gt"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Kendall's Tau-b
	n	%	n	%	n	%	n	%
income_group									<.001	-0.01
Low	87	33.3	115	21.3	45	11.2	247	20.6
Lower middle	92	35.2	186	34.5	110	27.5	388	32.3
Upper middle	58	22.2	135	25.0	135	33.8	328	27.3
High	21	8.0	94	17.4	104	26.0	219	18.2
(Missing)	3	1.1	9	1.7	6	1.5	18	1.5
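Conceptually, drop_na = FALSE turns NA into an explicit level appended after the existing ones. The base R sketch below shows that idea on a made-up vector. It illustrates the concept and is not spicy's code.

```r
income <- factor(c("Low", NA, "High", "Low", NA),
                 levels = c("Low", "Lower middle", "Upper middle", "High"))

# table() silently excludes NA, like the default drop_na = TRUE
table(income)

# Recode NA as an explicit "(Missing)" level placed after the real levels
income_m <- factor(
  ifelse(is.na(income), "(Missing)", as.character(income)),
  levels = c(levels(income), "(Missing)")
)
tab_m <- table(income_m)
tab_m   # Low 2, Lower middle 0, Upper middle 0, High 1, (Missing) 2
```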

Filtering and reordering levels

Use levels_keep to display only specific modalities. The order you specify controls the display order, which is useful for placing “(Missing)” first to highlight missingness:

table_categorical(
  sochealth,
  select = income_group,
  by = education,
  drop_na = FALSE,
  levels_keep = c("(Missing)", "Low", "High"),
  output = "gt"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Kendall's Tau-b
	n	%	n	%	n	%	n	%
income_group									<.001	-0.01
(Missing)	3	1.1	9	1.7	6	1.5	18	1.5
Low	87	33.3	115	21.3	45	11.2	247	20.6
High	21	8.0	94	17.4	104	26.0	219	18.2
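The percentages above are identical to those in the unfiltered table (Low is 20.6% of the total in both), so the kept rows no longer sum to 100%. This suggests that levels_keep filters and reorders rows after percentages are computed. The sketch below mimics that behaviour on the Total column counts. It is a hypothetical reconstruction, not spicy's code.

```r
# Total column counts for income_group, copied from the drop_na = FALSE table
counts <- c(Low = 247, `Lower middle` = 388, `Upper middle` = 328,
            High = 219, `(Missing)` = 18)

# Percentages are computed over all levels first...
pct <- round(100 * counts / sum(counts), 1)

# ...then rows are filtered and reordered in the order given
keep <- c("(Missing)", "Low", "High")
pct[keep]
```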

Formatting options

Control the number of digits for percentages, p-values, and the association measure with percent_digits, p_digits, and v_digits:

table_categorical(
  sochealth,
  select = smoking,
  by = education,
  percent_digits = 2,
  p_digits = 4,
  v_digits = 3,
  output = "gt"
)

Variable	Lower secondary		Upper secondary		Tertiary		Total		p	Cramer's V
	n	%	n	%	n	%	n	%
smoking									<.0001	.136
No	179	69.60	415	78.70	332	84.90	926	78.80
Yes	78	30.40	112	21.30	59	15.10	249	21.20

p_digits drives both the displayed precision of the p column and the small-p threshold (p_digits = 3 -> <.001, p_digits = 4 -> <.0001), matching table_continuous() and table_continuous_lm().
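The p_digits rule can be sketched as follows. format_p_apa() is a hypothetical helper, not spicy's exported API. It rounds to p_digits decimals, prints anything below 10^-p_digits as that threshold, and drops the leading zero as APA style requires.

```r
# Hypothetical helper: `digits` sets both the rounding and the "<" threshold
format_p_apa <- function(p, digits = 3) {
  threshold <- 10^(-digits)
  if (p < threshold) {
    # e.g. digits = 3 -> "<.001"
    paste0("<", sub("^0", "", formatC(threshold, format = "f", digits = digits)))
  } else {
    # Fixed decimals, then strip the leading zero (APA style)
    sub("^0", "", formatC(p, format = "f", digits = digits))
  }
}

format_p_apa(2.012877e-05, digits = 3)  # "<.001"
format_p_apa(2.012877e-05, digits = 4)  # "<.0001"
format_p_apa(0.713, digits = 3)         # ".713"
```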

Decimal alignment

By default (align = "decimal"), numeric columns are aligned on the decimal mark. This is the standard convention in scientific publications and is used by SPSS, SAS, and the LaTeX siunitx package. The gt and tinytable outputs rely on their native primitives, gt::cols_align_decimal() and tinytable::style_tt(align = "d"). Engines without such a primitive (flextable, word, clipboard, and the console print) emulate the alignment by padding values with leading and trailing spaces. flextable and word also switch the body font to Consolas, a monospaced font, so that character widths match.
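For engines without a native primitive, the padding approach can be sketched in base R. pad_decimal() is a hypothetical helper, not spicy's implementation. It assumes every value is a non-empty string that uses "." as the decimal mark.

```r
# Hypothetical helper: align numeric strings on "." by padding with spaces
pad_decimal <- function(x) {
  parts     <- strsplit(x, ".", fixed = TRUE)
  int_part  <- vapply(parts, function(p) p[1], character(1))
  frac_part <- vapply(parts, function(p) if (length(p) > 1) p[2] else "", character(1))
  int_w  <- max(nchar(int_part))
  frac_w <- max(nchar(frac_part))
  # Leading spaces right-justify the integer part
  left <- paste0(strrep(" ", int_w - nchar(int_part)), int_part)
  if (frac_w == 0) return(left)
  # Values without decimals get blanks where ".d" would be
  right <- ifelse(nchar(frac_part) > 0, paste0(".", frac_part), "")
  # Trailing spaces left-justify the fractional part
  paste0(left, right, strrep(" ", frac_w + 1 - nchar(right)))
}

pad_decimal(c("78.8", "5.2", "926"))  # " 78.8" "  5.2" "926  "
```

In a monospaced font, every decimal mark then falls in the same character column.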

Pass align = "auto" to revert to the legacy uniform right-alignment used in spicy < 0.11.0:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  align = "auto"
)
#> Categorical table by sex
#> 
#>  Variable          │ Female n  Female %  Male n  Male %  Total n  Total %     p 
#> ───────────────────┼────────────────────────────────────────────────────────────
#>  smoking           │                                                       .713 
#>    No              │      475      78.4     451    79.3      926     78.8       
#>    Yes             │      131      21.6     118    20.7      249     21.2       
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity │                                                       .832 
#>    No              │      334      53.9     316    54.5      650     54.2       
#>    Yes             │      286      46.1     264    45.5      550     45.8       
#> 
#>  Variable          │ Phi 
#> ───────────────────┼─────
#>  smoking           │ .01 
#>    No              │     
#>    Yes             │     
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌
#>  physical_activity │ .01 
#>    No              │     
#>    Yes             │

The other two values, align = "center" and align = "right", apply that alignment literally.

Tidying for downstream pipelines

table_categorical() returns an object that can be coerced to a plain data.frame / tbl_df (stripping the spicy formatting attributes) or piped into broom::tidy() / broom::glance() for use with gtsummary, modelsummary, parameters, or any other tidyverse-stats workflow:

out <- table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex
)
#> Categorical table by sex
#> 
#>  Variable          │ Female n  Female %  Male n  Male %  Total n  Total %   p   
#> ───────────────────┼────────────────────────────────────────────────────────────
#>  smoking           │                                                       .713 
#>    No              │   475       78.4     451     79.3     926     78.8         
#>    Yes             │   131       21.6     118     20.7     249     21.2         
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity │                                                       .832 
#>    No              │   334       53.9     316     54.5     650     54.2         
#>    Yes             │   286       46.1     264     45.5     550     45.8         
#> 
#>  Variable          │ Phi 
#> ───────────────────┼─────
#>  smoking           │ .01 
#>    No              │     
#>    Yes             │     
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌
#>  physical_activity │ .01 
#>    No              │     
#>    Yes             │

# One row per (variable x level x group) with broom-style columns
# (outcome, level, group, n, proportion). The synthetic Total
# margin is excluded so each observation is counted once.
broom::tidy(out)
#> # A tibble: 8 × 5
#>   outcome           level group      n proportion
#>   <chr>             <chr> <chr>  <int>      <dbl>
#> 1 smoking           No    Female   475      0.784
#> 2 smoking           No    Male     451      0.793
#> 3 smoking           Yes   Female   131      0.216
#> 4 smoking           Yes   Male     118      0.207
#> 5 physical_activity No    Female   334      0.539
#> 6 physical_activity No    Male     316      0.545
#> 7 physical_activity Yes   Female   286      0.461
#> 8 physical_activity Yes   Male     264      0.455

# One row per outcome with the omnibus chi-squared test and the
# chosen association measure (test_type, statistic, df, p.value,
# assoc_type, assoc_value, assoc_ci_lower / assoc_ci_upper, n_total).
broom::glance(out)
#> # A tibble: 2 × 10
#>   outcome           test_type   statistic    df p.value assoc_type assoc_value
#>   <chr>             <chr>           <dbl> <int>   <dbl> <chr>            <dbl>
#> 1 physical_activity chi_squared    0.0452     1   0.832 Phi            0.00614
#> 2 smoking           chi_squared    0.136      1   0.713 Phi            0.0107 
#> # ℹ 3 more variables: assoc_ci_lower <dbl>, assoc_ci_upper <dbl>, n_total <int>
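Both summaries can be cross-checked in base R against the smoking-by-sex counts. The tidy() proportions are within-group (column) shares. The glance() statistic, p-value, and phi match a Pearson chi-squared test without continuity correction. That match suggests, but does not confirm, that spicy skips the Yates correction, which chisq.test() applies to 2x2 tables by default.

```r
# smoking (rows) by sex (columns), counts copied from the tables above
tab <- matrix(
  c(475, 131, 451, 118),
  nrow = 2,
  dimnames = list(smoking = c("No", "Yes"), sex = c("Female", "Male"))
)

# tidy(): proportion of each level within its group (divide by column totals)
prop <- sweep(tab, 2, colSums(tab), "/")
round(prop, 3)

# glance(): uncorrected Pearson chi-squared, then phi = sqrt(chi2 / N)
test <- chisq.test(tab, correct = FALSE)
phi  <- sqrt(unname(test$statistic) / sum(tab))
c(statistic = unname(test$statistic), p.value = test$p.value, phi = phi)
```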

Exporting to Excel, Word, or clipboard

For Excel export, set output = "excel" and provide a file path with excel_path:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education,
  output = "excel",
  excel_path = "my_table.xlsx"
)

For Word, set output = "word" and provide a file path with word_path:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education,
  output = "word",
  word_path = "my_table.docx"
)

You can also copy directly to the clipboard for pasting into a spreadsheet or a text editor:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  output = "clipboard"
)