Skip to contents

table_categorical() builds publication-ready categorical tables suitable for APA-style reporting in social science and data science research. With by, it produces grouped cross-tabulation tables with chi-squared \(p\)-values, effect sizes, confidence intervals, and multi-level headers. Without by, it produces one-way frequency-style tables for the selected variables. Export to gt, tinytable, flextable, Excel, or Word. This vignette walks through the main features.

Basic usage

For grouped tables, provide a data frame, one or more selected variables, and a grouping variable:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education
)
#> Categorical table by education
#> 
#>  Variable           Lower secondary n  Lower secondary %  Upper secondary n 
#> ───────────────────┼─────────────────────────────────────────────────────────
#>  smoking                                                                    
#>    No                             179               69.6                415 
#>    Yes                             78               30.4                112 
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity                                                          
#>    No                             177               67.8                310 
#>    Yes                             84               32.2                229 
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  dentist_12m                                                                
#>    No                             113               43.3                174 
#>    Yes                            148               56.7                365 
#> 
#>  Variable           Upper secondary %  Tertiary n  Tertiary %  Total n 
#> ───────────────────┼────────────────────────────────────────────────────
#>  smoking                                                               
#>    No                            78.7         332        84.9      926 
#>    Yes                           21.3          59        15.1      249 
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity                                                     
#>    No                            57.5         163        40.8      650 
#>    Yes                           42.5         237        59.2      550 
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  dentist_12m                                                           
#>    No                            32.3          67        16.8      354 
#>    Yes                           67.7         333        83.2      846 
#> 
#>  Variable           Total %       p  Cramer's V 
#> ───────────────────┼─────────────────────────────
#>  smoking                     < .001         .14 
#>    No                  78.8                     
#>    Yes                 21.2                     
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity           < .001         .21 
#>    No                  54.2                     
#>    Yes                 45.8                     
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  dentist_12m                 < .001         .22 
#>    No                  29.5                     
#>    Yes                 70.5

The default output is "default", which prints a styled ASCII table to the console. Use output = "data.frame" to get a plain numeric data frame suitable for further processing.

One-way tables

Omit by to build a frequency-style table for the selected variables:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  output = "default"
)
#> Categorical table
#> 
#>  Variable                       n          % 
#> ────────────────────────┼─────────────────────
#>  smoking                                     
#>    No                         926       78.8 
#>    Yes                        249       21.2 
#> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
#>  physical_activity                           
#>    No                         650       54.2 
#>    Yes                        550       45.8

Output formats

table_categorical() supports several output formats. The table below summarizes the options:

Format Description
"default" Styled ASCII table in the console (default)
"data.frame" Wide data frame, one row per modality
"long" Long data frame, one row per modality x group
"gt" Formatted gt table
"tinytable" Formatted tinytable
"flextable" Formatted flextable
"excel" Excel file (requires excel_path)
"clipboard" Copy to clipboard
"word" Word document (requires word_path)

gt output

The "gt" format produces a table with APA-style borders, column spanners, and proper alignment:

pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity, dentist_12m),
    by = education,
    output = "gt"
  )
)
Variable
Lower secondary
Upper secondary
Tertiary
Total
p
Cramer's V
n % n % n % n %
smoking < .001 .14
    No 179 69.6 415 78.7 332 84.9 926 78.8
    Yes 78 30.4 112 21.3 59 15.1 249 21.2
physical_activity < .001 .21
    No 177 67.8 310 57.5 163 40.8 650 54.2
    Yes 84 32.2 229 42.5 237 59.2 550 45.8
dentist_12m < .001 .22
    No 113 43.3 174 32.3 67 16.8 354 29.5
    Yes 148 56.7 365 67.7 333 83.2 846 70.5

tinytable output

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  output = "tinytable"
)
Variable Female Male Total p Cramer's V
n % n % n %
smoking .713 .01
    No 475 78.4 451 79.3 926 78.8
    Yes 131 21.6 118 20.7 249 21.2
physical_activity .832 .01
    No 334 53.9 316 54.5 650 54.2
    Yes 286 46.1 264 45.5 550 45.8

Data frame output

Use output = "data.frame" for a wide numeric data frame (one row per modality), or output = "long" for a long format (one row per modality x group):

table_categorical(
  sochealth,
  select = smoking,
  by = education,
  output = "data.frame"
)
#>   Variable Level Lower secondary n Lower secondary % Upper secondary n
#> 1  smoking    No               179              69.6               415
#> 2  smoking   Yes                78              30.4               112
#>   Upper secondary % Tertiary n Tertiary % Total n Total %            p
#> 1              78.7        332       84.9     926    78.8 2.012877e-05
#> 2              21.3         59       15.1     249    21.2 2.012877e-05
#>   Cramer's V
#> 1  0.1356677
#> 2  0.1356677

Custom labels

By default, table_categorical() uses variable names as row headers. Use the labels argument to provide human-readable labels:

pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = education,
    labels = c("Smoking status", "Regular physical activity"),
    output = "gt"
  )
)
Variable
Lower secondary
Upper secondary
Tertiary
Total
p
Cramer's V
n % n % n % n %
Smoking status < .001 .14
    No 179 69.6 415 78.7 332 84.9 926 78.8
    Yes 78 30.4 112 21.3 59 15.1 249 21.2
Regular physical activity < .001 .21
    No 177 67.8 310 57.5 163 40.8 650 54.2
    Yes 84 32.2 229 42.5 237 59.2 550 45.8

Association measures and confidence intervals

By default, table_categorical() reports Cramer’s V for nominal variables and automatically switches to Kendall’s Tau-b when both variables are ordered factors. Override with assoc_measure:

table_categorical(
  sochealth,
  select = smoking,
  by = education,
  assoc_measure = "lambda",
  output = "tinytable"
)
Variable Lower secondary Upper secondary Tertiary Total p Lambda
n % n % n % n %
smoking < .001 .00
    No 179 69.6 415 78.7 332 84.9 926 78.8
    Yes 78 30.4 112 21.3 59 15.1 249 21.2

Add confidence intervals with assoc_ci = TRUE. In rendered formats (gt, tinytable, flextable), the CI is shown inline:

pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = education,
    assoc_ci = TRUE,
    output = "gt"
  )
)
Variable
Lower secondary
Upper secondary
Tertiary
Total
p
Cramer's V
n % n % n % n %
smoking < .001 .14 [.08, .19]
    No 179 69.6 415 78.7 332 84.9 926 78.8
    Yes 78 30.4 112 21.3 59 15.1 249 21.2
physical_activity < .001 .21 [.15, .26]
    No 177 67.8 310 57.5 163 40.8 650 54.2
    Yes 84 32.2 229 42.5 237 59.2 550 45.8

In data formats ("data.frame", "long", "excel", "clipboard"), separate CI lower and CI upper columns are added:

table_categorical(
  sochealth,
  select = smoking,
  by = education,
  assoc_ci = TRUE,
  output = "data.frame"
)
#>   Variable Level Lower secondary n Lower secondary % Upper secondary n
#> 1  smoking    No               179              69.6               415
#> 2  smoking   Yes                78              30.4               112
#>   Upper secondary % Tertiary n Tertiary % Total n Total %            p
#> 1              78.7        332       84.9     926    78.8 2.012877e-05
#> 2              21.3         59       15.1     249    21.2 2.012877e-05
#>   Cramer's V   CI lower  CI upper
#> 1  0.1356677 0.07909264 0.1913716
#> 2  0.1356677 0.07909264 0.1913716

Weighted tables

Pass survey weights with the weights argument. Use rescale = TRUE so the total weighted N matches the unweighted N:

pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = education,
    weights = "weight",
    rescale = TRUE,
    output = "gt"
  )
)
Variable
Lower secondary
Upper secondary
Tertiary
Total
p
Cramer's V
n % n % n % n %
smoking < .001 .13
    No 176 69.0 419 78.5 325 84.4 920.9 78.4
    Yes 79 31.0 115 21.5 60 15.6 254.1 21.6
physical_activity < .001 .19
    No 174 67.2 315 57.7 166 41.9 654.8 54.6
    Yes 85 32.8 231 42.3 229 58.1 545.2 45.4

Handling missing values

By default, rows with missing values are dropped (drop_na = TRUE). Set drop_na = FALSE to display them as a “(Missing)” category:

pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = income_group,
    by = education,
    drop_na = FALSE,
    output = "gt"
  )
)
Variable
Lower secondary
Upper secondary
Tertiary
Total
p
Cramer's V
n % n % n % n %
income_group < .001 .18
    Low 87 33.3 115 21.3 45 11.2 247 20.6
    Lower middle 92 35.2 186 34.5 110 27.5 388 32.3
    Upper middle 58 22.2 135 25.0 135 33.8 328 27.3
    High 21 8.0 94 17.4 104 26.0 219 18.2
    (Missing) 3 1.1 9 1.7 6 1.5 18 1.5

Filtering and reordering levels

Use levels_keep to display only specific modalities. The order you specify controls the display order, which is useful for placing “(Missing)” first to highlight missingness:

pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = income_group,
    by = education,
    drop_na = FALSE,
    levels_keep = c("(Missing)", "Low", "High"),
    output = "gt"
  )
)
Variable
Lower secondary
Upper secondary
Tertiary
Total
p
Cramer's V
n % n % n % n %
income_group < .001 .18
    (Missing) 3 1.1 9 1.7 6 1.5 18 1.5
    Low 87 33.3 115 21.3 45 11.2 247 20.6
    High 21 8.0 94 17.4 104 26.0 219 18.2

Formatting options

Control the number of digits for percentages, p-values, and the association measure:

pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = smoking,
    by = education,
    percent_digits = 2,
    p_digits = 4,
    v_digits = 3,
    output = "gt"
  )
)
Variable
Lower secondary
Upper secondary
Tertiary
Total
p
Cramer's V
n % n % n % n %
smoking < .001 .136
    No 179 69.60 415 78.70 332 84.90 926 78.80
    Yes 78 30.40 112 21.30 59 15.10 249 21.20

Exporting to Excel, Word, or clipboard

For Excel export, provide a file path:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education,
  output = "excel",
  excel_path = "my_table.xlsx"
)

For Word, use output = "word":

table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education,
  output = "word",
  word_path = "my_table.docx"
)

You can also copy directly to the clipboard for pasting into a spreadsheet or a text editor:

table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  output = "clipboard"
)