freq() and cross_tab() are the core tabulation functions in spicy. They handle factors, labelled variables (from haven or labelled), weights, and missing values out of the box. This vignette covers the main options using the bundled sochealth dataset.

Frequency tables with freq()

Basic usage

Pass a data frame and a variable name to get counts and percentages:

freq(sochealth, education)
#> Frequency table: education
#> 
#>  Category  Values           Freq.  Percent 
#> ──────────┼─────────────────────────────────
#>  Valid     Lower secondary    261     21.8 
#>            Upper secondary    539     44.9 
#>            Tertiary           400     33.3 
#> ──────────┼─────────────────────────────────
#>  Total                       1200    100.0 
#> 
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
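
To make the columns concrete, the same counts and percentages can be approximated in base R. This is a sketch rather than spicy's implementation, and the education vector is rebuilt by hand from the counts printed above instead of being read from sochealth:

```r
# Stand-in for sochealth$education, rebuilt from the printed counts
# (an assumption for illustration; the real data ships with spicy).
lvls <- c("Lower secondary", "Upper secondary", "Tertiary")
education <- factor(rep(lvls, times = c(261, 539, 400)),
                    levels = lvls, ordered = TRUE)

counts  <- table(education)           # frequency of each level
percent <- 100 * prop.table(counts)   # share of all 1200 observations

data.frame(Values  = names(counts),
           Freq    = as.vector(counts),
           Percent = round(as.vector(percent), 1))
```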

Sorting

Sort by frequency with sort = "-" (decreasing) or sort = "+" (increasing). Sort alphabetically with sort = "name+" or sort = "name-":

freq(sochealth, education, sort = "-")
#> Frequency table: education
#> 
#>  Category  Values           Freq.  Percent 
#> ──────────┼─────────────────────────────────
#>  Valid     Upper secondary    539     44.9 
#>            Tertiary           400     33.3 
#>            Lower secondary    261     21.8 
#> ──────────┼─────────────────────────────────
#>  Total                       1200    100.0 
#> 
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth

With sort = "name+", categories are listed alphabetically (A to Z) rather than in factor-level order:

freq(sochealth, education, sort = "name+")
#> Frequency table: education
#> 
#>  Category  Values           Freq.  Percent 
#> ──────────┼─────────────────────────────────
#>  Valid     Lower secondary    261     21.8 
#>            Tertiary           400     33.3 
#>            Upper secondary    539     44.9 
#> ──────────┼─────────────────────────────────
#>  Total                       1200    100.0 
#> 
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
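
The four sort modes correspond to familiar base R idioms. The sketch below applies them to the counts printed above; it illustrates the orderings only and is not how freq() is implemented:

```r
# Counts copied from the tables above, in factor-level order
counts <- c("Lower secondary" = 261, "Upper secondary" = 539, "Tertiary" = 400)

by_freq_desc <- sort(counts, decreasing = TRUE)                   # like sort = "-"
by_freq_asc  <- sort(counts)                                      # like sort = "+"
by_name_asc  <- counts[order(names(counts))]                      # like sort = "name+"
by_name_desc <- counts[order(names(counts), decreasing = TRUE)]   # like sort = "name-"

names(by_freq_desc)
names(by_name_asc)
```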

Cumulative percentages

Add cumulative columns with cum = TRUE:

freq(sochealth, smoking, cum = TRUE)
#> Frequency table: smoking
#> 
#>  Category  Values  Freq.  Percent  Valid Percent  Cum. Percent 
#> ──────────┼─────────────────────────────────────────────────────
#>  Valid     No        926     77.2           78.8          77.2 
#>            Yes       249     20.8           21.2          97.9 
#>  Missing   NA         25      2.1                        100.0 
#> ──────────┼─────────────────────────────────────────────────────
#>  Total              1200    100.0          100.0         100.0 
#> 
#>  Category  Values  Cum. Valid Percent 
#> ──────────┼────────────────────────────
#>  Valid     No                    78.8 
#>            Yes                  100.0 
#>  Missing   NA                         
#> ──────────┼────────────────────────────
#>  Total                          100.0 
#> 
#> Label: Current smoker
#> Class: factor
#> Data: sochealth
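
The four percentage columns differ only in their denominator and in whether they accumulate. A short base R sketch using the counts printed above shows the arithmetic (the variable names are illustrative):

```r
# Counts from the smoking table above; "NA" is the missing category
counts <- c(No = 926, Yes = 249, "NA" = 25)
valid  <- counts[c("No", "Yes")]

percent           <- 100 * counts / sum(counts)  # denominator: all 1200 rows
valid_percent     <- 100 * valid / sum(valid)    # denominator: 1175 non-missing rows
cum_percent       <- cumsum(percent)             # running total including NA
cum_valid_percent <- cumsum(valid_percent)       # running total over valid rows only

round(cum_percent, 1)
round(cum_valid_percent, 1)
```

Cum. Percent reaches 100 only once the missing row is included, whereas Cum. Valid Percent reaches 100 at the last valid category.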

Weighted frequencies

Supply a weight variable with weights. By default, rescale = TRUE adjusts the weighted total to match the unweighted sample size:

freq(sochealth, education, weights = weight)
#> Frequency table: education
#> 
#>  Category  Values            Freq.  Percent 
#> ──────────┼──────────────────────────────────
#>  Valid     Lower secondary  258.62     21.6 
#>            Upper secondary  546.40     45.5 
#>            Tertiary         394.99     32.9 
#> ──────────┼──────────────────────────────────
#>  Total                        1200    100.0 
#> 
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
#> Weight: weight (rescaled)

Set rescale = FALSE to keep the raw weighted counts:

freq(sochealth, education, weights = weight, rescale = FALSE)
#> Frequency table: education
#> 
#>  Category  Values             Freq.  Percent 
#> ──────────┼───────────────────────────────────
#>  Valid     Lower secondary   257.86     21.6 
#>            Upper secondary   544.79     45.5 
#>            Tertiary          393.82     32.9 
#> ──────────┼───────────────────────────────────
#>  Total                      1196.47    100.0 
#> 
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
#> Weight: weight
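
The two weighted tables are consistent with a simple proportional rescaling, in which every weighted count is multiplied by the unweighted sample size divided by the sum of the weights. The sketch below checks that arithmetic on the printed figures; it assumes this is how rescale works and is not taken from the package source:

```r
# Raw weighted counts from the rescale = FALSE table above
raw <- c("Lower secondary" = 257.86, "Upper secondary" = 544.79, "Tertiary" = 393.82)
n_unweighted <- 1200

# Assumed rescaling rule: scale so the weighted total equals the sample size
scale_factor <- n_unweighted / sum(raw)
rescaled <- raw * scale_factor

round(rescaled, 2)
round(100 * rescaled / sum(rescaled), 1)  # percentages are unchanged by rescaling
```

Small differences in the last decimal place are expected because the inputs here are already rounded to two decimals.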

Labelled variables

When a variable has value labels (e.g., imported from SPSS or Stata with haven), freq() shows them by default with the [code] label format. Control this with labelled_levels:

# Create a labelled version of the smoking variable
sh <- sochealth
sh$smoking_lbl <- labelled::labelled(
  ifelse(sh$smoking == "Yes", 1L, 0L),
  labels = c("Non-smoker" = 0L, "Current smoker" = 1L)
)

# Default: [code] label
freq(sh, smoking_lbl)
#> Frequency table: smoking_lbl
#> 
#>  Category  Values              Freq.  Percent  Valid Percent 
#> ──────────┼───────────────────────────────────────────────────
#>  Valid     [0] Non-smoker        926     77.2           78.8 
#>            [1] Current smoker    249     20.8           21.2 
#>  Missing   NA                     25      2.1                
#> ──────────┼───────────────────────────────────────────────────
#>  Total                          1200    100.0          100.0 
#> 
#> Class: haven_labelled, vctrs_vctr, integer
#> Data: sh

# Labels only (no codes)
freq(sh, smoking_lbl, labelled_levels = "labels")
#> Frequency table: smoking_lbl
#> 
#>  Category  Values          Freq.  Percent  Valid Percent 
#> ──────────┼───────────────────────────────────────────────
#>  Valid     Non-smoker        926     77.2           78.8 
#>            Current smoker    249     20.8           21.2 
#>  Missing   NA                 25      2.1                
#> ──────────┼───────────────────────────────────────────────
#>  Total                      1200    100.0          100.0 
#> 
#> Class: haven_labelled, vctrs_vctr, integer
#> Data: sh

# Codes only (no labels)
freq(sh, smoking_lbl, labelled_levels = "values")
#> Frequency table: smoking_lbl
#> 
#>  Category  Values  Freq.  Percent  Valid Percent 
#> ──────────┼───────────────────────────────────────
#>  Valid     0         926     77.2           78.8 
#>            1         249     20.8           21.2 
#>  Missing   NA         25      2.1                
#> ──────────┼───────────────────────────────────────
#>  Total              1200    100.0          100.0 
#> 
#> Class: haven_labelled, vctrs_vctr, integer
#> Data: sh

Custom missing values

Treat specific values as missing with na_val:

freq(sochealth, income_group, na_val = "High")
#> Frequency table: income_group
#> 
#>  Category  Values        Freq.  Percent  Valid Percent 
#> ──────────┼─────────────────────────────────────────────
#>  Valid     Low             247     20.6           25.6 
#>            Lower middle    388     32.3           40.3 
#>            Upper middle    328     27.3           34.1 
#>  Missing   NA              237     19.8                
#> ──────────┼─────────────────────────────────────────────
#>  Total                    1200    100.0          100.0 
#> 
#> Label: Household income group
#> Class: ordered, factor
#> Data: sochealth
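
Conceptually, na_val recodes the listed values to NA before tabulating. A base R sketch of that idea on a small made-up vector (not the sochealth data) could look like this:

```r
# Toy income variable; values are hypothetical
income <- factor(c("Low", "High", "Low", "Upper middle", "High"))

# Recode the chosen value to NA, then drop the now-empty level
income_na <- income
income_na[income_na == "High"] <- NA
income_na <- droplevels(income_na)

table(income_na, useNA = "ifany")
```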

Cross-tabulations with cross_tab()

Basic two-way table

Cross two variables to get a contingency table with a chi-squared test and effect size:

cross_tab(sochealth, smoking, education)
#> Crosstable: smoking x education (N)
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                            179                   415            332 
#>  Yes                            78                   112             59 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                         257                   527            391 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  926 
#>  Yes                 249 
#> ─────────────┼────────────
#>  Total              1175 
#> 
#> Chi-2(2) = 21.6, p < 0.001
#> Cramer's V = 0.14
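
The two summary lines can be checked by hand. The sketch below rebuilds the table from the printed counts, runs stats::chisq.test(), and derives Cramér's V with the textbook formula sqrt(chi2 / (N * (min(rows, cols) - 1))). It is a cross-check in base R, not the code cross_tab() runs internally:

```r
# Counts copied from the table above (filled column by column)
tab <- matrix(c(179, 78, 415, 112, 332, 59), nrow = 2,
              dimnames = list(smoking   = c("No", "Yes"),
                              education = c("Lower secondary",
                                            "Upper secondary",
                                            "Tertiary")))

test <- chisq.test(tab)            # Pearson chi-squared test of independence
chi2 <- unname(test$statistic)
df   <- unname(test$parameter)

# Cramér's V: chi-squared scaled by sample size and table dimensions
v <- sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))

round(c(chi2 = chi2, df = df, V = v), 2)
```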

Row and column percentages

Use percent = "row" or percent = "col" to display percentages instead of raw counts:

cross_tab(sochealth, smoking, education, percent = "col")
#> Crosstable: smoking x education (Column %)
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                           69.6                  78.7           84.9 
#>  Yes                          30.4                  21.3           15.1 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                       100.0                 100.0          100.0 
#>  N                             257                   527            391 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                 78.8 
#>  Yes                21.2 
#> ─────────────┼────────────
#>  Total             100.0 
#>  N                  1175 
#> 
#> Chi-2(2) = 21.6, p < 0.001
#> Cramer's V = 0.14

cross_tab(sochealth, smoking, education, percent = "row")
#> Crosstable: smoking x education (Row %)
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                           19.3                  44.8           35.9 
#>  Yes                          31.3                  45.0           23.7 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                        21.9                  44.9           33.3 
#> 
#>  Values            Total          N 
#> ─────────────┼───────────────────────
#>  No                100.0        926 
#>  Yes               100.0        249 
#> ─────────────┼───────────────────────
#>  Total             100.0       1175 
#> 
#> Chi-2(2) = 21.6, p < 0.001
#> Cramer's V = 0.14
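
Row and column percentages differ only in which margin serves as the denominator. In base R, prop.table() makes this explicit through its margin argument; the sketch below uses the counts printed earlier:

```r
# Counts from the basic two-way table (filled column by column)
tab <- matrix(c(179, 78, 415, 112, 332, 59), nrow = 2,
              dimnames = list(smoking   = c("No", "Yes"),
                              education = c("Lower secondary",
                                            "Upper secondary",
                                            "Tertiary")))

col_pct <- 100 * prop.table(tab, margin = 2)  # each column sums to 100
row_pct <- 100 * prop.table(tab, margin = 1)  # each row sums to 100

round(col_pct, 1)
round(row_pct, 1)
```

Column percentages answer "what share of each education group smokes?", while row percentages answer "how are smokers and non-smokers distributed across education levels?".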

Grouping with by

Stratify the table by a third variable:

cross_tab(sochealth, smoking, education, by = sex)
#> Crosstable: smoking x education (N) | sex = Female
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             95                   220            160 
#>  Yes                            38                    62             31 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                         133                   282            191 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  475 
#>  Yes                 131 
#> ─────────────┼────────────
#>  Total               606 
#> 
#> Chi-2(2) = 7.1, p = 0.029
#> Cramer's V = 0.11
#> 
#> Crosstable: smoking x education (N) | sex = Male
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             84                   195            172 
#>  Yes                            40                    50             28 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                         124                   245            200 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  451 
#>  Yes                 118 
#> ─────────────┼────────────
#>  Total               569 
#> 
#> Chi-2(2) = 15.6, p < 0.001
#> Cramer's V = 0.17
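
Stratifying amounts to running the same test once per level of the by variable. As a rough base R equivalent, the printed counts can be stored in a three-way array and tested slice by slice (a sketch, not the package's code):

```r
# Counts from the two strata above: smoking x education x sex
counts <- array(c(95, 38, 220, 62, 160, 31,    # Female
                  84, 40, 195, 50, 172, 28),   # Male
                dim = c(2, 3, 2),
                dimnames = list(smoking   = c("No", "Yes"),
                                education = c("Lower secondary",
                                              "Upper secondary",
                                              "Tertiary"),
                                sex       = c("Female", "Male")))

# One chi-squared statistic per stratum (third dimension)
chi2_by_sex <- apply(counts, 3, function(tab) unname(chisq.test(tab)$statistic))
round(chi2_by_sex, 1)

# Summing over sex recovers the pooled table from the basic example
pooled <- apply(counts, c(1, 2), sum)
```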

For more than one grouping variable, use interaction():

cross_tab(sochealth, smoking, education,
          by = interaction(sex, age_group))
#> Crosstable: smoking x education (N) | sex x age_group = Female.25-34
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             23                    49             29 
#>  Yes                             9                     9              7 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                          32                    58             36 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  101 
#>  Yes                  25 
#> ─────────────┼────────────
#>  Total               126 
#> 
#> Chi-2(2) = 2.1, p = 0.356
#> Cramer's V = 0.13
#> 
#> Crosstable: smoking x education (N) | sex x age_group = Male.25-34
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                              9                    42             32 
#>  Yes                            11                    11              4 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                          20                    53             36 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                   83 
#>  Yes                  26 
#> ─────────────┼────────────
#>  Total               109 
#> 
#> Chi-2(2) = 14.2, p < 0.001
#> Cramer's V = 0.36
#> 
#> Crosstable: smoking x education (N) | sex x age_group = Female.35-49
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             24                    73             48 
#>  Yes                            10                    20              8 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                          34                    93             56 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  145 
#>  Yes                  38 
#> ─────────────┼────────────
#>  Total               183 
#> 
#> Chi-2(2) = 3.0, p = 0.223
#> Cramer's V = 0.13
#> 
#> Crosstable: smoking x education (N) | sex x age_group = Male.35-49
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             33                    59             60 
#>  Yes                            14                    17              7 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                          47                    76             67 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  152 
#>  Yes                  38 
#> ─────────────┼────────────
#>  Total               190 
#> 
#> Chi-2(2) = 6.9, p = 0.032
#> Cramer's V = 0.19
#> 
#> Crosstable: smoking x education (N) | sex x age_group = Female.50-64
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             28                    63             45 
#>  Yes                             8                    16              6 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                          36                    79             51 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  136 
#>  Yes                  30 
#> ─────────────┼────────────
#>  Total               166 
#> 
#> Chi-2(2) = 2.0, p = 0.360
#> Cramer's V = 0.11
#> 
#> Crosstable: smoking x education (N) | sex x age_group = Male.50-64
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             28                    58             42 
#>  Yes                             8                    13              5 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                          36                    71             47 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  128 
#>  Yes                  26 
#> ─────────────┼────────────
#>  Total               154 
#> 
#> Chi-2(2) = 2.1, p = 0.343
#> Cramer's V = 0.12
#> 
#> Crosstable: smoking x education (N) | sex x age_group = Female.65-75
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             20                    35             38 
#>  Yes                            11                    17             10 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                          31                    52             48 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                   93 
#>  Yes                  38 
#> ─────────────┼────────────
#>  Total               131 
#> 
#> Chi-2(2) = 2.5, p = 0.282
#> Cramer's V = 0.14
#> 
#> Crosstable: smoking x education (N) | sex x age_group = Male.65-75
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                             14                    36             38 
#>  Yes                             7                     9             12 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                          21                    45             50 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                   88 
#>  Yes                  28 
#> ─────────────┼────────────
#>  Total               116 
#> 
#> Chi-2(2) = 1.4, p = 0.499
#> Cramer's V = 0.11
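
The stratum labels and their order come from base R's interaction(), which joins levels with a dot and varies the first factor fastest. A small example with made-up vectors shows the pattern:

```r
# Toy grouping variables (hypothetical values)
sex       <- factor(c("Female", "Male", "Female", "Male"))
age_group <- factor(c("25-34", "25-34", "35-49", "35-49"))

# One combined factor with a level for every sex/age combination
strata <- interaction(sex, age_group)
levels(strata)
```

Each additional grouping variable multiplies the number of strata, so per-stratum counts shrink quickly; several of the eight tables above have cells with fewer than ten observations.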

Ordinal variables

When both variables are ordered factors, cross_tab() automatically switches from Cramer’s V to Kendall’s Tau-b:

cross_tab(sochealth, self_rated_health, education)
#> Crosstable: self_rated_health x education (N)
#> 
#>  Values               Lower secondary       Upper secondary       Tertiary 
#> ────────────────┼───────────────────────────────────────────────────────────
#>  Poor                              28                    28              5 
#>  Fair                              86                   118             62 
#>  Good                             102                   263            193 
#>  Very good                         44                   118            133 
#> ────────────────┼───────────────────────────────────────────────────────────
#>  Total                            260                   527            393 
#> 
#>  Values               Total 
#> ────────────────┼────────────
#>  Poor                    61 
#>  Fair                   266 
#>  Good                   558 
#>  Very good              295 
#> ────────────────┼────────────
#>  Total                 1180 
#> 
#> Chi-2(6) = 73.2, p < 0.001
#> Kendall's Tau-b = 0.20

You can override the automatic selection with assoc_measure:

cross_tab(sochealth, self_rated_health, education, assoc_measure = "gamma")
#> Crosstable: self_rated_health x education (N)
#> 
#>  Values               Lower secondary       Upper secondary       Tertiary 
#> ────────────────┼───────────────────────────────────────────────────────────
#>  Poor                              28                    28              5 
#>  Fair                              86                   118             62 
#>  Good                             102                   263            193 
#>  Very good                         44                   118            133 
#> ────────────────┼───────────────────────────────────────────────────────────
#>  Total                            260                   527            393 
#> 
#>  Values               Total 
#> ────────────────┼────────────
#>  Poor                    61 
#>  Fair                   266 
#>  Good                   558 
#>  Very good              295 
#> ────────────────┼────────────
#>  Total                 1180 
#> 
#> Chi-2(6) = 73.2, p < 0.001
#> Goodman-Kruskal Gamma = 0.31
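
Both ordinal measures are built from the same counts of concordant and discordant pairs; they differ only in how ties enter the denominator. The sketch below computes them from the printed table using the standard textbook definitions. The helper names are illustrative, and this is not the package's own code:

```r
# Counts from the table above (rows: health, columns: education)
tab <- matrix(c(28, 86, 102, 44,
                28, 118, 263, 118,
                5, 62, 193, 133),
              nrow = 4,
              dimnames = list(health    = c("Poor", "Fair", "Good", "Very good"),
                              education = c("Lower secondary",
                                            "Upper secondary",
                                            "Tertiary")))

# For every cell, pair it with cells in later rows:
# later columns are concordant, earlier columns are discordant.
count_pairs <- function(tab) {
  conc <- 0
  disc <- 0
  for (i in seq_len(nrow(tab))) {
    for (j in seq_len(ncol(tab))) {
      later_rows <- seq_len(nrow(tab)) > i
      conc <- conc + tab[i, j] * sum(tab[later_rows, seq_len(ncol(tab)) > j])
      disc <- disc + tab[i, j] * sum(tab[later_rows, seq_len(ncol(tab)) < j])
    }
  }
  c(concordant = conc, discordant = disc)
}

pairs <- count_pairs(tab)
conc  <- pairs[["concordant"]]
disc  <- pairs[["discordant"]]

# Gamma ignores tied pairs entirely
gamma <- (conc - disc) / (conc + disc)

# Tau-b keeps pairs tied on one variable in the denominator
n <- sum(tab)
untied <- function(totals) n * (n - 1) / 2 - sum(totals * (totals - 1) / 2)
tau_b <- (conc - disc) / sqrt(untied(rowSums(tab)) * untied(colSums(tab)))

round(c(gamma = gamma, tau_b = tau_b), 2)
```

Because gamma drops tied pairs, it is never smaller in magnitude than tau-b for the same table, which is why the two outputs above differ.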

Confidence intervals for effect sizes

Add a 95% confidence interval for the association measure with assoc_ci = TRUE:

cross_tab(sochealth, smoking, education, assoc_ci = TRUE)
#> Crosstable: smoking x education (N)
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                            179                   415            332 
#>  Yes                            78                   112             59 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                         257                   527            391 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  926 
#>  Yes                 249 
#> ─────────────┼────────────
#>  Total              1175 
#> 
#> Chi-2(2) = 21.6, p < 0.001
#> Cramer's V = 0.14, 95% CI [0.08, 0.19]
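
This vignette does not say how the interval is computed. For intuition only, the sketch below builds a percentile bootstrap interval for Cramér's V by resampling the cell counts from a multinomial distribution. It is a different, generic technique and should not be expected to reproduce the limits printed above exactly:

```r
# Counts from the table above (filled column by column)
tab <- matrix(c(179, 78, 415, 112, 332, 59), nrow = 2)

# Cramér's V from a matrix of counts
cramers_v <- function(tab) {
  n        <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2     <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (n * (min(dim(tab)) - 1)))
}

v_hat <- cramers_v(tab)

# Resample whole tables with the observed cell proportions
set.seed(1)
draws  <- rmultinom(2000, size = sum(tab), prob = as.vector(tab) / sum(tab))
boot_v <- apply(draws, 2, function(cells) cramers_v(matrix(cells, nrow = nrow(tab))))

# Percentile interval from the bootstrap distribution
ci <- quantile(boot_v, c(0.025, 0.975))
round(c(V = v_hat, ci), 2)
```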

Weighted cross-tabulations

The weights argument works as in freq(). In the call below rescale is not set, and the table shows raw weighted counts (the footer reads "Weight: weight", without "rescaled"):

cross_tab(sochealth, smoking, education, weights = weight)
#> Crosstable: smoking x education (N)
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                            176                   417            324 
#>  Yes                            79                   114             60 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                         255                   531            384 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  917 
#>  Yes                 253 
#> ─────────────┼────────────
#>  Total              1170 
#> 
#> Chi-2(2) = 21.3, p < 0.001
#> Cramer's V = 0.13
#> Weight: weight

With rescale = TRUE, the weighted total matches the unweighted number of complete cases (1175 here):

cross_tab(sochealth, smoking, education, weights = weight, rescale = TRUE)
#> Crosstable: smoking x education (N)
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                            176                   419            325 
#>  Yes                            79                   115             60 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                         255                   534            385 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  921 
#>  Yes                 254 
#> ─────────────┼────────────
#>  Total              1175 
#> 
#> Chi-2(2) = 21.4, p < 0.001
#> Cramer's V = 0.13
#> Weight: weight (rescaled)
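
A weighted contingency table sums the weights in each cell instead of counting rows. In base R this is what xtabs() does when a weight appears on the left-hand side of the formula. The sketch below uses a tiny made-up data frame and assumes rescaling is a proportional adjustment to the unweighted row count:

```r
# Toy data (hypothetical values, not sochealth)
d <- data.frame(
  smoking   = c("No", "No", "Yes", "Yes", "No"),
  education = c("Low", "High", "Low", "High", "Low"),
  w         = c(0.5, 1.5, 1.0, 2.0, 1.0)
)

# Sum of weights per cell
raw <- xtabs(w ~ smoking + education, data = d)

# Assumed rescaling rule: total equals the number of rows
rescaled <- raw * nrow(d) / sum(raw)

raw
rescaled
```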

Monte Carlo simulation

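When expected cell counts are small, the chi-squared approximation can be unreliable. Set simulate_p = TRUE to get a Monte Carlo p-value instead; simulate_B sets the number of simulated tables. The degrees of freedom print as NA because a simulated p-value does not rely on them: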

cross_tab(sochealth, smoking, education,
          simulate_p = TRUE, simulate_B = 5000)
#> Crosstable: smoking x education (N)
#> 
#>  Values            Lower secondary       Upper secondary       Tertiary 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  No                            179                   415            332 
#>  Yes                            78                   112             59 
#> ─────────────┼───────────────────────────────────────────────────────────
#>  Total                         257                   527            391 
#> 
#>  Values            Total 
#> ─────────────┼────────────
#>  No                  926 
#>  Yes                 249 
#> ─────────────┼────────────
#>  Total              1175 
#> 
#> Chi-2(NA) = 21.6, p < 0.001 (simulated)
#> Cramer's V = 0.14
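
Base R's chisq.test() offers the same kind of simulation through its simulate.p.value and B arguments, and it also reports the degrees of freedom as NA in that mode. Whether cross_tab() delegates to it is an assumption here; the sketch only shows the base R behaviour on the printed counts:

```r
# Counts from the table above (filled column by column)
tab <- matrix(c(179, 78, 415, 112, 332, 59), nrow = 2)

set.seed(123)  # simulated p-values vary from run to run without a seed
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 5000)

sim$statistic   # same Pearson statistic as the asymptotic test
sim$parameter   # NA: no degrees of freedom are used
sim$p.value     # (1 + number of simulated statistics >= observed) / (B + 1)
```

With B replicates the smallest attainable p-value is 1 / (B + 1), so a larger B is needed to resolve very small p-values.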

Data frame output

Set styled = FALSE to get a plain data frame for further processing:

cross_tab(sochealth, smoking, education,
          percent = "col", styled = FALSE)
#>   Values Lower secondary Upper secondary Tertiary
#> 1     No            69.6            78.7     84.9
#> 2    Yes            30.4            21.3     15.1
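
A plain data frame can be passed straight to other tools. As an example of further processing, the sketch below retypes the output above by hand (so it runs without spicy) and reshapes it to long format with base R's stack(), a layout that plotting and modelling functions often expect:

```r
# Hand-typed copy of the data frame printed above
col_pct <- data.frame(
  Values            = c("No", "Yes"),
  `Lower secondary` = c(69.6, 30.4),
  `Upper secondary` = c(78.7, 21.3),
  Tertiary          = c(84.9, 15.1),
  check.names = FALSE
)

# Wide to long: one row per smoking/education combination
stacked <- stack(col_pct[-1])
long <- data.frame(
  smoking   = rep(col_pct$Values, times = ncol(col_pct) - 1),
  education = stacked$ind,
  percent   = stacked$values
)

long
```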

Setting global defaults

You can set package-wide defaults with options() so you don’t have to repeat arguments:

options(
  spicy.percent   = "column",
  spicy.simulate_p = TRUE,
  spicy.rescale   = TRUE
)
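
Because options() returns the previous values invisibly, a default can be set temporarily and restored afterwards. The sketch below uses only base R and one of the option names from the call above:

```r
# Set one default and keep the previous value (NULL if it was unset)
old <- options(spicy.percent = "column")

current <- getOption("spicy.percent")
current

# Restore the earlier state, for example at the end of a script
options(old)
```

Inside a function, on.exit(options(old)) achieves the same restoration even if an error occurs.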

Learn more