Creates a frequency table for a vector or variable from a data frame, with options for weighting, sorting, handling labelled data, defining custom missing values, and displaying cumulative percentages.
When styled = TRUE, the function prints a spicy-formatted ASCII table
using print.spicy_freq_table() and spicy_print_table(); otherwise, it
returns a data.frame containing frequencies and proportions.
Arguments
- data
A data.frame, vector, or factor. If a data frame is provided, specify the target variable x. If both data and x are supplied as vectors, data is ignored with a warning.
- x
A variable from data (unquoted).
- weights
Optional numeric vector of weights (same length as x). The variable may be referenced as a bare name when it belongs to data, or as a qualified expression such as other$w (evaluated in the calling environment), which always takes precedence over lookup in data. Observations with NA weights are dropped from the table with a warning; see Details.
- digits
Number of decimal digits to display for percentages (default: 1).
- valid
Logical. If TRUE (default), display valid percentages (excluding missing values).
- cum
Logical. If FALSE (the default), cumulative percentages are omitted. If TRUE, cumulative percentages are added.
- sort
Sorting method for values:
"" - no sorting (default)
"+" - increasing frequency
"-" - decreasing frequency
"name+" - alphabetical A-Z
"name-" - alphabetical Z-A
- na_val
Atomic vector of numeric or character values to be treated as missing (NA). For labelled variables (from haven or labelled), this argument must refer to the underlying coded values, not the visible labels.
Example:
- labelled_levels
For labelled variables, defines how labels and values are displayed:
"prefixed" or "p" - show labels as [value] label (default)
"labels" or "l" - show only labels
"values" or "v" - show only numeric codes
- factor_levels
Character. Controls how factor and labelled values are displayed in the frequency table.
"observed" (the default; matches Stata's tab) shows only levels present in the data. "all" (matches SPSS FREQUENCIES and the default of code_book()) keeps every declared level, including unused ones, which appear with n = 0.
- rescale
Logical. If TRUE (default), rescale weights so that their total equals the unweighted sample size (length(weights)). See Details for the interaction with NA weights.
- decimal_mark
Character used as the decimal mark in printed percentages. Either "." (the default) or ",". Matches the decimal_mark argument of cross_tab() and the three table_*() helpers, so users in European locales get consistent output across the package.
- styled
Logical. If TRUE (default), print the formatted spicy table. If FALSE, return a plain data.frame with frequency values.
- ...
Additional arguments passed to
print.spicy_freq_table().
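The four sort codes amount to different orderings of the category counts. As a minimal sketch in base R (sort_counts() is a hypothetical helper, not part of spicy, and it leaves out missing values, which freq() reports in its own Missing row), they could be expressed as:

```r
# sort_counts() is a hypothetical helper illustrating freq()'s `sort` codes;
# it is not spicy's implementation. table() drops NA values by default.
sort_counts <- function(x, sort = "") {
  counts <- table(x)
  if (identical(sort, "")) {
    return(counts)                       # no sorting: keep level order
  }
  n <- as.vector(counts)
  idx <- switch(sort,
    "+"     = order(n),                                # increasing frequency
    "-"     = order(-n),                               # decreasing frequency
    "name+" = order(names(counts)),                    # alphabetical A-Z
    "name-" = order(names(counts), decreasing = TRUE), # alphabetical Z-A
    stop("unknown sort code: ", sort)
  )
  counts[idx]
}

sex <- c("Male", "Female", "Female", "Male", NA, "Female")
sort_counts(sex, "name-")  # Male 2, Female 3
sort_counts(sex, "-")      # Female 3, Male 2
```

Because order() leaves ties in their original order, categories with equal counts keep their level order under "+" and "-".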
Value
With styled = FALSE, a plain data.frame (no extra attributes) with
the following columns:
value - unique values or factor levels
n - frequency count (weighted if applicable)
prop - proportion of total
valid_prop - proportion of valid responses (if valid = TRUE)
cum_prop, cum_valid_prop - cumulative proportions (if cum = TRUE)
With styled = TRUE (default), prints the formatted table to the
console and invisibly returns a spicy_freq_table object: the same
data.frame carrying rendering metadata as attributes (digits,
data_name, var_name, var_label, class_name, n_total,
n_valid, weighted, rescaled, weight_var) used by
print.spicy_freq_table().
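To make the relationship between these columns concrete, here is a minimal base R sketch of the unweighted styled = FALSE result. freq_columns() is a hypothetical helper, not spicy's code; it assumes the NA row comes last and that valid and cumulative valid proportions are computed over non-missing rows only, which agrees with the styled = FALSE example (0.6 and 0.4 for Female and Male).

```r
# freq_columns() is a hypothetical helper that rebuilds the documented
# columns of freq(..., styled = FALSE) for an unweighted vector.
freq_columns <- function(x) {
  counts <- table(x, useNA = "ifany")   # NA counted as its own row, placed last
  value <- names(counts)
  n <- as.vector(counts)
  is_valid <- !is.na(value)

  prop <- n / sum(n)                                        # share of all rows
  valid_prop <- ifelse(is_valid, n / sum(n[is_valid]), NA)  # share of non-missing rows

  # Cumulative valid proportion runs over non-missing rows only
  cum_valid_prop <- rep(NA_real_, length(n))
  cum_valid_prop[is_valid] <- cumsum(valid_prop[is_valid])

  data.frame(value, n, prop, valid_prop,
             cum_prop = cumsum(prop), cum_valid_prop)
}

sex <- factor(c("Male", "Female", "Female", "Male", NA, "Female"))
freq_columns(sex)
```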
Details
Designed to mimic common frequency procedures from SPSS or Stata
while integrating the flexibility of R's data structures. The
input type (vector, factor, labelled) is auto-detected; see the
labelled_levels argument for how labelled values are displayed,
factor_levels for keeping or dropping unused declared levels, and
na_val for optional sentinel-value recoding.
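Two of these input-handling steps have close base R analogues. The sketch below assumes that na_val amounts to replacing matching codes with NA before tabulating, and that factor_levels = "observed" amounts to dropping unused levels; both are inferred from the documented output rather than taken from spicy's source.

```r
# Assumed analogue of na_val: recode sentinel codes to NA before counting
x <- c(1, 2, 3, 99, 99)
na_val <- 99
x[x %in% na_val] <- NA
table(x, useNA = "ifany")  # 1, 2 and 3 once each, NA twice

# Assumed analogue of factor_levels: "all" keeps declared but unused levels,
# "observed" drops them
f <- factor(c("Yes", "No", "Yes"), levels = c("Yes", "No", "Maybe"))
table(f)              # "all":      Yes 2, No 1, Maybe 0
table(droplevels(f))  # "observed": Yes 2, No 1
```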
Weighting (weights): frequencies and percentages are computed
proportionally to the weights. Missing values in weights cause
those observations to be dropped from the table entirely (with a
warning), matching the behaviour of cross_tab() in spicy
0.11.0+. With rescale = TRUE, the remaining (non-NA-weighted)
weights are normalised so the total weighted N equals the count
of non-NA-weighted rows. With rescale = FALSE, the total
weighted N is the actual sum of non-NA weights.
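The weighting rules above can be sketched in base R. weighted_counts() is a hypothetical helper written for this illustration, not part of spicy; it drops NA weights with a warning, optionally rescales the remaining weights to the number of kept rows, and sums the weights within each category, keeping missing x values as their own NA category.

```r
# weighted_counts() is a hypothetical helper, not spicy's implementation.
weighted_counts <- function(x, w, rescale = TRUE) {
  keep <- !is.na(w)
  if (any(!keep)) {
    warning("dropping ", sum(!keep), " observation(s) with NA weights")
  }
  x <- x[keep]
  w <- w[keep]
  if (rescale) {
    w <- w * length(w) / sum(w)  # total weighted N = number of kept rows
  }
  # addNA() keeps missing x values as an explicit NA category
  tapply(w, addNA(factor(x), ifany = TRUE), sum)
}

sex <- factor(c("Male", "Female", "Female", "Male", NA, "Female"))
w <- c(12, 8, 10, 15, 7, 9)
weighted_counts(sex, w, rescale = FALSE)  # Female 27, Male 27, NA 7
round(weighted_counts(sex, w), 2)         # Female 2.66, Male 2.66, NA 0.69
```

With these weights the unrescaled totals sum to 61, so the percentages are 27/61 (about 44.3%) and 7/61 (about 11.5%). Rescaling multiplies every weight by 6/61, which changes the totals but not the percentages; in the weighted example output, freq() shows the rescaled totals rounded to 3, 3 and 1.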
For schema-level inspection without computing frequencies, use
varlist() or code_book().
See also
cross_tab() for two-way cross-tabulations;
table_categorical() for multi-variable categorical summary
tables; varlist() / code_book() for variable inspection;
print.spicy_freq_table() for formatted printing;
spicy_print_table() for the underlying ASCII rendering engine.
Examples
# Frequency table for an ordered factor carrying a variable label
freq(sochealth, education)
#> Frequency table: education
#>
#> Category │ Values Freq. Percent
#> ────────────┼───────────────────────────────────────
#> Valid │ Lower secondary 261 21.8
#> │ Upper secondary 539 44.9
#> │ Tertiary 400 33.3
#> ────────────┼───────────────────────────────────────
#> Total │ 1200 100.0
#>
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
freq(sochealth, self_rated_health, sort = "-")
#> Frequency table: self_rated_health
#>
#> Category │ Values Freq. Percent Valid Percent
#> ────────────┼──────────────────────────────────────────────────
#> Valid │ Good 558 46.5 47.3
#> │ Very good 295 24.6 25.0
#> │ Fair 266 22.2 22.5
#> │ Poor 61 5.1 5.2
#> Missing │ NA 20 1.7
#> ────────────┼──────────────────────────────────────────────────
#> Total │ 1200 100.0 100.0
#>
#> Label: Self-rated health
#> Class: ordered, factor
#> Data: sochealth
library(labelled)
# Simple numeric vector
x <- c(1, 2, 2, 3, 3, 3, NA)
freq(x)
#> Frequency table: x
#>
#> Category │ Values Freq. Percent Valid Percent
#> ────────────┼───────────────────────────────────────────────
#> Valid │ 1 1 14.3 16.7
#> │ 2 2 28.6 33.3
#> │ 3 3 42.9 50.0
#> Missing │ NA 1 14.3
#> ────────────┼───────────────────────────────────────────────
#> Total │ 7 100.0 100.0
#>
#> Class: numeric
#> Data: x
# Plain vector with a sentinel value recoded as missing
freq(c(1, 2, 3, 99, 99), na_val = 99)
#> Frequency table: c(1, 2, 3, 99, 99)
#>
#> Category │ Values Freq. Percent Valid Percent
#> ────────────┼───────────────────────────────────────────────
#> Valid │ 1 1 20.0 33.3
#> │ 2 1 20.0 33.3
#> │ 3 1 20.0 33.3
#> Missing │ NA 2 40.0
#> ────────────┼───────────────────────────────────────────────
#> Total │ 5 100.0 100.0
#>
#> Class: numeric
#> Data: c(1, 2, 3, 99, 99)
# Labelled variable (haven-style)
x_lbl <- labelled(
c(1, 2, 3, 1, 2, 3, 1, 2, NA),
labels = c("Low" = 1, "Medium" = 2, "High" = 3)
)
var_label(x_lbl) <- "Satisfaction level"
# Treat value 1 ("Low") as missing
freq(x_lbl, na_val = 1)
#> Frequency table: x_lbl
#>
#> Category │ Values Freq. Percent Valid Percent
#> ────────────┼───────────────────────────────────────────────────
#> Valid │ [2] Medium 3 33.3 60.0
#> │ [3] High 2 22.2 40.0
#> Missing │ NA 4 44.4
#> ────────────┼───────────────────────────────────────────────────
#> Total │ 9 100.0 100.0
#>
#> Label: Satisfaction level
#> Class: haven_labelled, vctrs_vctr, double
#> Data: x_lbl
# Display only labels, add cumulative %
freq(x_lbl, labelled_levels = "labels", cum = TRUE)
#> Frequency table: x_lbl
#>
#> Category │ Values Freq. Percent Valid Percent Cum. Percent
#> ────────────┼───────────────────────────────────────────────────────────────
#> Valid │ Low 3 33.3 37.5 33.3
#> │ Medium 3 33.3 37.5 66.7
#> │ High 2 22.2 25.0 88.9
#> Missing │ NA 1 11.1 100.0
#> ────────────┼───────────────────────────────────────────────────────────────
#> Total │ 9 100.0 100.0 100.0
#>
#> Category │ Values Cum. Valid Percent
#> ────────────┼────────────────────────────────
#> Valid │ Low 37.5
#> │ Medium 75.0
#> │ High 100.0
#> Missing │ NA
#> ────────────┼────────────────────────────────
#> Total │ 100.0
#>
#> Label: Satisfaction level
#> Class: haven_labelled, vctrs_vctr, double
#> Data: x_lbl
# Display values only, sorted descending
freq(x_lbl, labelled_levels = "values", sort = "-")
#> Frequency table: x_lbl
#>
#> Category │ Values Freq. Percent Valid Percent
#> ────────────┼───────────────────────────────────────────────
#> Valid │ 1 3 33.3 37.5
#> │ 2 3 33.3 37.5
#> │ 3 2 22.2 25.0
#> Missing │ NA 1 11.1
#> ────────────┼───────────────────────────────────────────────
#> Total │ 9 100.0 100.0
#>
#> Label: Satisfaction level
#> Class: haven_labelled, vctrs_vctr, double
#> Data: x_lbl
# Show all declared factor levels, including unused ones (SPSS-style).
# The default "observed" mirrors Stata's `tab` and drops unused levels.
f <- factor(c("Yes", "No", "Yes"), levels = c("Yes", "No", "Maybe"))
freq(f, factor_levels = "all")
#> Frequency table: f
#>
#> Category │ Values Freq. Percent
#> ────────────┼──────────────────────────────
#> Valid │ Yes 2 66.7
#> │ No 1 33.3
#> │ Maybe 0 0.0
#> ────────────┼──────────────────────────────
#> Total │ 3 100.0
#>
#> Class: factor
#> Data: f
# With weighting
df <- data.frame(
sex = factor(c("Male", "Female", "Female", "Male", NA, "Female")),
weight = c(12, 8, 10, 15, 7, 9)
)
# Weighted frequencies (rescaled)
freq(df, sex, weights = weight, rescale = TRUE)
#> Frequency table: sex
#>
#> Category │ Values Freq. Percent Valid Percent
#> ────────────┼───────────────────────────────────────────────
#> Valid │ Female 3 44.3 50.0
#> │ Male 3 44.3 50.0
#> Missing │ NA 1 11.5
#> ────────────┼───────────────────────────────────────────────
#> Total │ 6 100.0 100.0
#>
#> Class: factor
#> Data: df
#> Weight: weight (rescaled)
# Weighted frequencies (without rescaling)
freq(df, sex, weights = weight, rescale = FALSE)
#> Frequency table: sex
#>
#> Category │ Values Freq. Percent Valid Percent
#> ────────────┼───────────────────────────────────────────────
#> Valid │ Female 27 44.3 50.0
#> │ Male 27 44.3 50.0
#> Missing │ NA 7 11.5
#> ────────────┼───────────────────────────────────────────────
#> Total │ 61 100.0 100.0
#>
#> Class: factor
#> Data: df
#> Weight: weight
# Base R style, with weights and cumulative percentages
freq(df$sex, weights = df$weight, cum = TRUE)
#> Frequency table: sex
#>
#> Category │ Values Freq. Percent Valid Percent Cum. Percent
#> ────────────┼───────────────────────────────────────────────────────────────
#> Valid │ Female 3 44.3 50.0 44.3
#> │ Male 3 44.3 50.0 88.5
#> Missing │ NA 1 11.5 100.0
#> ────────────┼───────────────────────────────────────────────────────────────
#> Total │ 6 100.0 100.0 100.0
#>
#> Category │ Values Cum. Valid Percent
#> ────────────┼────────────────────────────────
#> Valid │ Female 50.0
#> │ Male 100.0
#> Missing │ NA
#> ────────────┼────────────────────────────────
#> Total │ 100.0
#>
#> Class: factor
#> Data: df
#> Weight: df$weight (rescaled)
# Piped version (tidy syntax) and sort alphabetically descending ("name-")
df |> freq(sex, sort = "name-")
#> Frequency table: sex
#>
#> Category │ Values Freq. Percent Valid Percent
#> ────────────┼───────────────────────────────────────────────
#> Valid │ Male 2 33.3 40.0
#> │ Female 3 50.0 60.0
#> Missing │ NA 1 16.7
#> ────────────┼───────────────────────────────────────────────
#> Total │ 6 100.0 100.0
#>
#> Class: factor
#> Data: df
# European decimal mark (matches `cross_tab()` and the `table_*()` family)
freq(sochealth, education, decimal_mark = ",")
#> Frequency table: education
#>
#> Category │ Values Freq. Percent
#> ────────────┼───────────────────────────────────────
#> Valid │ Lower secondary 261 21,8
#> │ Upper secondary 539 44,9
#> │ Tertiary 400 33,3
#> ────────────┼───────────────────────────────────────
#> Total │ 1200 100,0
#>
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
# Non-styled return (for programmatic use)
f <- freq(df, sex, styled = FALSE)
head(f)
#> value n prop valid_prop
#> 1 Female 3 0.5000000 0.6
#> 2 Male 2 0.3333333 0.4
#> 3 <NA> 1 0.1666667 NA