uncertainty_coef() computes the Uncertainty Coefficient
(Theil's U) for a two-way contingency table, based on
information entropy.
Usage
uncertainty_coef(
x,
direction = c("symmetric", "row", "column"),
detail = FALSE,
conf_level = 0.95,
digits = 3L,
.include_se = FALSE
)
Arguments
- x
A contingency table (of class table).
- direction
Direction of prediction: "symmetric" (default), "row" (column predicts row), or "column" (row predicts column).
- detail
Logical. If FALSE (default), return the estimate as a numeric scalar. If TRUE, return a named numeric vector including confidence interval and p-value.
- conf_level
A number between 0 and 1 giving the confidence level (default 0.95). Only used when detail = TRUE. Set to NULL to omit the confidence interval.
- digits
Number of decimal places used when printing the result (default 3). Only affects the detail = TRUE output.
- .include_se
Internal parameter; do not use.
Value
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: U = 0 (Wald z-test).
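As a rough illustration of how a Wald z-test and confidence interval are formed from an estimate and its standard error, consider the base R sketch below. The values of estimate and se are hypothetical, and the two-sided p-value and the symmetric normal interval are assumptions about the usual Wald construction, not a transcript of what uncertainty_coef() does internally.

```r
# Hypothetical estimate and standard error (illustrative values only,
# not produced by uncertainty_coef())
estimate   <- 0.02
se         <- 0.008
conf_level <- 0.95

# Wald statistic for H0: U = 0
z_stat <- estimate / se

# Two-sided p-value from the standard normal distribution (assumed form)
p_value <- 2 * pnorm(-abs(z_stat))

# Normal critical value and symmetric Wald interval (assumed form)
z_crit <- qnorm(1 - (1 - conf_level) / 2)
ci     <- estimate + c(-1, 1) * z_crit * se

# Named vector rounded for display, similar in spirit to detail = TRUE
round(c(Estimate = estimate, "CI lower" = ci[1], "CI upper" = ci[2],
        p = p_value), 3)
```

A package implementation may additionally truncate the interval to the range of the statistic or handle a zero standard error; the sketch does neither.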
Details
The uncertainty coefficient measures association using
Shannon entropy.
For direction = "row":
\(U = (H_X + H_Y - H_{XY}) / H_X\), where \(H_X\) and
\(H_Y\) are the marginal entropies of the row and column
variables and \(H_{XY}\) is the joint entropy.
The symmetric version is
\(U = 2 (H_X + H_Y - H_{XY}) / (H_X + H_Y)\).
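The formulas above can be sketched in base R as follows. This is a minimal illustration, not the package's code: the table is made-up data, entropy() is a hypothetical helper, natural logarithms are assumed (the base cancels in the ratio), and X is taken to be the row variable, in line with "row" meaning that the column predicts the row. The column version shown is assumed by analogy, with \(H_Y\) in the denominator.

```r
# Illustrative 2 x 3 table of counts (made-up data, not from the package)
tab <- as.table(matrix(c(30, 10, 20, 25, 10, 45), nrow = 2))

# Hypothetical helper: Shannon entropy of a probability vector.
# Zero cells are dropped, so they contribute 0 to the sum.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

p_xy <- tab / sum(tab)              # joint proportions
h_x  <- entropy(rowSums(p_xy))      # marginal entropy, row variable
h_y  <- entropy(colSums(p_xy))      # marginal entropy, column variable
h_xy <- entropy(as.vector(p_xy))    # joint entropy

# Shared numerator H_X + H_Y - H_XY (the mutual information)
mutual_info <- h_x + h_y - h_xy

u_row       <- mutual_info / h_x              # direction = "row"
u_column    <- mutual_info / h_y              # assumed form for "column"
u_symmetric <- 2 * mutual_info / (h_x + h_y)  # direction = "symmetric"

c(row = u_row, column = u_column, symmetric = u_symmetric)
```

Because all three versions share the same numerator, the symmetric coefficient is the harmonic mean of the two directional ones and always lies between them.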
Standard error formulas follow the DescTools implementations
(Signorell et al., 2024); see cramer_v() for full references.
Examples
tab <- table(sochealth$smoking, sochealth$education)
uncertainty_coef(tab)
#> [1] 0.01148762
uncertainty_coef(tab, direction = "row", detail = TRUE)
#> Estimate CI lower CI upper        p
#>    0.018    0.003    0.032    0.021
