uncertainty_coef() computes the Uncertainty Coefficient (Theil's U) for a two-way contingency table, based on information entropy.

Usage

uncertainty_coef(
  x,
  direction = c("symmetric", "row", "column"),
  detail = FALSE,
  conf_level = 0.95,
  digits = 3L,
  .include_se = FALSE
)

Arguments

x

A contingency table (of class table).

direction

Direction of prediction: "symmetric" (default), "row" (column predicts row), or "column" (row predicts column).

detail

Logical. If FALSE (default), return the estimate as a numeric scalar. If TRUE, return a named numeric vector including confidence interval and p-value.

conf_level

A number between 0 and 1 giving the confidence level (default 0.95). Only used when detail = TRUE. Set to NULL to omit the confidence interval.

digits

Number of decimal places used when printing the result (default 3). Only affects the detail = TRUE output.

.include_se

Internal parameter; do not use.

Value

Same structure as cramer_v(): a scalar when detail = FALSE, a named vector when detail = TRUE. The p-value tests H0: U = 0 (Wald z-test).
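
As a rough illustration of what a Wald z-test of H0: U = 0 involves, the sketch below derives a two-sided p-value and a normal-approximation confidence interval from an estimate and its standard error. The numbers are hypothetical, and that the package's interval is built in this way is an assumption; this is not the package's internal code.

```r
# Hypothetical inputs, invented for illustration only
estimate   <- 0.05   # assumed point estimate of U
se         <- 0.02   # assumed asymptotic standard error
conf_level <- 0.95

# Wald statistic under H0: U = 0, with a two-sided p-value
z       <- estimate / se
p_value <- 2 * pnorm(-abs(z))

# Normal-approximation confidence interval (assumed form)
z_crit <- qnorm(1 - (1 - conf_level) / 2)
ci     <- estimate + c(-1, 1) * z_crit * se

c(Estimate = estimate, `CI lower` = ci[1], `CI upper` = ci[2], p = p_value)
```

Because U is bounded between 0 and 1, an interval of this form can extend below 0 for weak associations; how the package handles that case is not stated here.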

Details

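The uncertainty coefficient measures association using Shannon entropy. Let \(H_X\) and \(H_Y\) be the marginal entropies of the row and column variables, and \(H_{XY}\) the joint entropy. For direction = "row", the row variable is predicted and \(U = (H_X + H_Y - H_{XY}) / H_X\); for direction = "column", the denominator is \(H_Y\) instead. The symmetric version is \(U = 2 (H_X + H_Y - H_{XY}) / (H_X + H_Y)\). In each case the numerator is the mutual information between the two variables, so U is 0 under independence and 1 when the predicted variable is fully determined by the other. Standard error formulas follow the DescTools implementations (Signorell et al., 2024); see cramer_v() for full references.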
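
The formulas above can be reproduced by hand with base R. The sketch below uses a small invented table (not sochealth), assumes \(H_X\) is the row marginal entropy and \(H_Y\) the column marginal entropy, and uses a hypothetical helper named entropy(). It may differ slightly from uncertainty_coef() if the package applies any correction for empty cells.

```r
# Hypothetical 2 x 3 contingency table, invented for illustration
tab <- as.table(matrix(c(30, 10,  5,
                         10, 20, 25),
                       nrow = 2, byrow = TRUE))

# Shannon entropy of a probability vector; zero cells contribute nothing.
# The log base cancels in the ratios below, so natural log is fine.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

p    <- tab / sum(tab)          # joint proportions
h_x  <- entropy(rowSums(p))     # marginal entropy of the row variable
h_y  <- entropy(colSums(p))     # marginal entropy of the column variable
h_xy <- entropy(as.vector(p))   # joint entropy

mutual_info <- h_x + h_y - h_xy

u_row    <- mutual_info / h_x                # direction = "row"
u_column <- mutual_info / h_y                # direction = "column"
u_sym    <- 2 * mutual_info / (h_x + h_y)    # direction = "symmetric"

c(row = u_row, column = u_column, symmetric = u_sym)
```

The symmetric coefficient is the harmonic mean of the two directional ones, so it always lies between them.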

Examples

tab <- table(sochealth$smoking, sochealth$education)
uncertainty_coef(tab)
#> [1] 0.01148762
uncertainty_coef(tab, direction = "row", detail = TRUE)
#> Estimate  CI lower  CI upper      p
#>    0.018     0.003     0.032  0.021