Simulated social-health survey — sochealth • spicy

A simulated dataset of 1200 respondents from a fictional social-health survey, designed to illustrate the main features of the spicy package: variable labels, ordered factors, survey weights, association measures, and APA-style reporting.

Usage

sochealth

Format

A tibble with 1200 rows and 24 variables:

sex: Factor. Sex of the respondent.
age: Numeric. Age in years (25–75).
age_group: Ordered factor. Age group (25–34, 35–49, 50–64, 65–75).
education: Ordered factor. Highest education level (Lower secondary, Upper secondary, Tertiary).
social_class: Ordered factor. Subjective social class (Lower, Working, Lower middle, Middle, Upper middle).
region: Factor. Region of residence (6 regions).
employment_status: Factor. Employment status (Employed, Student, Unemployed, Inactive).
income_group: Ordered factor. Household income group (Low, Lower middle, Upper middle, High). Contains missing values.
income: Numeric. Monthly household income in CHF (1000–7400).
smoking: Factor. Current smoker (No, Yes). Contains missing values.
physical_activity: Factor. Regular physical activity (No, Yes).
dentist_12m: Factor. Dentist visit in the last 12 months (No, Yes).
self_rated_health: Ordered factor. Self-rated health (Poor, Fair, Good, Very good). Contains missing values.
wellbeing_score: Numeric. WHO-5 wellbeing index (0–100).
bmi: Numeric. Body mass index in kg/m\(^2\) (16–39). Contains missing values.
bmi_category: Ordered factor. BMI category (Normal weight, Overweight, Obesity). Contains missing values.
institutional_trust: Ordered factor. Trust in institutions (Very low, Low, High, Very high).
political_position: Numeric. Political position on a 0 (left) to 10 (right) scale. Contains missing values.
life_sat_health: Integer. Satisfaction with own health (1–5 Likert scale). Contains missing values.
life_sat_work: Integer. Satisfaction with work or main activity (1–5 Likert scale). Contains missing values.
life_sat_relationships: Integer. Satisfaction with personal relationships (1–5 Likert scale). Contains missing values.
life_sat_standard: Integer. Satisfaction with standard of living (1–5 Likert scale). Contains missing values.
response_date: POSIXct. Date and time of survey response (September–November 2024).
weight: Numeric. Survey design weight (range 0.29–3.45); calibrated so that sum(weight) matches the unweighted N and mean(weight) is approximately 1. See Details.

Source

Simulated data for illustration purposes; reproducible by sourcing data-raw/sochealth.R. The script seeds the main generation block with set.seed(2025), and the two missing-value injection blocks with set.seed(2027) (the four life_sat_* items) and set.seed(2026) (smoking, self_rated_health, income_group, political_position, bmi).
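
The seeded injection steps can be pictured with a minimal base R sketch. It assumes nothing about the actual contents of data-raw/sochealth.R: the helper inject_na(), the 5% share, and the toy vector are hypothetical and only show how a dedicated seed makes a missing-value pattern reproducible.

```r
# Hypothetical helper (not taken from data-raw/sochealth.R):
# set a given share of a vector to NA under a fixed seed.
inject_na <- function(x, prop, seed) {
  set.seed(seed)                                   # fix the RNG state for this block
  idx <- sample(length(x), size = round(prop * length(x)))
  x[idx] <- NA                                     # blank out the sampled positions
  x
}

x <- 1:100                                         # toy variable, not a sochealth column
a <- inject_na(x, prop = 0.05, seed = 2026)        # 5% is an illustrative share
b <- inject_na(x, prop = 0.05, seed = 2026)

identical(a, b)   # same seed, same missing positions
sum(is.na(a))     # number of injected missing values
```

Re-seeding immediately before each block means a block can be re-run on its own and still produce the same missing positions.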

Details

Every variable carries a "label" attribute (read by labelled::var_label() and surfaced by varlist() / code_book()). The mix of factor types is deliberate: nominal factors (sex, region, ...) and ordered factors (education, self_rated_health, ...) sit side by side. This lets cross_tab() and table_categorical() demonstrate the automatic ordinal-vs-nominal dispatch on a single dataset, choosing between nominal measures (Cramer's V, Phi) and ordinal measures (Kendall's Tau-b, Goodman-Kruskal Gamma).
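
The two ingredients described here, a "label" attribute and the ordered-vs-nominal distinction, can both be inspected with base R alone. The sketch below uses a made-up toy data frame rather than sochealth itself, and measure_family() is a hypothetical helper that only illustrates the idea of dispatching on is.ordered(); it is not spicy's internal logic.

```r
# Toy data frame mimicking two sochealth columns (values are made up).
toy <- data.frame(
  region = factor(c("North", "South", "South", "East")),
  education = factor(
    c("Tertiary", "Lower secondary", "Upper secondary", "Tertiary"),
    levels = c("Lower secondary", "Upper secondary", "Tertiary"),
    ordered = TRUE
  )
)

# Variable labels are stored as a plain "label" attribute on each column.
attr(toy$region, "label") <- "Region of residence"
attr(toy$education, "label") <- "Highest education level"

# Collect the label and the ordered/nominal status of every column.
labels <- vapply(toy, function(x) attr(x, "label"), character(1))
is_ordinal <- vapply(toy, is.ordered, logical(1))

# Hypothetical helper: ordinal measures only make sense when both
# variables are ordered factors; otherwise fall back to nominal ones.
measure_family <- function(x, y) {
  if (is.ordered(x) && is.ordered(y)) "ordinal" else "nominal"
}

print(labels)
print(is_ordinal)
print(measure_family(toy$region, toy$education))
```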

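Survey weights (weight) are calibrated: sum(weight) matches the unweighted N to within rounding (\(\approx 1200\)) and mean(weight) is \(\approx 1\). Weighted counts and totals are therefore on the same scale as the unweighted sample and need no further rescaling. Weighted means and proportions do not depend on this scaling and can still differ from their unweighted counterparts.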
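
A minimal base R sketch of this kind of calibration, using toy data rather than the sochealth weights (raw_w, y, and the seed are made up for illustration): multiplying raw weights by n / sum(raw_w) makes them sum to the sample size and average 1, while any weighted mean is unchanged by the rescaling.

```r
set.seed(1)                          # arbitrary seed, for this toy example only
n <- 1200
raw_w <- runif(n, 0.5, 4)            # made-up raw weights on an arbitrary scale
y <- rnorm(n, mean = 60, sd = 15)    # made-up numeric outcome

# Rescale so the weights sum to the sample size (hence mean weight = 1).
w <- raw_w * n / sum(raw_w)

sum(w)    # equals n up to floating-point error
mean(w)   # equals 1 up to floating-point error

# A weighted mean is a ratio, so the common scale factor cancels out.
all.equal(weighted.mean(y, raw_w), weighted.mean(y, w))
```

The scale of the weights matters for weighted counts and totals, which is why a sum equal to N is convenient for frequency tables.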

Examples

library(spicy)
data(sochealth)
varlist(sochealth)
#> Non-interactive session: use `tbl = TRUE` to return a tibble.
freq(sochealth, education)
#> Frequency table: education
#> 
#>  Category   │ Values               Freq.    Percent 
#> ────────────┼───────────────────────────────────────
#>  Valid      │ Lower secondary        261       21.8 
#>             │ Upper secondary        539       44.9 
#>             │ Tertiary               400       33.3 
#> ────────────┼───────────────────────────────────────
#>  Total      │                       1200      100.0 
#> 
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
cross_tab(sochealth, education, self_rated_health)
#> Crosstable: education x self_rated_health (N)
#> 
#>  Values            │   Poor    Fair    Good    Very good │   Total 
#> ───────────────────┼─────────────────────────────────────┼─────────
#>  Lower secondary   │     28      86     102           44 │     260 
#>  Upper secondary   │     28     118     263          118 │     527 
#>  Tertiary          │      5      62     193          133 │     393 
#> ───────────────────┼─────────────────────────────────────┼─────────
#>  Total             │     61     266     558          295 │    1180 
#> 
#> Chi-2(6) = 73.2, p <.001
#> Kendall's Tau-b = 0.20
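
As a cross-check that needs only base R, the counts printed in the crosstable above can be retyped into a matrix and passed to chisq.test(). This sketch assumes the reported statistic is the ordinary Pearson chi-squared test without correction; the statistic and degrees of freedom should then come out close to the Chi-2(6) = 73.2 reported above.

```r
# Counts copied from the cross_tab() output above
# (rows: education, columns: self_rated_health).
counts <- matrix(
  c(28,  86, 102,  44,
    28, 118, 263, 118,
     5,  62, 193, 133),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    education = c("Lower secondary", "Upper secondary", "Tertiary"),
    self_rated_health = c("Poor", "Fair", "Good", "Very good")
  )
)

addmargins(counts)                 # row and column totals, as in the printed table

test <- chisq.test(counts)         # Pearson chi-squared test of independence
round(unname(test$statistic), 1)   # test statistic
unname(test$parameter)             # degrees of freedom: (3 - 1) * (4 - 1)
```

The grand total is 1180 rather than 1200 because self_rated_health contains missing values, which are dropped from the table.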