Survey-Weighted Logistic Regression
Introduction
For binary outcomes, ignoring survey weights can produce biased coefficient estimates and severely underestimated standard errors. In the original Zelig package (Imai, King, and Lau 2007, 2008), survey-weighted logistic regression required specifying model = "logit.survey" as a separate model type. zelig2 simplifies this: pass the weight vector to the weights argument, and the model is estimated automatically via survey::svyglm (Lumley 2004) with family = binomial(link = "logit").
Note on the original Zelig's survey implementation
The original Zelig's survey models contain a known double-weighting bug (IQSS/Zelig#332) that produces incorrect estimates. zelig2 does not have this bug; its estimates match survey::svyglm() exactly. See Comparison with Zelig for details.
This vignette uses data from the U.S. Census Bureau's Household Pulse Survey (Week 62, N = 58,202) to examine predictors of food insecurity.
Data
- Outcome: food_insecure (binary), household experienced food insufficiency in the past 7 days
- Predictors: age (years), college (bachelor's or higher, binary), income_k (household income in $1000s)
- Survey weight: pweight, a person-level probability weight
The Household Pulse Survey uses stratification by state and unequal selection probabilities. Probability weights ensure estimates reflect the U.S. household population.
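To see why probability weights matter under unequal selection probabilities, the following minimal sketch simulates a hypothetical two-group population (not the Pulse data) in which the lower-risk group is oversampled. All names (pop, p_select, samp) and all numbers are illustrative assumptions.

```r
# Hypothetical population: two equally sized groups with different risk.
set.seed(42)
N <- 100000
pop <- data.frame(group = rep(c("low_income", "high_income"), each = N / 2))
pop$risk <- ifelse(pop$group == "low_income", 0.30, 0.10)
pop$food_insecure <- rbinom(N, size = 1, prob = pop$risk)

# Unequal selection: the high-income group is five times more likely to be sampled.
pop$p_select <- ifelse(pop$group == "low_income", 0.01, 0.05)
samp <- pop[runif(N) < pop$p_select, ]

# Probability weight = inverse of the selection probability.
samp$pweight <- 1 / samp$p_select

truth      <- mean(pop$food_insecure)
unweighted <- mean(samp$food_insecure)
weighted   <- weighted.mean(samp$food_insecure, samp$pweight)

round(c(truth = truth, unweighted = unweighted, weighted = weighted), 3)
```

Because the oversampled group has lower risk, the unweighted sample mean understates the population prevalence, while the weighted mean should land close to it.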
Model Estimation
library(zelig2)
z_weighted <- zelig2(
food_insecure ~ age + college + income_k,
model = "logit",
data = pulse,
weights = pulse$pweight,
num = 1000L
)
summary(z_weighted)
zelig2: Logistic Regression
Formula: food_insecure ~ age + college + income_k
N: 58202
Survey-weighted: yes
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.20445025 0.09842872 2.0771 0.03779 *
age -0.01950071 0.00158885 -12.2735 < 2e-16 ***
college -0.83557721 0.07313967 -11.4244 < 2e-16 ***
income_k -0.01761030 0.00092506 -19.0369 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Survey-Weighted Standard Errors
Weighted standard errors account for the survey design and are typically larger than naive unweighted SEs.
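The distinction between weighting the point estimates and computing design-based standard errors can be sketched directly with the survey package. The example below uses simulated data and hypothetical names (dat, x, y, w), and it assumes a weights-only design with no strata or clusters; the design zelig2 constructs internally is not shown in this vignette. It uses quasibinomial() rather than binomial() only to avoid warnings about non-integer weights; the point estimates are the same under either family.

```r
# Simulated data; `dat`, `x`, `y`, and `w` are hypothetical names.
set.seed(1)
n <- 2000
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * dat$x))
dat$w <- runif(n, min = 0.5, max = 5)   # unequal probability weights

# glm() with weights: weighted point estimates, but model-based SEs.
fit_naive <- glm(y ~ x, family = quasibinomial(), data = dat, weights = w)

if (requireNamespace("survey", quietly = TRUE)) {
  # Weights-only design (no strata or clusters): an assumption of this sketch.
  des <- survey::svydesign(ids = ~1, weights = ~w, data = dat)
  fit_svy <- survey::svyglm(y ~ x, design = des, family = quasibinomial())

  # Point estimates agree; the standard errors are computed differently.
  print(cbind(naive = coef(fit_naive), svy = coef(fit_svy)))
  print(cbind(naive = sqrt(diag(vcov(fit_naive))),
              svy   = sqrt(diag(vcov(fit_svy)))))
}
```

The coefficient columns should match, which shows that the weights alone fix the point estimates. The standard-error columns differ because svyglm() uses a design-based variance estimator.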
Comparing Weighted vs. Unweighted Estimates
z_unweighted <- zelig2(food_insecure ~ age + college + income_k,
model = "logit", data = pulse)
coef(z_unweighted)
Substantial Differences
The weighted intercept (0.204) is much smaller than the unweighted estimate (0.523), suggesting that the unweighted sample overstates baseline food insecurity risk (the intercept is the log-odds when all predictors equal zero). The age and income coefficients are attenuated with weights. The college effect is similar (weighted -0.836 vs. unweighted -0.898).
Standard errors are approximately 2x larger: college SE is 0.073 (weighted) vs. 0.038 (unweighted). Ignoring survey design leads to false precision.
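As a rough gauge of what a 2x larger standard error implies, the arithmetic below converts the reported college SEs into a variance inflation factor and a heuristic effective sample size. This is only a back-of-the-envelope reading, since the two models also differ in their coefficients. The kish_deff() helper is a hypothetical name implementing Kish's standard approximation for the design effect due to unequal weighting; it is not part of zelig2.

```r
# Standard errors for `college` as reported in the text.
se_weighted   <- 0.073
se_unweighted <- 0.038
n             <- 58202

se_ratio      <- se_weighted / se_unweighted   # about 1.9
var_inflation <- se_ratio^2                    # ratio on the variance scale
n_effective   <- n / var_inflation             # rough heuristic only

round(c(se_ratio = se_ratio, var_inflation = var_inflation), 2)

# Kish's approximation: n * sum(w^2) / (sum(w))^2 (hypothetical helper).
kish_deff <- function(w) length(w) * sum(w^2) / sum(w)^2

kish_deff(rep(1, 10))          # equal weights give 1
kish_deff(c(1, 1, 1, 1, 6))    # one heavily weighted unit gives 2
```

A variance inflation of roughly 3.7 means the weighted analysis carries about as much information about this coefficient as a simple random sample a bit over a quarter of the size.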
Predicted Probabilities
z_weighted <- setx(z_weighted, age = 35, college = 0, income_k = 50)
z_weighted <- sim(z_weighted)
summary(z_weighted)
A 35-year-old without a college degree and with a household income of $50,000 has a 20.4% predicted probability of food insecurity (95% CI: 19.1%, 21.7%).
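The point estimate can be checked by hand from the weighted coefficients in the summary output: compute the linear predictor for the covariate profile and apply the inverse logit. This plug-in calculation is a sketch that ignores simulation uncertainty, so it reproduces the probability but not the interval.

```r
# Weighted coefficients copied from the summary output above.
beta <- c("(Intercept)" = 0.20445025, age = -0.01950071,
          college = -0.83557721, income_k = -0.01761030)

# Covariate profile: intercept term, age 35, no college degree, $50k income.
x <- c(1, 35, 0, 50)

eta <- sum(beta * x)   # linear predictor on the log-odds scale
p   <- plogis(eta)     # inverse logit maps log-odds to a probability
round(p, 3)            # 0.204
```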
First Differences
How much does doubling income from $50,000 to $100,000 reduce food insecurity risk?
z_weighted <- zelig2(food_insecure ~ age + college + income_k,
model = "logit", data = pulse,
weights = pulse$pweight, num = 1000L)
z_weighted <- setx(z_weighted, age = 35, college = 0, income_k = 50)
z_weighted <- setx1(z_weighted, age = 35, college = 0, income_k = 100)
z_weighted <- sim(z_weighted)
summary(z_weighted)
--- Simulation Summary ( 1000 draws) ---
Expected Values:
Mean: 0.2045 SD: 0.0071 [0.1908, 0.2188]
First Differences:
Mean: -0.1081 SD: 0.0048 [-0.1177, -0.0987]
Moving from $50,000 to $100,000 in household income is associated with a 10.8 percentage point lower probability of food insecurity (95% CI for the first difference: -11.8 to -9.9 pp). The unweighted model estimates 12.0 pp, overstating the effect by about 11%.
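One common way to produce simulation summaries of this kind, in the tradition of the original Zelig framework, is to draw coefficient vectors from a multivariate normal centered on the estimates and to compute the quantity of interest for each draw. Whether sim() in zelig2 works exactly this way is an assumption here. The sketch below uses simulated data and a plain unweighted glm() purely to show the mechanics; the names dat, draws, and fd are hypothetical.

```r
library(MASS)  # for mvrnorm(); MASS is a recommended package bundled with R

# Hypothetical data; an unweighted glm() is used only to show the mechanics.
set.seed(123)
n <- 5000
dat <- data.frame(income_k = runif(n, 10, 200))
dat$y <- rbinom(n, size = 1, prob = plogis(-0.5 - 0.02 * dat$income_k))
fit <- glm(y ~ income_k, family = binomial(), data = dat)

# Draw 1000 coefficient vectors from the estimated sampling distribution.
draws <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# Expected values at two income levels for every draw, then their difference.
p_50  <- plogis(draws %*% c(1, 50))
p_100 <- plogis(draws %*% c(1, 100))
fd    <- as.vector(p_100 - p_50)

c(mean = mean(fd), sd = sd(fd), quantile(fd, c(0.025, 0.975)))
```

Summarizing the draws with a mean, standard deviation, and 2.5% and 97.5% quantiles gives the same kind of output as the First Differences block, with the interval reflecting uncertainty in all coefficients jointly.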
Range Scenarios
Simulate across the full income distribution:
z_weighted <- zelig2(food_insecure ~ age + college + income_k,
model = "logit", data = pulse,
weights = pulse$pweight, num = 1000L)
z_weighted <- setx(z_weighted, age = 35, college = 0,
income_k = seq(25, 250, by = 25))
z_weighted <- sim(z_weighted)
summary(z_weighted)
--- Simulation Summary ( 1000 draws) ---
Expected Values (range):
mean sd 2.5%.2.5% 97.5%.97.5%
25 0.2858 0.0089 0.2683 0.3031
50 0.2047 0.0068 0.1918 0.2181
75 0.1421 0.0062 0.1303 0.1545
100 0.0963 0.0058 0.0850 0.1082
125 0.0642 0.0052 0.0545 0.0748
150 0.0424 0.0043 0.0344 0.0510
175 0.0277 0.0035 0.0216 0.0349
200 0.0181 0.0027 0.0134 0.0238
225 0.0117 0.0020 0.0083 0.0161
250 0.0076 0.0015 0.0051 0.0108
The income gradient is steep at lower incomes:
- At $25,000: 28.6% probability of food insecurity
- At $100,000: 9.6% (19 pp reduction from $25k)
- At $250,000: 0.8% (near zero risk)
The steepest marginal reductions occur between $25k and $100k. Above $100k, the marginal effect diminishes as baseline risk becomes very low.
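The diminishing returns follow from the shape of the logistic curve: on the probability scale, the marginal effect of income is the coefficient times p(1 - p), which shrinks as p approaches zero. The sketch below traces the plug-in curve from the weighted coefficients reported earlier, for the same profile (age 35, no college degree). These values ignore simulation uncertainty, so they should land close to, but not exactly on, the simulated means in the table.

```r
# Weighted coefficients copied from the summary output in Model Estimation.
beta <- c("(Intercept)" = 0.20445025, age = -0.01950071,
          college = -0.83557721, income_k = -0.01761030)

income <- seq(25, 250, by = 25)
eta <- beta[["(Intercept)"]] + beta[["age"]] * 35 + beta[["college"]] * 0 +
  beta[["income_k"]] * income
p <- plogis(eta)

# Marginal effect of income on the probability scale: beta * p * (1 - p).
marginal <- beta[["income_k"]] * p * (1 - p)

data.frame(income_k = income,
           prob = round(p, 4),
           change_from_prev = round(c(NA, diff(p)), 4),
           marginal_per_1k = round(marginal, 5))
```

Each successive $25k step lowers the predicted probability by less than the previous one, which is the pattern described above.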

Summary
Following the Zelig framework (Imai, King, and Lau 2007, 2008), zelig2 makes survey-weighted logistic regression accessible through the same zelig2() -> setx() -> sim() pipeline used for unweighted models. Key takeaways:
- Simple syntax: weights = your_weight_vector; no separate model type needed.
- Coefficients change: Survey weights can substantially affect estimates. The unweighted intercept overstates baseline risk.
- Larger standard errors: Weighted SEs are typically 1.5 to 3 times larger, reflecting design effects.
- Probability-scale inference: setx() and sim() produce predicted probabilities, which are far more interpretable than log-odds.
- First differences: Quantify effects as changes in probability at specific covariate profiles.
- Range scenarios: Reveal the nonlinear gradient between income and food insecurity, steepest at low incomes.