Quick Start¶
This page walks through a complete example in both R and Stata using generated test data. Both packages include a built-in test dataset generator so you can try the package immediately and verify that the DLM produces estimates identical to a canonical event study with binned endpoints.
Generate Test Data¶
Both packages include a data generator that creates a balanced panel with staggered treatment adoption. The generated data has:
- A configurable number of units and time periods
- Treatment assigned randomly to a fraction of units
- Treatment onset at time 7, 8, or 9 (uniform)
- A treatment effect of −3 on the outcome (post-treatment)
- Gaussian noise (σ = 5)
Output:
'data.frame': 10000 obs. of 7 variables:
$ group : Factor w/ 500 levels "group1","group2",..: 1 1 1 1 1 ...
$ time : int 1 2 3 4 5 6 7 8 9 10 ...
$ treatment_time : int 7 7 7 7 7 7 7 7 7 7 ...
$ treat : num 1 1 1 1 1 1 1 1 1 1 ...
$ years_to_treatment: num -6 -5 -4 -3 -2 -1 0 1 2 3 ...
$ post : num 0 0 0 0 0 0 1 1 1 1 ...
$ outcome : num -4.68 3.52 2.48 8.83 -2.54 ...
Output:
Generated balanced panel:
Units: 500
Time periods: 20
Total obs: 10000
Treated units: 195 (39%)
Treatment time: uniformly in {7, 8, 9}
Treatment effect: -3 (post-treatment)
Contains data
Observations: 10,000
Variables: 7
---------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
---------------------------------------------------------------
unit long %12.0g Unit identifier
time long %12.0g Time period
treat byte %8.0g Treatment group indicator
treatment_time long %12.0g Period when treatment begins
years_to_trea~t long %12.0g Periods relative to treatment
(-1000 = never treated)
post byte %8.0g Post-treatment indicator
outcome double %10.0g Outcome (noise + treatment
effect)
---------------------------------------------------------------
Sorted by: unit time
Note
R and Stata use different random number generators, so seed = 42 produces different data in each language. The coefficients differ numerically but the DLM works identically in both.
Estimate the DLM¶
library(dlm)
library(dplyr)
df <- generate_data(seed = 42, n_groups = 500, n_times = 20, treat_prob = 0.4)
outcome_data <- df %>% select(group, time, outcome)
exposure_data <- df %>% select(group, time, post) %>% distinct()
mod <- distributed_lags_model(
data = outcome_data,
exposure_data = exposure_data,
from_rt = -3, to_rt = 3,
outcome = "outcome", exposure = "post",
unit = "group", time = "time"
)
mod$betas
Output:
Output:
------------------------------------------------------------------------
Distributed Lag Model (Schmidheiny & Siegloch 2023)
------------------------------------------------------------------------
Outcome: outcome
Exposure: post
Window: [-3, 3], ref = -1
N obs: 7500
N clusters: 500
------------------------------------------------------------------------
Time Coef SE 95% CI lo 95% CI hi
------------------------------------------------------------------------
-3 -0.063959 0.483545 -1.011708 0.883790
-2 0.094669 0.480424 -0.846962 1.036299
-1 (ref) . . .
0 -2.763973 0.461907 -3.669310 -1.858636
1 -3.094282 0.520414 -4.114294 -2.074271
2 -2.707691 0.554940 -3.795373 -1.620010
3 -3.256921 0.426200 -4.092272 -2.421569
------------------------------------------------------------------------
Interpret Results¶
The output table shows beta coefficients — cumulative treatment effects at each event time relative to the reference period (default: −1).
- Pre-treatment betas (t < ref): Should be near zero if parallel trends hold. These test the "no pre-trend" assumption.
- Post-treatment betas (t ≥ 0): Estimate the cumulative causal effect at each horizon.
- Reference period (t = −1): Normalized to zero by construction.
In both examples, the true treatment effect is −3. The estimated post-treatment betas cluster around −3, while pre-treatment betas are close to zero — exactly as expected.
Because the test data uses a binary absorbing treatment, these betas are numerically identical to what a canonical binned-endpoint event study would produce on the same data. Both the R and Stata packages include equivalence tests that verify this match to machine precision (~10⁻¹³).
Plot the Results¶
Construct the plot from the e(betas) matrix:
matrix b = e(betas)
preserve
clear
local nr = rowsof(b)
set obs `nr'
gen time_to_event = .
gen coef = .
gen ci_lo = .
gen ci_hi = .
forvalues i = 1/`nr' {
replace time_to_event = b[`i', 1] in `i'
replace coef = b[`i', 2] in `i'
replace ci_lo = b[`i', 4] in `i'
replace ci_hi = b[`i', 5] in `i'
}
twoway (rcap ci_lo ci_hi time_to_event, lcolor(navy)) ///
(scatter coef time_to_event, mcolor(navy) msymbol(circle)), ///
yline(0, lpattern(dash) lcolor(gray)) ///
xline(-0.5, lpattern(dash) lcolor(gray)) ///
xtitle("Periods to Treatment") ytitle("Coefficient") ///
title("Event-Study Plot (DLM)") legend(off)
restore

Access Results Programmatically¶
# Beta coefficients (data.frame: time_to_event, coef, se)
mod$betas
# time_to_event coef se
# post_lead2 -3 -0.11846256 0.4729260
# post_lead1 -2 -0.06327287 0.5139965
# post_lag0 0 -2.64104056 0.5438700
# post_lag1 1 -2.26029265 0.5234023
# post_lag2 2 -3.04210098 0.5674556
# post_lag3 3 -2.61751913 0.4214689
# Event-study plot (ggplot2 object)
mod$plot
# Underlying fixest model object
summary(mod$model)
# Gamma variance-covariance matrix
mod$vcov
# Number of observations
nobs(mod$model)
# [1] 7500
e(betas)[7,5]
time_to_ev~t coef se ci_lo ci_hi
r1 -3 -.06395882 .48354546 -1.0117079 .88379028
r2 -2 .09466869 .48042378 -.84696192 1.0362993
r3 -1 0 0 0 0
r4 0 -2.7639728 .46190658 -3.6693097 -1.8586358
r5 1 -3.0942824 .52041411 -4.1142941 -2.0742708
r6 2 -2.7076914 .55493963 -3.7953731 -1.6200097
r7 3 -3.2569205 .42619968 -4.0922719 -2.4215691
e(gamma)[1,6]
_dlm_lead2 _dlm_lead1 _dlm_lag0 _dlm_lag1 _dlm_lag2 _dlm_lag3
y1 .1586275 -.09466869 -2.7639728 -.33030967 .38659102 -.54922912
