Diff-in-Diff and Event Studies
Two-way fixed effects and dynamic treatment effect estimation
Diff-in-diff is just a regression. If you can run `reghdfe y treat, absorb(state year)`, you can do diff-in-diff. Event studies are just DiD with more treatment indicators. The concepts might seem abstract, but the code is straightforward.
What You'll Learn
- Setting up panel data for diff-in-diff estimation
- Creating treatment timing and post-treatment variables
- Two-way fixed effects (TWFE) regression
- Event study specifications with leads and lags
- Visualizing treatment effects over time
Download Complete Scripts
Run the full session code in your preferred language:
Background: No-Fault Divorce and Female Suicide
In this tutorial, we replicate a classic diff-in-diff analysis examining how no-fault divorce reform affected female suicide rates across US states.
The Diff-in-Diff Idea in 30 Seconds
| | Before Policy | After Policy | Difference |
|---|---|---|---|
| Treated States | A | B | B - A |
| Control States | C | D | D - C |
| Diff-in-Diff Effect | | | (B - A) - (D - C) |
Logic: Compare how treated states changed (B - A) to how control states changed (D - C). The difference between these differences is the treatment effect.
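The table's arithmetic can be sketched in a few lines of Python (the numbers are made up for illustration):

```python
# Made-up values for the 2x2 table cells: A, B, C, D
treated_before, treated_after = 12.0, 10.0   # A, B
control_before, control_after = 14.0, 13.5   # C, D

treated_change = treated_after - treated_before   # B - A
control_change = control_after - control_before   # D - C
did_effect = treated_change - control_change      # (B - A) - (D - C)
print(did_effect)  # -1.5
```

The treated group fell by 2.0, but controls fell by 0.5 anyway, so only -1.5 is attributed to the policy.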
The Research Question
Did no-fault divorce laws reduce female suicide rates? The hypothesis is that women trapped in unhappy or abusive marriages might experience better mental health outcomes when divorce becomes easier to obtain.
What is No-Fault Divorce?
Before no-fault divorce, ending a marriage required proving that one spouse was at "fault" (adultery, abandonment, abuse, etc.). No-fault divorce allows couples to divorce simply by declaring the marriage "irretrievably broken."
- States adopted no-fault divorce at different times (1969-1985)
- Some states never adopted it during our sample period
- This staggered adoption creates our diff-in-diff design
The Data
We use a balanced panel from 1964 to 1996 covering 49 US states:
- `asmrs`: Age-standardized female suicide mortality rate (outcome Y)
- `pcinc`: Per capita income (control)
- `asmrh`: Age-standardized homicide rate (control)
- `cases`: Number of cases (control)
- `stfips`: State FIPS code
- `year`: Year
The Mechanics of TWFE, DiD, and Event Studies
Before we write any code, let's understand exactly how these models work—what the data looks like, what the regression equations mean, and why we make certain choices.
Panel Data Structure
DiD and event studies require panel data: repeated observations of the same units over time. Each row is a unit-time observation:
| state | year | suicide_rate | treatment_year | post |
|---|---|---|---|---|
| California | 1968 | 12.3 | 1970 | 0 |
| California | 1969 | 11.8 | 1970 | 0 |
| California | 1970 | 10.5 | 1970 | 1 |
| California | 1971 | 10.1 | 1970 | 1 |
| Texas | 1968 | 14.2 | . | 0 |
| Texas | 1969 | 14.0 | . | 0 |
| Texas | 1970 | 13.8 | . | 0 |
Key variables you need:
- Unit identifier (state) — who is being observed
- Time identifier (year) — when they're observed
- Outcome (suicide_rate) — what you're measuring
- Treatment timing (treatment_year) — when each unit got treated (missing for never-treated)
- Post indicator (post) — equals 1 if year ≥ treatment_year, 0 otherwise. For never-treated units, this is always 0 (they're never "post-treatment")
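A minimal pandas sketch of this structure, using toy numbers that mirror the table above. Note that comparisons against NaN evaluate to False, so never-treated units automatically get `post = 0`:

```python
import numpy as np
import pandas as pd

# Toy panel: California treated in 1970, Texas never treated
panel = pd.DataFrame({
    "state": ["California"] * 4 + ["Texas"] * 3,
    "year": [1968, 1969, 1970, 1971, 1968, 1969, 1970],
    "treatment_year": [1970] * 4 + [np.nan] * 3,
})
# post = 1 if year >= treatment_year; NaN comparisons are False,
# so never-treated rows get 0
panel["post"] = (panel["year"] >= panel["treatment_year"]).astype(int)
print(panel["post"].tolist())  # [0, 0, 1, 1, 0, 0, 0]
```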
The TWFE Regression Equation
The basic two-way fixed effects DiD model is:

Y_it = α_i + γ_t + β · Post_it + ε_it

Let's break this down:
- Yit — Outcome for unit i in time t (e.g., California's suicide rate in 1970)
- Postit — Equals 1 if unit i has been treated by time t, 0 otherwise
- αi — Unit fixed effects (a dummy for each state). These absorb all time-invariant differences between states—geography, culture, baseline health infrastructure, etc.
- γt — Time fixed effects (a dummy for each year). These absorb all shocks that hit every state in a given year—recessions, federal policy changes, national trends.
- β — The treatment effect. This is what you care about.
What β Captures
The coefficient β tells you: how much did treated units change, relative to control units, relative to the overall time trend? It's the "difference-in-differences"—the difference in how treated and control units evolved over time.
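You can verify this on the 2x2 case: regressing the outcome on a treated dummy (the unit effect), a post dummy (the time effect), and their interaction recovers exactly (B - A) - (D - C). A sketch with made-up numbers:

```python
import numpy as np

# Outcomes for (treated, pre), (treated, post), (control, pre), (control, post)
y = np.array([12.0, 10.0, 14.0, 13.5])       # A, B, C, D
treated = np.array([1.0, 1.0, 0.0, 0.0])     # unit "fixed effect" dummy
post = np.array([0.0, 1.0, 0.0, 1.0])        # time "fixed effect" dummy
interaction = treated * post                 # the DiD regressor

# OLS: intercept, unit effect, time effect, treatment effect
X = np.column_stack([np.ones(4), treated, post, interaction])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(beta[3], 6))  # -1.5, same as (B - A) - (D - C)
```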
From DiD to Event Studies
An event study is DiD with more detail. Instead of a single "post" indicator, we estimate separate effects for each time period relative to treatment:

Y_it = α_i + γ_t + Σ_{k ≠ -1} β_k · 1[t − T_i = k] + ε_it

where T_i is unit i's treatment year. Now instead of one β, we get a β_k for each "time to treatment" period:
- β-3 — Effect 3 years before treatment (should be ~0 if parallel trends holds)
- β-2 — Effect 2 years before treatment (should be ~0)
- β-1 — Reference period (omitted, normalized to 0)
- β0 — Effect in the year of treatment
- β1 — Effect 1 year after treatment
- β2 — Effect 2 years after treatment
This gives you a plot showing how the treatment effect evolves over time—and critically, whether the pre-treatment coefficients are close to zero (supporting parallel trends).
Why We Need a Reference Period
You can't include dummies for every time-to-treatment period. If you did, you'd have perfect multicollinearity—the dummies would sum to 1 for every treated unit, which is already captured by the unit fixed effect.
Solution: Omit one period as the "reference." All other coefficients are interpreted relative to that period.
Why -1? We typically choose the period just before treatment (k = -1) as the reference. This makes interpretation easy: β0 is the immediate treatment effect relative to the period just before treatment. Pre-treatment coefficients (β-3, β-2) show whether trends were parallel relative to the period right before treatment.
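The collinearity is easy to see mechanically: for a treated unit, a full set of event-time dummies sums to 1 in every period, which is exactly that unit's fixed-effect dummy. A small sketch:

```python
import numpy as np

# One treated unit observed at event times -2, -1, 0, 1
time_to_treat = np.array([-2, -1, 0, 1])

# Full set of event-time dummies (no reference period omitted)
dummies = np.column_stack(
    [(time_to_treat == k).astype(int) for k in (-2, -1, 0, 1)]
)
# Each row sums to 1 -> identical to the unit's fixed-effect dummy
print(dummies.sum(axis=1))  # [1 1 1 1]

# Omitting the k = -1 column breaks the collinearity
dummies_ref = np.column_stack(
    [(time_to_treat == k).astype(int) for k in (-2, 0, 1)]
)
print(dummies_ref.sum(axis=1))  # [1 0 1 1]
```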
The Never-Treated Problem
What about units that never get treated? They don't have a "treatment year," so "time to treatment" is undefined for them.
If you compute event_time = year - treatment_year, the result would be missing for these states. But we still want them in our regression as the comparison group. The solution: give them a fake event-time value (-1000) that's far outside the real range (-5, -4, ..., 0, 1, 2, ...), so their observations won't be assigned to any event-time coefficient. Then we exclude -1000 from the event study dummies while keeping these states in the regression to help estimate the fixed effects.
If you have never-treated units: You need to handle them carefully. The key insight is that never-treated units don't have a "time to treatment," so you can't create event-time dummies for them in the usual way. Common approaches:
- Set their time-to-treatment to a far-away value (like -1000) and make sure that value gets excluded from the regression
- In Stata with `reghdfe`: Create your event-time dummies only for treated units. Never-treated units will have 0 for all event-time dummies, which is correct: they serve as the control group throughout. When you generate dummies like `gen evt_m3 = (time_to_treat == -3)`, units with missing `time_to_treat` will automatically get 0.
- In Stata with factor variables: If using `i.time_to_treat`, set never-treated to a value like -1000, then use `ib(-1).time_to_treat` to set -1 as the reference. Stata will estimate coefficients for -1000 but you can ignore them (or cap your event window to exclude it).
- In R's `fixest`: Use `ref = c(-1, -1000)` to exclude both the reference period AND the never-treated indicator from the regression.
If you have NO never-treated units (everyone eventually gets treated, just at different times): You need two reference periods—one to avoid collinearity with unit fixed effects, and one to avoid collinearity with time fixed effects. Typically you'd omit both k = -1 and one endpoint (like the last period).
Why Two Reference Periods?
With staggered adoption and no never-treated group, the time-to-treatment dummies span the entire time dimension. Without a second normalization, the year fixed effects become collinear with the event-time dummies. Dropping an endpoint (like the last period in your event window) solves this.
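A pandas sketch of the -1000 trick on toy data (the tutorial applies the same logic to the real dataset below):

```python
import numpy as np
import pandas as pd

# Toy data: one treated unit (adopts in 1970), one never-treated unit
df = pd.DataFrame({
    "year": [1968, 1970, 1972, 1968, 1970, 1972],
    "treatment_year": [1970, 1970, 1970, np.nan, np.nan, np.nan],
})
# Real event times for treated rows; -1000 for never-treated rows
df["time_to_treat"] = (df["year"] - df["treatment_year"]).fillna(-1000).astype(int)
print(df["time_to_treat"].tolist())  # [-2, 0, 2, -1000, -1000, -1000]

# Any dummy built from a real event time is 0 for never-treated rows,
# so they act purely as controls
df["evt_m2"] = (df["time_to_treat"] == -2).astype(int)
print(df["evt_m2"].tolist())  # [1, 0, 0, 0, 0, 0]
```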
Interpreting the Event Study Plot
A good event study plot shows:
- Pre-treatment coefficients near zero — The β-3, β-2 estimates should be close to zero and statistically insignificant. This supports parallel trends.
- A jump at treatment — β0 should show the immediate effect (if there is one).
- Post-treatment dynamics — β1, β2, etc. show whether the effect persists, grows, or fades.
Warning signs to watch for:
- Pre-trends that aren't flat (suggests parallel trends violation)
- A "jump" that starts before treatment (anticipation effects or misspecified timing)
- Huge confidence intervals that include zero everywhere (not enough power)
Setting Up the Analysis
Load and Explore the Data
* Load the data
use bacon_example.dta, clear
* Explore the structure
tab year
tab stfips
* Examine the outcome and controls
desc asmrs // Female suicide rate (Y)
desc pcinc asmrh cases // Controls
# Load packages
pacman::p_load(haven, tidyverse, fixest, ggfixest)
# Load the data
df <- read_dta("bacon_example.dta")
# Explore the structure
table(df$year)
table(df$stfips)
# Examine the outcome and controls
summary(df[c("asmrs", "pcinc", "asmrh", "cases")])
import pandas as pd
import pyfixest as pf
# Load the data
df = pd.read_stata("bacon_example.dta")
# Explore the structure
print(df['year'].value_counts().sort_index())
print(df['stfips'].value_counts().sort_index())
# Examine the outcome and controls
df[['asmrs', 'pcinc', 'asmrh', 'cases']].describe()
Create the Treatment Timing Variable
First, we need to identify when each state adopted no-fault divorce. States that never adopted act as our control group.
* Create variable for year state adopted no-fault divorce
* Missing = never adopted (control state)
gen _nfd = .
* Fill in adoption years for each state
replace _nfd = 1971 if stfips == 1 // Alabama
replace _nfd = 1973 if stfips == 4 // Arizona
replace _nfd = 1970 if stfips == 6 // California
replace _nfd = 1971 if stfips == 8 // Colorado
replace _nfd = 1973 if stfips == 9 // Connecticut
replace _nfd = 1977 if stfips == 11 // DC
replace _nfd = 1971 if stfips == 12 // Florida
replace _nfd = 1973 if stfips == 13 // Georgia
replace _nfd = 1971 if stfips == 16 // Idaho
replace _nfd = 1984 if stfips == 17 // Illinois
replace _nfd = 1973 if stfips == 18 // Indiana
replace _nfd = 1970 if stfips == 19 // Iowa
replace _nfd = 1969 if stfips == 20 // Kansas
replace _nfd = 1972 if stfips == 21 // Kentucky
replace _nfd = 1973 if stfips == 23 // Maine
replace _nfd = 1975 if stfips == 25 // Massachusetts
replace _nfd = 1972 if stfips == 26 // Michigan
replace _nfd = 1974 if stfips == 27 // Minnesota
replace _nfd = 1973 if stfips == 29 // Missouri
replace _nfd = 1975 if stfips == 30 // Montana
replace _nfd = 1972 if stfips == 31 // Nebraska
replace _nfd = 1973 if stfips == 32 // Nevada
replace _nfd = 1971 if stfips == 33 // New Hampshire
replace _nfd = 1971 if stfips == 34 // New Jersey
replace _nfd = 1973 if stfips == 35 // New Mexico
replace _nfd = 1971 if stfips == 38 // North Dakota
replace _nfd = 1974 if stfips == 39 // Ohio
replace _nfd = 1973 if stfips == 41 // Oregon
replace _nfd = 1980 if stfips == 42 // Pennsylvania
replace _nfd = 1976 if stfips == 44 // Rhode Island
replace _nfd = 1969 if stfips == 45 // South Carolina
replace _nfd = 1985 if stfips == 46 // South Dakota
replace _nfd = 1974 if stfips == 48 // Texas
replace _nfd = 1973 if stfips == 53 // Washington
replace _nfd = 1977 if stfips == 55 // Wisconsin
replace _nfd = 1977 if stfips == 56 // Wyoming
* States with missing _nfd never adopted (control group)
# Create adoption year variable
# NA = never adopted (control state)
adoption_years <- c(
`1` = 1971, `4` = 1973, `6` = 1970, `8` = 1971, `9` = 1973,
`11` = 1977, `12` = 1971, `13` = 1973, `16` = 1971, `17` = 1984,
`18` = 1973, `19` = 1970, `20` = 1969, `21` = 1972, `23` = 1973,
`25` = 1975, `26` = 1972, `27` = 1974, `29` = 1973, `30` = 1975,
`31` = 1972, `32` = 1973, `33` = 1971, `34` = 1971, `35` = 1973,
`38` = 1971, `39` = 1974, `41` = 1973, `42` = 1980, `44` = 1976,
`45` = 1969, `46` = 1985, `48` = 1974, `53` = 1973, `55` = 1977,
`56` = 1977
)
df <- df %>%
mutate(nfd = adoption_years[as.character(stfips)])
# States with NA never adopted (control group)
# Create adoption year variable
# NaN = never adopted (control state)
adoption_years = {
1: 1971, 4: 1973, 6: 1970, 8: 1971, 9: 1973, 11: 1977,
12: 1971, 13: 1973, 16: 1971, 17: 1984, 18: 1973, 19: 1970,
20: 1969, 21: 1972, 23: 1973, 25: 1975, 26: 1972, 27: 1974,
29: 1973, 30: 1975, 31: 1972, 32: 1973, 33: 1971, 34: 1971,
35: 1973, 38: 1971, 39: 1974, 41: 1973, 42: 1980, 44: 1976,
45: 1969, 46: 1985, 48: 1974, 53: 1973, 55: 1977, 56: 1977
}
df['_nfd'] = df['stfips'].map(adoption_years)
# States with NaN never adopted (control group)
Key Insight: Staggered Adoption
Notice that treatment is staggered: different states adopt at different times. Kansas and South Carolina adopted first (1969), while South Dakota adopted last (1985). This variation in timing is what allows identification.
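You can see the staggering by tallying adoption years. A quick sketch using a handful of entries from the adoption-year mapping above (a subset, for illustration only):

```python
from collections import Counter

# Subset of the FIPS -> adoption year mapping used in the tutorial
adoption_years = {20: 1969, 45: 1969, 6: 1970, 19: 1970, 17: 1984, 46: 1985}

# Count how many states adopted in each year
counts = Counter(adoption_years.values())
print(sorted(counts.items()))  # [(1969, 2), (1970, 2), (1984, 1), (1985, 1)]
```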
Two-Way Fixed Effects (TWFE)
Create the Post-Treatment Indicator
For the basic diff-in-diff, we need a binary indicator for whether a state-year is in the post-treatment period:
Quick check: California adopted no-fault divorce in 1970. In 1972, what does California's treat_post equal? What about California in 1968?
* Create post-treatment indicator
* = 1 if year >= year state adopted no-fault divorce
* = 0 otherwise (including never-adopters)
* (missing _nfd compares as larger than any year, so never-adopters
*  get 0 either way; the !missing() guard makes this explicit)
gen treat_post = (year >= _nfd) & !missing(_nfd)
# Create post-treatment indicator
# = 1 if year >= year state adopted no-fault divorce
# = 0 otherwise (including never-adopters)
df <- df %>%
mutate(treat_post = ifelse(!is.na(`_nfd`) & year >= `_nfd`, 1, 0))
# Create post-treatment indicator
# = 1 if year >= year state adopted no-fault divorce
# = 0 otherwise (including never-adopters)
df['treat_post'] = ((df['_nfd'].notna()) & (df['year'] >= df['_nfd'])).astype(int)
Answer: California's treat_post values
California adopted in 1970, so:
• California 1968: treat_post = 0 (1968 < 1970)
• California 1972: treat_post = 1 (1972 ≥ 1970)
• Massachusetts (never adopted): treat_post = 0 always
Run the TWFE Regression
- State FE: Control for everything about a state that doesn't change over time (e.g., geography, culture)
- Year FE: Control for everything that affects all states equally in a year (e.g., recessions, federal policy)
- treat_post: The only remaining variation is states switching from 0 to 1 when they adopt the policy
* Two-way fixed effects regression
* Y = female suicide rate
* X = post-treatment indicator
* FEs = state and year
* Controls = income, homicide rate, cases
reg asmrs treat_post i.stfips i.year pcinc asmrh cases, vce(cluster stfips)
# Two-way fixed effects regression using fixest
# Y = female suicide rate
# X = post-treatment indicator
# FEs = state and year
# Controls = income, homicide rate, cases
model_twfe <- feols(asmrs ~ treat_post + pcinc + asmrh + cases | stfips + year,
cluster = ~stfips, data = df)
summary(model_twfe)
# Two-way fixed effects regression using pyfixest
# Y = female suicide rate
# X = post-treatment indicator
# FEs = state and year
# Controls = income, homicide rate, cases
model_twfe = pf.feols('asmrs ~ treat_post + pcinc + asmrh + cases | stfips + year',
vcov={'CRV1': 'stfips'}, data=df)
print(model_twfe.summary())
Quick Check: TWFE
Question: In the TWFE regression, what does the coefficient on treat_post capture?
Interpreting the Coefficient
The coefficient on treat_post represents the average effect of no-fault divorce on female suicide rates, controlling for state fixed effects, year fixed effects, and time-varying controls. A negative coefficient suggests the reform reduced suicide rates.
Event Study
The TWFE gives us one average effect, but we often want to see how the effect evolves over time. An event study lets us:
- Test the parallel trends assumption (pre-treatment coefficients should be near zero)
- See how the effect builds over time after treatment
- Identify any anticipation effects before treatment
Create Time-to-Treatment Variable
Instead of a single treat_post coefficient, we now get a separate coefficient for each year relative to treatment. To do this, we need a variable measuring each observation's time relative to its state's adoption year.
Example: California adopted in 1970. For California:
• 1968 → time_to_treat = -2 (two years before)
• 1970 → time_to_treat = 0 (year of adoption)
• 1973 → time_to_treat = +3 (three years after)
* Create time relative to treatment
* Negative = years before treatment
* 0 = year of treatment
* Positive = years after treatment
gen time_to_treat = year - _nfd
* IMPORTANT: Never-treated states don't have a meaningful "time to treatment"
* Set them to -1000 (a value far outside the range of actual event times)
* Then keep -1000 out of the event-study dummies (e.g., by restricting the event window)
replace time_to_treat = -1000 if missing(_nfd)
* Create ever-treated indicator
gen ever_treated = !missing(_nfd)
# Create time relative to treatment
# Negative = years before treatment
# 0 = year of treatment
# Positive = years after treatment
#
# IMPORTANT: Never-treated states don't have a meaningful "time to treatment"
# Set them to -1000 (outside the range of real event times)
# Then exclude with ref = c(-1, -1000) in fixest
df <- df %>%
mutate(
time_to_treat = ifelse(is.na(`_nfd`), -1000, year - `_nfd`),
ever_treated = !is.na(`_nfd`)
)
# Create time relative to treatment
# Negative = years before treatment
# 0 = year of treatment
# Positive = years after treatment
#
# IMPORTANT: Never-treated states don't have a meaningful "time to treatment"
# Set them to -1000 (outside the range of real event times)
# Then exclude with ref = c(-1, -1000) in pyfixest
df['time_to_treat'] = df['year'] - df['_nfd']
df['time_to_treat'] = df['time_to_treat'].fillna(-1000).astype(int)
df['ever_treated'] = df['_nfd'].notna().astype(int)
Run the Event Study Regression
* Handle Stata's factor variable limitation (no negatives)
summ time_to_treat
local min_val = r(min)
* Shift so minimum is 0
gen shifted_ttt = time_to_treat - `min_val'
* Find the value that corresponds to t=-1 (our reference period)
summ shifted_ttt if time_to_treat == -1
local true_neg1 = r(mean)
* Event study regression (manual approach for understanding)
* ib`true_neg1' sets t=-1 as the reference category
reg asmrs ib`true_neg1'.shifted_ttt pcinc asmrh cases i.stfips i.year, vce(cluster stfips)
Easier Package-Based Approach
The manual approach above is good for understanding what's happening. In practice, use these cleaner methods:
* EASY WAY: Use xtevent package
* Install once: ssc install xtevent
xtevent asmrs pcinc asmrh cases, ///
policyvar(treat_post) ///
panelvar(stfips) timevar(year) ///
window(10) ///
cluster(stfips)
* One-line plot
xteventplot, ///
title("Event Study: No-Fault Divorce and Female Suicide") ///
note("95% CIs shown. SEs clustered at state level.")
* Alternative: eventdd package (ssc install eventdd)
* eventdd asmrs pcinc asmrh cases, timevar(time_to_treat) ///
* absorb(i.stfips i.year) cluster(stfips) ///
* graph_op(ytitle("Effect on Female Suicide Rate"))
# Load ggfixest for plotting
pacman::p_load(haven, tidyverse, fixest, ggfixest)
# EASY WAY: fixest with proper handling of never-treated
# i(time_to_treat, ever_treated, ref = c(-1, -1000)) means:
# - Create dummies for each time_to_treat value
# - Interact with ever_treated (so never-treated don't get dummies)
# - Exclude t=-1 (reference) AND t=-1000 (never-treated)
# See: https://lrberge.github.io/fixest/articles/fixest_walkthrough.html
model_es <- feols(
asmrs ~ i(time_to_treat, ever_treated, ref = c(-1, -1000)) + pcinc + asmrh + cases | stfips + year,
cluster = ~stfips, data = df
)
summary(model_es)
# One-line plot using ggfixest
p <- ggiplot(model_es,
main = "Event Study: No-Fault Divorce and Female Suicide",
xlab = "Years Relative to Treatment",
ylab = "Coefficient")
# EASY WAY: pyfixest with proper handling of never-treated
# ref = c(-1, -1000) excludes both the reference period and never-treated
model_es = pf.feols(
'asmrs ~ i(time_to_treat, ever_treated, ref=c(-1, -1000)) + pcinc + asmrh + cases | stfips + year',
vcov={'CRV1': 'stfips'}, data=df
)
print(model_es.summary())
# One-line plot
model_es.iplot()
Quick Check: Event Studies
Question: In an event study plot, what should pre-treatment coefficients (t < 0) look like if parallel trends holds?
Why t = -1 as Reference?
We use t = -1 (one year before treatment) as our reference period because it's the last pre-treatment period. All coefficients are then interpreted relative to this period. This is standard practice in event studies.
Reading an Event Study Plot
- Pre-treatment coefficients (left of 0): Should be close to zero and statistically insignificant if parallel trends holds
- Post-treatment coefficients (right of 0): Show how the treatment effect evolves over time
- Reference period (t = -1): Always zero by construction
Interpreting Results
What to Look For
Parallel Trends
In the event study plot, pre-treatment coefficients should be statistically indistinguishable from zero. If you see a clear trend before treatment, the parallel trends assumption may be violated.
Common Issues
- Pre-trends: If coefficients are trending before treatment, your estimate may be biased
- Anticipation effects: If coefficients become significant before t=0, states may have responded before official adoption
A Note on Staggered Timing
Recent research has found that DiD estimates can be biased when there is both (A) staggered treatment timing and (B) heterogeneous treatment effects. You don't need to know anything about this for 14.33. If you're interested, ask your TA!
Practice: Interpreting Results
Test your understanding by interpreting this hypothetical event study output.
Scenario: You run an event study and get these coefficients
| Time to Treatment | Coefficient | Std. Error | Significant? |
|---|---|---|---|
| t = -3 | 0.02 | 0.15 | No |
| t = -2 | -0.05 | 0.14 | No |
| t = -1 (ref) | 0.00 | — | — |
| t = 0 | -0.35 | 0.12 | Yes |
| t = 1 | -0.42 | 0.13 | Yes |
| t = 2 | -0.48 | 0.14 | Yes |
Questions:
- Does the parallel trends assumption appear to hold?
- What is the estimated effect of the policy?
- Does the effect appear immediate or gradual?
Click to see answers
- Parallel trends: Yes, it appears to hold. The pre-treatment coefficients (t = -3, -2) are small and statistically insignificant, suggesting treated and control groups were on similar trends before the policy.
- Effect: The policy reduces the outcome by about 0.35-0.48 units. This is a negative effect that is statistically significant.
- Dynamics: The effect appears immediately at t = 0 and grows slightly over time (-0.35 → -0.42 → -0.48). This suggests both an immediate impact and some gradual strengthening of the effect.
Tutorial Complete!
You've learned the essentials of diff-in-diff analysis and event studies.
Key Takeaways
- DiD logic: Compare how treated units changed vs. how control units changed
- TWFE: State FE + year FE + treatment indicator gives you the average effect
- Event studies: Replace one treatment indicator with many (one per period)
- Parallel trends: Pre-treatment coefficients should be near zero
- Interpretation: Coefficients show effect relative to t = -1
Found something unclear or have a suggestion? Email [email protected].