Causal Inference in Practice
Instrumental variables, difference-in-differences, event studies, regression discontinuity
You have four main tools in your toolbox: IV uses an "experiment-like" random shock to your treatment; DiD/Event Studies compare treated and control groups before and after treatment; RD exploits a threshold or cutoff; and sometimes you find a real random experiment (RCT). Understanding when to use each is key.
Instrumental Variables
Suppose you want to know: does more education cause higher wages? You run a regression of wages on years of schooling and find a positive coefficient. But does this mean education causes higher wages?
The problem is that people who get more education might be different in ways that also affect their wages. Maybe they're smarter, more motivated, or from wealthier families. These traits affect both education and wages. So when you see that educated people earn more, you can't tell if it's because:
- Education actually makes people more productive (the causal effect you want), or
- Smart/motivated people both get more education and earn more (a confound)
OLS mixes these together. Your coefficient captures both the true effect of education and the fact that "education" is partly a proxy for ability. This is called omitted variable bias—ability affects both X and Y, but you can't measure it.
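A minimal simulation (all numbers made up) makes the bias concrete: the OLS coefficient on education absorbs part of ability's effect on wages.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated data (made-up parameters): ability raises both schooling and wages
ability = rng.normal(size=n)
education = 12 + 2 * ability + rng.normal(size=n)
wages = 1.0 * education + 3 * ability + rng.normal(size=n)  # true effect = 1.0

# OLS slope of wages on education: cov(x, y) / var(x)
ols_slope = np.cov(education, wages)[0, 1] / np.var(education, ddof=1)
print(f"True effect: 1.0, OLS estimate: {ols_slope:.2f}")  # ~2.2, biased upward
```

With these parameters the bias is large: the population OLS slope is 1 + 3·cov(ability, education)/var(education) = 1 + 6/5 = 2.2, more than double the true effect.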
The IV Solution
Instrumental variables solve this by finding something that affects education but has no direct effect on wages. This "instrument" creates variation in education that's unrelated to ability.
A classic example: distance to the nearest college. People who grow up closer to a college tend to get more education (it's cheaper and easier to attend). But distance to a college doesn't directly make you earn more—it only affects wages through its effect on education.
The logic: if we compare people who differ only in how close they grew up to a college, any wage difference must come through their different education levels. This rests on the assumption that distance to college is uncorrelated with ability and other determinants of wages, which is exactly what the exclusion restriction requires.
Two-Stage Least Squares (2SLS)
In practice, we implement this in two stages:
- First stage: Predict education using only the instrument (distance to college). This gives us "clean" variation in education that's driven by geography, not ability.
- Second stage: Use these predicted education values to estimate the effect on wages. Since the predictions are based only on distance, they're uncorrelated with ability.
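Before turning to the canned commands below, the two stages can be sketched by hand on simulated data (made-up parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Simulated data: ability confounds education and wages
ability = rng.normal(size=n)
distance = rng.normal(size=n)  # instrument: shifts education, not wages directly
education = 12 + 2 * ability - 1.0 * distance + rng.normal(size=n)
wages = 1.0 * education + 3 * ability + rng.normal(size=n)  # true effect = 1.0

# First stage: regress education on the instrument, keep fitted values
Z = np.column_stack([np.ones(n), distance])
gamma, *_ = np.linalg.lstsq(Z, education, rcond=None)
education_hat = Z @ gamma  # variation in education driven only by the instrument

# Second stage: regress wages on the fitted values
# (point estimate is right; real 2SLS software also corrects the standard errors)
X = np.column_stack([np.ones(n), education_hat])
beta, *_ = np.linalg.lstsq(X, wages, rcond=None)
print(f"2SLS estimate: {beta[1]:.2f}")  # close to 1.0; naive OLS is biased upward
```

Note the comment in the second stage: running the stages manually gives the correct coefficient but incorrect standard errors, which is one reason to use the dedicated commands below in real work.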
* 2SLS: Y = outcome, X = endogenous, Z = instrument
* Syntax: ivregress 2sls Y controls (X = Z)
ivregress 2sls income age female (education = distance_to_college)
* First-stage F-statistic (check relevance)
estat firststage
* With robust standard errors
ivregress 2sls income age (education = distance), robust
* With fixed effects
ivreghdfe income age (education = distance), absorb(state) cluster(state)
pacman::p_load(fixest)
# 2SLS with fixest
# Syntax: Y ~ controls | FE | endogenous ~ instrument
model <- feols(income ~ age + female | 0 | education ~ distance_to_college,
data = data)
summary(model)
# Check first-stage F-statistic
fitstat(model, "ivf")
# With fixed effects
model <- feols(income ~ age | state | education ~ distance,
data = data, vcov = ~state)
import pyfixest as pf
# 2SLS: Y ~ exogenous | FE | endogenous ~ instrument
model = pf.feols(
'income ~ age + female | 0 | education ~ distance_to_college',
data=data, vcov='hetero'
)
model.summary()
# With fixed effects
model = pf.feols(
'income ~ age | state | education ~ distance',
data=data, vcov={'CRV1': 'state'}
)
Checking Instrument Strength: First-Stage F-Statistic
The first-stage F-statistic tests whether your instrument actually predicts the endogenous variable. Rule of thumb: F > 10 suggests the instrument is strong enough. Weak instruments (F < 10) lead to biased estimates and unreliable inference.
Instrument Strength Matters
Even if your instrument is valid (doesn't affect Y directly), it only helps if it's relevant (actually predicts X). Always report the first-stage F-statistic—readers need to know if your instrument is strong.
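For a single instrument, the first-stage F-statistic is just the squared t-statistic on the instrument. A quick simulation (made-up data) shows the strong/weak contrast:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

def first_stage_F(x, z):
    """F-stat for H0: instrument coefficient = 0 (single instrument: F = t^2)."""
    Z = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (len(z) - 2)          # residual variance
    se = np.sqrt(sigma2 / ((z - z.mean()) @ (z - z.mean())))
    return (coef[1] / se) ** 2

z = rng.normal(size=n)
x_strong = 0.5 * z + rng.normal(size=n)   # instrument clearly predicts x
x_weak = 0.02 * z + rng.normal(size=n)    # instrument barely predicts x

print(f"Strong instrument F: {first_stage_F(x_strong, z):.1f}")  # well above 10
print(f"Weak instrument F:   {first_stage_F(x_weak, z):.1f}")    # typically below 10
```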
Complete IV Example: Protest Size and Voting
Research question: Does protest size affect voting? Problem: larger protests might occur in places that would vote differently anyway. Solution: use weather (rain) as an instrument—rain reduces protest size but shouldn't directly affect voting.
* First stage: Does rain affect protest size?
reg protest rain, robust
eststo first_stage
* 2SLS: Effect of protest on voting
ivregress 2sls vote (protest = rain), robust
eststo iv_model
* Reduced form: Direct effect of rain on voting
reg vote rain, robust
eststo reduced_form
* Export results
esttab first_stage iv_model reduced_form ///
using "iv_results.tex", replace ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
stats(N r2 F, fmt(%9.0fc %9.3f %9.2f))
pacman::p_load(fixest, modelsummary)
# First stage: Does rain affect protest size?
first_stage <- feols(protest ~ rain, data = data, vcov = "hetero")
# 2SLS: Effect of protest on voting
iv_model <- feols(vote ~ 1 | 0 | protest ~ rain, data = data, vcov = "hetero")
# Reduced form: Direct effect of rain on voting
reduced_form <- feols(vote ~ rain, data = data, vcov = "hetero")
# Export results
modelsummary(
list("First Stage" = first_stage,
"2SLS" = iv_model,
"Reduced Form" = reduced_form),
stars = c('*' = 0.10, '**' = 0.05, '***' = 0.01),
output = "iv_results.tex"
)
import pyfixest as pf
# First stage: Does rain affect protest size?
first_stage = pf.feols('protest ~ rain', data=data, vcov='hetero')
# 2SLS: Effect of protest on voting
iv_model = pf.feols(
'vote ~ 1 | 0 | protest ~ rain', data=data, vcov='hetero'
)
# Reduced form: Direct effect of rain on voting
reduced_form = pf.feols('vote ~ rain', data=data, vcov='hetero')
# View results
pf.etable([first_stage, iv_model, reduced_form])
Common IV Pitfalls
Watch Out For
- Weak instruments: F < 10 means your IV estimates can be badly biased toward OLS
- Invalid exclusion: If the instrument affects Y directly, your estimate is biased
- LATE vs ATE: IV estimates effects only for "compliers" (units whose treatment the instrument actually moves); this may not generalize
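The LATE point is easy to see in a simulation (all numbers hypothetical): when treatment effects differ across groups, IV recovers the compliers' effect, not the population average.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Hypothetical population: 50% compliers (effect = 2), 50% never-takers (effect = 5)
complier = rng.random(n) < 0.5
z = rng.integers(0, 2, size=n)            # randomly assigned instrument
x = (complier & (z == 1)).astype(float)   # only compliers take treatment when encouraged
effect = np.where(complier, 2.0, 5.0)     # heterogeneous treatment effects
y = effect * x + rng.normal(size=n)

# Wald / IV estimate: reduced form divided by first stage
wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
print(f"IV estimate: {wald:.2f}")  # recovers the complier effect (2.0), not the 3.5 average
```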
Difference-in-Differences
Difference-in-differences compares changes over time between a treated group and a control group. The key assumption is that both groups would have followed parallel trends in the absence of treatment.
The DiD Logic in One Table
| | Before Treatment | After Treatment | Change |
|---|---|---|---|
| Treated (got the policy) | A | B | B - A |
| Control (no policy) | C | D | D - C |

DiD effect (the causal effect) = (B - A) - (D - C)
Why this works: The change in the control group (D - C) tells us what would have happened to the treated group without treatment. Subtracting it removes secular trends.
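The table's arithmetic, with made-up numbers:

```python
# Hypothetical cell means from the 2x2 table above
A, B = 10.0, 15.0   # treated: before, after
C, D = 8.0, 10.0    # control: before, after

change_treated = B - A          # effect plus the secular trend
change_control = D - C          # the secular trend alone
did_effect = change_treated - change_control
print(did_effect)  # 3.0
```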
Real-World Example: No-Fault Divorce and Female Suicide
Research Question
Did the introduction of no-fault divorce laws affect female suicide rates? U.S. states adopted these laws at different times between 1969 and 1985, creating a natural experiment with staggered treatment timing.
The analysis requires setting up several key variables:
- Treatment year — When each state adopted no-fault divorce
- Post indicator — Whether the current year is after treatment for that state
- Time to treatment — Years before/after treatment (for event studies)
Basic 2x2 DiD
* Generate treatment indicator
gen treated = (state == "CA")
gen post = (year >= 2015)
gen treat_post = treated * post
* Basic DiD regression
reg outcome treated post treat_post, cluster(state)
* The coefficient on treat_post is the DiD estimate
* Better: Two-way fixed effects
reghdfe outcome treat_post, absorb(state year) cluster(state)
* With controls (like the divorce study)
reghdfe asmrs treat_post pcinc asmrh cases, absorb(stfips year) cluster(stfips)
pacman::p_load(fixest, dplyr)
# Generate treatment indicator
data <- data %>%
mutate(
treated = state == "CA",
post = year >= 2015,
treat_post = treated * post
)
# Basic DiD regression
model <- feols(outcome ~ treated + post + treat_post,
data = data, vcov = ~state)
# Two-way fixed effects (preferred)
model <- feols(outcome ~ treat_post | state + year,
data = data, vcov = ~state)
# The treat_post coefficient is the DiD estimate
summary(model)
import pyfixest as pf
# Generate treatment indicator
data['treated'] = (data['state'] == 'CA').astype(int)
data['post'] = (data['year'] >= 2015).astype(int)
data['treat_post'] = data['treated'] * data['post']
# Basic DiD regression
model = pf.feols('outcome ~ treated + post + treat_post', data=data,
vcov={'CRV1': 'state'})
# Two-way fixed effects (preferred)
model = pf.feols('outcome ~ treat_post | state + year', data=data,
vcov={'CRV1': 'state'})
model.summary()
Testing Parallel Trends
The key assumption is that treated and control groups would have followed parallel trends absent treatment. Check pre-treatment trends:
* Create year dummies interacted with treatment
* (Omit the year before treatment as reference)
reghdfe outcome ib2014.year#treated, ///
absorb(state year) cluster(state)
* Plot coefficients
coefplot, keep(*#*) vertical ///
yline(0) xline(4.5, lpattern(dash)) ///
title("Event Study: Pre-trends Check")
pacman::p_load(fixest, ggfixest, dplyr)
# Create relative time variable
# Set never-treated units to -1000 (outside data range)
data <- data %>%
mutate(
rel_time = year - treatment_year,
rel_time = if_else(is.na(rel_time), -1000, rel_time),
treated = !is.na(treatment_year)
)
# Event study regression
# ref = c(-1, -1000) excludes both t=-1 and never-treated (-1000)
model <- feols(
outcome ~ i(rel_time, treated, ref = c(-1, -1000)) | state + year,
data = data, vcov = ~state
)
# Plot with ggiplot (publication-ready)
ggiplot(model) +
labs(title = "Event Study: Pre-trends Check",
x = "Years Relative to Treatment",
y = "Coefficient Estimate") +
geom_hline(yintercept = 0, linetype = "dashed") +
theme_minimal()
import pyfixest as pf
import pandas as pd
# Create relative time variable
# Never-treated units: set to -1000 (outside data range)
data['rel_time'] = data['year'] - data['treatment_year']
data['rel_time'] = data['rel_time'].fillna(-1000).astype(int)
data['treated'] = data['treatment_year'].notna().astype(int)
# Event study regression
# ref=c(-1, -1000) excludes both t=-1 and never-treated
model = pf.feols(
'outcome ~ i(rel_time, treated, ref=c(-1, -1000)) | state + year',
vcov={'CRV1': 'state'}, data=data
)
# Plot
model.iplot()
Event Studies
Event studies show dynamic treatment effects over time, both before (to check parallel trends) and after treatment. They're essential for:
- Testing parallel trends: Pre-treatment coefficients should be near zero
- Understanding dynamics: Does the effect grow, shrink, or stay constant over time?
- Detecting anticipation: Did units respond before official treatment?

R users: see the fixest walkthrough for comprehensive documentation on event studies with feols().
Reading Event Study Plots
Event study coefficients tell a story. Here's how to interpret common patterns:
1. Testing Parallel Trends
Pre-treatment coefficients (before t=0) should be statistically indistinguishable from zero. This suggests treated and control groups were on similar trajectories before treatment.
2. Understanding Effect Dynamics
How the treatment effect evolves over time can reveal important information about the mechanism. Does the policy take time to work? Does the effect persist or fade?
3. Detecting Anticipation Effects
Sometimes units respond before the official treatment date—perhaps because the policy was announced in advance, or because they anticipated it. This shows up as non-zero coefficients just before t=0.
What to Do About Anticipation
- Redefine treatment timing: If a law was announced 6 months before implementation, use the announcement date as t=0
- Report honestly: Acknowledging anticipation is better than hiding it—it may be substantively interesting
- Check the mechanism: Anticipation can help you understand how the policy works (e.g., through expectations vs. direct effects)
Setting Up the Variables
With staggered treatment (different units treated at different times), you need to create a "time to treatment" variable:
* Create treatment year variable (example: no-fault divorce)
gen treatment_year = .
replace treatment_year = 1970 if state == "CA"
replace treatment_year = 1971 if state == "FL"
* ... etc for each treated state
* Leave as missing for never-treated states (controls)
* Create post-treatment indicator
gen treat_post = (year >= treatment_year) & !missing(treatment_year)
* Create time-to-treatment variable
gen time_to_treat = year - treatment_year
* Never-treated: placeholder value; interact the event-time dummies with a
* treated indicator so never-treated units serve only as controls
replace time_to_treat = 0 if missing(treatment_year)
* Stata can't handle negative factor values, so shift
summ time_to_treat
gen shifted_ttt = time_to_treat - r(min)
pacman::p_load(dplyr)
# Create treatment year variable
# Never-treated states have NA for treatment_year
data <- data %>%
mutate(
treatment_year = case_when(
state == "CA" ~ 1970,
state == "FL" ~ 1971,
# ... etc for each treated state
TRUE ~ NA_real_ # Never-treated: leave as NA
),
# Binary: ever treated vs never treated
treated = !is.na(treatment_year),
# Post-treatment indicator
treat_post = year >= treatment_year & treated,
# Time to treatment (for event studies)
# IMPORTANT: Set never-treated to -1000 (outside data range)
# Then exclude with ref = c(-1, -1000) in fixest
time_to_treat = if_else(is.na(treatment_year), -1000, year - treatment_year)
)
import pandas as pd
import numpy as np
# Create treatment year variable
# Never-treated states have NaN for treatment_year
treatment_map = {'CA': 1970, 'FL': 1971} # etc.
data['treatment_year'] = data['state'].map(treatment_map)
# Binary: ever treated vs never treated
data['treated'] = data['treatment_year'].notna().astype(int)
# Post-treatment indicator
data['treat_post'] = (
(data['year'] >= data['treatment_year']) & data['treatment_year'].notna()
).astype(int)
# Time to treatment (for event studies)
# IMPORTANT: Set never-treated to -1000 (outside data range)
data['time_to_treat'] = data['year'] - data['treatment_year']
data['time_to_treat'] = data['time_to_treat'].fillna(-1000).astype(int)
Basic Event Study
* Create relative time to treatment
gen rel_time = year - treatment_year
* Create dummies for each relative time period
* (typically bin endpoints and omit t=-1)
* Stata variable names can't contain "-", so prefix negative periods with "m"
forvalues t = 5(-1)2 {
gen rel_m`t' = (rel_time == -`t')
}
forvalues t = 0/5 {
gen rel_`t' = (rel_time == `t')
}
replace rel_m5 = (rel_time <= -5) // Bin early periods
replace rel_5 = (rel_time >= 5) // Bin late periods
* Regression (omit t=-1 as reference)
reghdfe outcome rel_m5-rel_m2 rel_0-rel_5, ///
absorb(state year) cluster(state)
* Plot
coefplot, keep(rel_*) vertical yline(0) ///
xline(3.5, lpattern(dash)) ///
title("Event Study")
pacman::p_load(fixest, ggfixest, ggplot2, dplyr)
# Create relative time variable
# IMPORTANT: Never-treated units don't have a meaningful "time to treatment"
# Set them to -1000 (outside data range), then exclude with ref = c(-1, -1000)
# See: https://lrberge.github.io/fixest/articles/fixest_walkthrough.html
data <- data %>%
mutate(
rel_time = year - treatment_year,
rel_time = if_else(is.na(rel_time), -1000, rel_time),
treated = !is.na(treatment_year)
)
# Optional: bin endpoints to avoid sparse cells (but keep -1000 for never-treated)
data <- data %>%
mutate(rel_time = case_when(
rel_time == -1000 ~ -1000, # Keep never-treated at -1000
rel_time <= -6 ~ -6, # Bin early periods
rel_time >= 6 ~ 6, # Bin late periods
TRUE ~ rel_time
))
# Event study with fixest
# ref = c(-1, -1000) excludes BOTH t=-1 (reference) AND t=-1000 (never-treated)
model <- feols(
outcome ~ i(rel_time, treated, ref = c(-1, -1000)) | state + year,
data = data,
vcov = ~state
)
# Plot with ggiplot (publication-ready)
ggiplot(model) +
labs(title = "Event Study",
x = "Years Relative to Treatment",
y = "Effect on Outcome") +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_vline(xintercept = -0.5, linetype = "dashed", color = "red") +
theme_minimal()
import pyfixest as pf
import numpy as np
# Create relative time variable
# Never-treated: set to -1000 (outside data range)
data['rel_time'] = data['year'] - data['treatment_year']
data['rel_time'] = data['rel_time'].fillna(-1000).astype(int)
data['treated'] = data['treatment_year'].notna().astype(int)
# Optional: bin endpoints
data['rel_time'] = np.where(
data['rel_time'] == -1000, -1000,
np.clip(data['rel_time'], -6, 6)
)
# Event study with pyfixest
# ref=c(-1, -1000) excludes both t=-1 and never-treated
model = pf.feols(
'outcome ~ i(rel_time, treated, ref=c(-1, -1000)) | state + year',
vcov={'CRV1': 'state'}, data=data
)
# Plot
model.iplot()
A Note on Staggered Timing
Recent research has found that DiD estimates can be biased when there is both (A) staggered treatment timing and (B) heterogeneous treatment effects. You don't need to know anything about this for 14.33. If you're interested, ask your TA!
Regression Discontinuity
Regression discontinuity (RD) exploits a sharp threshold—a cutoff rule that determines who gets treated. When treatment is assigned based on whether someone crosses a threshold (e.g., a test score, age, income level), you can compare outcomes for people just barely above and just barely below the cutoff. These nearly-identical groups differ only in treatment status, creating a quasi-experimental setting.
The RD Idea in One Picture
Imagine a college admissions test where anyone with a score ≥ 80 gets in, and anyone with a score < 80 is rejected. People with scores of 80 and 79 are nearly identical in ability, but vastly different in college attendance. If college causes higher earnings, we should see a jump in earnings right at the 80 cutoff.
Why RD Works
At the cutoff, treatment is essentially random. Someone with a test score of 79.9 has nearly the same ability as someone with 80.1, but their treatment status flips. By comparing outcomes just above and below the cutoff, you isolate the effect of treatment from all the underlying characteristics that typically confound it (ability, motivation, socioeconomic status).
The Cutoff Is Key
RD validity depends on units being unable to precisely manipulate the running variable around the cutoff. Age-based cutoffs are hard to game (you can't change your birthdate), but test-score cutoffs can be manipulated if, say, graders know the threshold and nudge borderline scores. The closer you zoom in on the cutoff, the more convincing the design.
Implementation: Simple Local Linear Regression
The simplest approach: fit a line on each side of the cutoff using observations near the cutoff, then compute the gap where the lines meet.
* Create treatment indicator based on cutoff
gen cutoff = 80
gen treated = (test_score >= cutoff)
* Simple RD: linear regression on each side
reg outcome test_score treated if test_score < cutoff + 10 & test_score > cutoff - 10
* Better: allow different slopes on each side
gen score_centered = test_score - cutoff
reg outcome treated score_centered c.score_centered#treated if abs(score_centered) <= 10
* Or use rdrobust (requires installation)
ssc install rdrobust
rdrobust outcome test_score, c(80) h(10)
pacman::p_load(tidyverse, rdrobust, fixest)
# Create treatment indicator
data <- data %>%
mutate(
cutoff = 80,
treated = test_score >= cutoff,
score_centered = test_score - cutoff
)
# Simple RD: fit lines on each side of cutoff
model <- feols(
outcome ~ treated + score_centered + treated:score_centered,
data = data %>% filter(abs(score_centered) <= 10)
)
summary(model)
# Or use rdrobust for optimal bandwidth selection
rd_result <- rdrobust(
y = data$outcome,
x = data$test_score,
c = 80 # cutoff
)
summary(rd_result)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
# Create treatment indicator
data['cutoff'] = 80
data['treated'] = (data['test_score'] >= data['cutoff']).astype(int)
data['score_centered'] = data['test_score'] - data['cutoff']
# Simple RD: fit lines on each side, then find gap at cutoff
# Below cutoff
below = data[data['score_centered'] < 0]
model_below = smf.ols('outcome ~ score_centered', data=below).fit()
# Above cutoff
above = data[data['score_centered'] >= 0]
model_above = smf.ols('outcome ~ score_centered', data=above).fit()
# Predictions at cutoff (score_centered = 0)
pred_below = model_below.predict(pd.DataFrame({'score_centered': [0]})).values[0]
pred_above = model_above.predict(pd.DataFrame({'score_centered': [0]})).values[0]
rd_estimate = pred_above - pred_below
# Or use bandwidth restriction
window = 10
window_data = data[np.abs(data['score_centered']) <= window]
model = smf.ols('outcome ~ treated + score_centered + treated:score_centered',
data=window_data).fit()
print(f"RD Estimate: {model.params['treated']:.4f}")
Checking RD Validity
RD is only convincing if people can't precisely manipulate the running variable to cross the threshold. Check for:
- Density at the cutoff: Does the number of observations jump discontinuously at the cutoff (the McCrary test)? If yes, people may be manipulating the running variable.
- Covariate balance: Do pre-determined characteristics (e.g., age, parental education) show a discontinuity at the cutoff? (If yes, confounders may be driving the result.)
- Placebo tests: Try the same RD design on outcomes that shouldn't be affected by treatment. If you find effects, something is wrong with your design.
* Test covariate balance: does age jump at cutoff?
reg age treated score_centered c.score_centered#treated ///
if abs(score_centered) <= 10
* If the coefficient on 'treated' is near zero,
* age is balanced (good for validity)
* Placebo test: effect on a variable that shouldn't be affected
reg parental_education treated score_centered c.score_centered#treated ///
if abs(score_centered) <= 10
* Density check: are observations bunched above cutoff?
hist test_score if abs(score_centered) <= 5, by(treated)
pacman::p_load(fixest, ggplot2)
# Check covariate balance: does age jump at cutoff?
balance_model <- feols(
age ~ treated + score_centered + treated:score_centered,
data = data %>% filter(abs(score_centered) <= 10)
)
summary(balance_model)
# If treated coefficient ≈ 0, age is balanced (good)
# Placebo test on variable unaffected by treatment
placebo_model <- feols(
parental_education ~ treated + score_centered + treated:score_centered,
data = data %>% filter(abs(score_centered) <= 10)
)
summary(placebo_model)
# Density check: histogram of running variable
ggplot(data %>% filter(abs(score_centered) <= 5),
aes(x = test_score, fill = treated)) +
geom_histogram(bins = 20, alpha = 0.5) +
geom_vline(xintercept = 80, linetype = "dashed") +
theme_minimal() +
labs(title = "Density Check: Bunching at Cutoff?")
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
# Check covariate balance: does age jump at cutoff?
subset = data[data['score_centered'].abs() <= 10]
balance = smf.ols(
'age ~ treated + score_centered + treated:score_centered',
data=subset
).fit()
print(balance.summary())
# If treated coefficient ≈ 0, age is balanced (good)
# Placebo test on variable unaffected by treatment
placebo = smf.ols(
'parental_education ~ treated + score_centered + treated:score_centered',
data=subset
).fit()
print(placebo.summary())
# Density check: histogram of running variable
fig, ax = plt.subplots()
near = data[data['score_centered'].abs() <= 5]
ax.hist(near.loc[near['treated'] == 0, 'test_score'],
bins=20, alpha=0.5, label='Control')
ax.hist(near.loc[near['treated'] == 1, 'test_score'],
bins=20, alpha=0.5, label='Treated')
ax.axvline(80, linestyle='--', color='black')
ax.legend()
ax.set_title('Density Check: Bunching at Cutoff?')
plt.show()
Bandwidth Selection
How close to the cutoff should you focus? Too close, and you have few observations and less precision. Too far, and people further from the cutoff may not be comparable. The rule of thumb: use observations within 1-2 standard deviations of the running variable, or use automated bandwidth selection (like rdrobust does).
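The tradeoff can be illustrated with simulated data (all numbers made up): when the true relationship is curved, a wide bandwidth forces a straight line through the curvature and biases the estimated jump, while a narrow bandwidth stays near the truth at the cost of fewer observations.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical DGP: curved (cubic) relationship plus a true jump of 2.0 at the cutoff
score = rng.uniform(-20, 20, size=5_000)
treated = (score >= 0).astype(float)
outcome = 0.001 * score**3 + 2.0 * treated + rng.normal(scale=0.5, size=5_000)

def rd_estimate(bandwidth):
    """Local linear RD: fit a line on each side within the bandwidth, gap at cutoff."""
    below = (score < 0) & (score >= -bandwidth)
    above = (score >= 0) & (score <= bandwidth)
    b_lo = np.polyfit(score[below], outcome[below], 1)  # returns [slope, intercept]
    b_hi = np.polyfit(score[above], outcome[above], 1)
    return b_hi[1] - b_lo[1]                            # gap between intercepts at 0

for h in (2, 20):
    print(f"bandwidth {h:2d}: estimate = {rd_estimate(h):.2f}")  # true effect = 2.0
```

With this made-up DGP, the narrow bandwidth lands near the true jump of 2.0 while the wide bandwidth is badly biased by the curvature; rdrobust automates this choice with a data-driven optimal bandwidth.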
RD Estimates Local Effects
RD identifies the effect for people at the cutoff. This is the "local average treatment effect" (LATE). If people far from the cutoff would have very different treatment effects, RD won't tell you about those groups. Always be clear about whom your results apply to.
Found something unclear or have a suggestion? Email [email protected].