BAC Laws and Hit-and-Run Crashes

A complete replication using difference-in-differences

This worked example walks through a complete research project from start to finish. We replicate the analysis from French & Gumus (2024) "Hit-and-Run or Hit-and-Stay? Unintended Effects of a Stricter BAC Limit" to analyze how 0.08 BAC laws affect fatal hit-and-run crashes.

What You'll Learn

How to formulate a research question with clear identification
How to collect data from multiple sources (FARS, FRED, APIS)
Cleaning, aggregating, and merging state-level panel data
Implementing a difference-in-differences event study
Creating publication-ready tables and figures

1. Research Question + Identification

The Question

Do stricter BAC laws reduce hit-and-run crashes?

Between 1983 and 2004, all 50 U.S. states lowered their legal blood alcohol content (BAC) limit from 0.10 to 0.08. The policy goal was to reduce drunk driving fatalities. But French & Gumus (2024) ask: did this also have unintended effects?

Their hypothesis: stricter penalties might cause drunk drivers who cause crashes to flee the scene (hit-and-run) rather than stay and face arrest. If true, we'd see hit-and-run fatalities increase after 0.08 BAC laws take effect.

The Data

We need:

Outcome: Fatal hit-and-run crashes by state and year
Treatment: When each state adopted the 0.08 BAC limit
Controls: Other policies and economic conditions that affect crashes

The Identification Strategy

We use a difference-in-differences design exploiting the staggered rollout of 0.08 BAC laws:

Sample of adoption dates:

State	Adopted
Utah	1983
Oregon	1983
California	1990
Texas	1999
Minnesota	2005

The identifying assumption:

In the absence of the 0.08 law, hit-and-run fatalities would have followed parallel trends across early-adopting and late-adopting states.

We test this by checking that pre-treatment trends are similar (the "event study" plot).

Question + Data + Identification

Every empirical project needs these three elements clearly stated:

Question: What causal effect are you trying to estimate?
Data: What data will you use? What's the unit of observation?
Identification: Why does your comparison isolate the causal effect?

2. Sourcing Datasets

We need data from multiple sources. Here's where to find each piece:

Fatality Data: FARS

The Fatality Analysis Reporting System (FARS) records every fatal traffic crash in the United States.

Source: NHTSA (National Highway Traffic Safety Administration)
URL: nhtsa.gov/research-data/fatality-analysis-reporting-system-fars
Download: static.nhtsa.gov/nhtsa/downloads/FARS/
Years: 1982-2008

The key variable is HIT_RUN in the Vehicle file, which indicates whether a vehicle fled the scene:

HIT_RUN coding (varies by year):

Pre-2005: 0=No, 1=Yes, 2=Unknown, 3=Not Reported
2005+: 0=No, 1-4=Yes (various categories)

Policy Data: APIS

The Alcohol Policy Information System (APIS) tracks alcohol-related laws by state.

Source: NIAAA
URL: alcoholpolicy.niaaa.nih.gov
Topics: BAC limits (topic 12), Zero Tolerance (topic 13)

Economic Controls: FRED

Economic conditions affect crash rates. We control for:

Unemployment: State unemployment rates from FRED
Income: Per capita personal income (series: {STATE}PCPI)

* Download FARS data (example for one year)
* NHTSA provides ZIP files with CSV data
copy "https://static.nhtsa.gov/nhtsa/downloads/FARS/2000/National/FARS2000NationalCSV.zip" ///
     "fars_2000.zip", replace

* Unzip and import
unzipfile "fars_2000.zip"
import delimited "accident.csv", clear

# Download FARS data
library(tidyverse)
library(httr)

# Download and unzip
url <- "https://static.nhtsa.gov/nhtsa/downloads/FARS/2000/National/FARS2000NationalCSV.zip"
temp <- tempfile()
download.file(url, temp)
unzip(temp, exdir = "fars_2000")

# Read accident file
accident <- read_csv("fars_2000/accident.csv")

import pandas as pd
import requests
import zipfile
import io

# Download FARS data
url = "https://static.nhtsa.gov/nhtsa/downloads/FARS/2000/National/FARS2000NationalCSV.zip"
response = requests.get(url)

# Extract and read
with zipfile.ZipFile(io.BytesIO(response.content)) as z:
    with z.open('accident.csv') as f:
        accident = pd.read_csv(f, encoding='latin-1')

3. Aggregating to State-Year Level

FARS data is at the crash level (one row per crash). We need state-year level data for our analysis.

Step 1: Identify Hit-and-Run Crashes

First, we need to merge the Vehicle file (which has hit-run indicator) with the Accident file:

* Load vehicle file
import delimited "vehicle.csv", clear

* Code hit-run indicator (codes 1-4 = hit-and-run)
gen hit_run = inlist(hit_run, 1, 2, 3, 4)

* Collapse to crash level (any vehicle fled = hit-run crash)
collapse (max) hit_run, by(st_case)
save "vehicle_hitrun.dta", replace

* Merge with accident file
use "accident.dta", clear
merge 1:1 st_case using "vehicle_hitrun.dta"
replace hit_run = 0 if _merge == 1  // No match = not hit-run

# Load vehicle file
vehicle <- read_csv("vehicle.csv")

# Code hit-run indicator
vehicle <- vehicle %>%
  mutate(hit_run = HIT_RUN %in% c(1, 2, 3, 4))

# Collapse to crash level
vehicle_hr <- vehicle %>%
  group_by(ST_CASE) %>%
  summarize(hit_run = max(hit_run, na.rm = TRUE))

# Merge with accident file
accident <- accident %>%
  left_join(vehicle_hr, by = "ST_CASE") %>%
  mutate(hit_run = replace_na(hit_run, 0))

# Load vehicle file
vehicle = pd.read_csv("vehicle.csv", encoding='latin-1')

# Code hit-run indicator
vehicle['hit_run'] = vehicle['HIT_RUN'].isin([1, 2, 3, 4]).astype(int)

# Collapse to crash level
vehicle_hr = vehicle.groupby('ST_CASE')['hit_run'].max().reset_index()

# Merge with accident file
accident = accident.merge(vehicle_hr, on='ST_CASE', how='left')
accident['hit_run'] = accident['hit_run'].fillna(0).astype(int)

Step 2: Aggregate to State-Year

Now collapse to get total fatalities and hit-run fatalities by state and year:

* Generate hr_fatalities = fatalities in hit-run crashes
gen hr_fatalities = fatals * hit_run

* Collapse to state-year
collapse (sum) total_fatalities=fatals hr_fatalities, by(state year)

* Calculate non-hit-run fatalities
gen nhr_fatalities = total_fatalities - hr_fatalities

* Check the data
list in 1/10
summarize

# Aggregate to state-year
state_year <- accident %>%
  mutate(hr_fatalities = FATALS * hit_run) %>%
  group_by(STATE, YEAR) %>%
  summarize(
    total_fatalities = sum(FATALS, na.rm = TRUE),
    hr_fatalities = sum(hr_fatalities, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(nhr_fatalities = total_fatalities - hr_fatalities)

# Check the data
head(state_year, 10)
summary(state_year)

# Aggregate to state-year
accident['hr_fatalities'] = accident['FATALS'] * accident['hit_run']

state_year = accident.groupby(['STATE', 'YEAR']).agg({
    'FATALS': 'sum',
    'hr_fatalities': 'sum'
}).reset_index()

state_year.columns = ['state', 'year', 'total_fatalities', 'hr_fatalities']
state_year['nhr_fatalities'] = state_year['total_fatalities'] - state_year['hr_fatalities']

# Check the data
print(state_year.head(10))
print(state_year.describe())

Example output (state-year level):

state	year	total_fatalities	hr_fatalities	nhr_fatalities
CA	1998	3,494	312	3,182
CA	1999	3,559	298	3,261
TX	1998	3,576	187	3,389
TX	1999	3,519	201	3,318

4. Merging Policy and Control Data

Now we merge in the treatment variable (0.08 BAC adoption dates) and control variables.

Step 1: Merge BAC Adoption Dates

* Load BAC adoption dates (one row per state)
import delimited "bac08_adoption_dates.csv", clear
gen adoption_year = year(date(effective_date, "YMD"))
keep state_fips adoption_year
save "bac_dates.dta", replace

* Merge with fatality data (m:1 merge)
use "state_year_fatalities.dta", clear
merge m:1 state_fips using "bac_dates.dta"

* Create event time and treatment indicator
gen event_time = year - adoption_year
gen treated = (event_time >= 0)

# Load BAC adoption dates
bac_dates <- read_csv("bac08_adoption_dates.csv") %>%
  mutate(
    effective_date = as.Date(effective_date),
    adoption_year = year(effective_date)
  ) %>%
  select(state_fips, adoption_year)

# Merge with fatality data
state_year <- state_year %>%
  left_join(bac_dates, by = "state_fips") %>%
  mutate(
    event_time = year - adoption_year,
    treated = as.integer(event_time >= 0)
  )

# Load BAC adoption dates
bac_dates = pd.read_csv("bac08_adoption_dates.csv")
bac_dates['effective_date'] = pd.to_datetime(bac_dates['effective_date'])
bac_dates['adoption_year'] = bac_dates['effective_date'].dt.year

# Merge with fatality data
state_year = state_year.merge(
    bac_dates[['state_fips', 'adoption_year']],
    on='state_fips',
    how='left'
)

# Create event time and treatment indicator
state_year['event_time'] = state_year['year'] - state_year['adoption_year']
state_year['treated'] = (state_year['event_time'] >= 0).astype(int)

Step 2: Merge Economic Controls

* Merge unemployment data
merge 1:1 state_fips year using "unemployment.dta", nogen

* Merge income data
merge 1:1 state_fips year using "income.dta", nogen

* Create log variables for regression
gen ln_hr_fatalities = ln(hr_fatalities + 1)
gen ln_nhr_fatalities = ln(nhr_fatalities + 1)

# Merge controls
state_year <- state_year %>%
  left_join(unemployment, by = c("state_fips", "year")) %>%
  left_join(income, by = c("state_fips", "year")) %>%
  mutate(
    ln_hr_fatalities = log(hr_fatalities + 1),
    ln_nhr_fatalities = log(nhr_fatalities + 1)
  )

# Merge controls
state_year = state_year.merge(unemployment, on=['state_fips', 'year'], how='left')
state_year = state_year.merge(income, on=['state_fips', 'year'], how='left')

# Create log variables
state_year['ln_hr_fatalities'] = np.log(state_year['hr_fatalities'] + 1)
state_year['ln_nhr_fatalities'] = np.log(state_year['nhr_fatalities'] + 1)

Final Analysis Dataset Structure

Unit of observation: State-year (50 states × 27 years = 1,350 obs)
Outcome: ln_hr_fatalities (log hit-run fatalities)
Treatment: treated (1 if BAC 0.08 law in effect)
Event time: event_time (years since/until adoption)
Fixed effects: State and year

5. Difference-in-Differences Analysis

Simple DiD: Pooled Post-Treatment Effect

The simplest specification estimates a single treatment effect:

* Simple TWFE DiD
reghdfe ln_hr_fatalities treated unemployment, ///
    absorb(state_fips year) cluster(state_fips)

* Interpretation: coefficient on 'treated' is the DiD estimate
* Positive = hit-run fatalities increased after 0.08 law

library(fixest)

# Simple TWFE DiD
did_model <- feols(
  ln_hr_fatalities ~ treated + unemployment | state_fips + year,
  data = state_year,
  cluster = ~state_fips
)

summary(did_model)

from linearmodels.panel import PanelOLS

# Prepare panel data
panel = state_year.set_index(['state_fips', 'year'])

# Simple TWFE DiD
formula = 'ln_hr_fatalities ~ treated + unemployment + EntityEffects + TimeEffects'
model = PanelOLS.from_formula(formula, data=panel)
results = model.fit(cov_type='clustered', cluster_entity=True)
print(results)

Event Study: Dynamic Treatment Effects

To check parallel trends and see how effects evolve over time, we estimate an event study:

* Create event time dummies (bin endpoints at -5 and +10)
gen event_time_binned = event_time
replace event_time_binned = -5 if event_time < -5
replace event_time_binned = 10 if event_time > 10

* Create dummies (omit -1 as reference)
tab event_time_binned, gen(et_)

* Event study regression
reghdfe ln_hr_fatalities et_* unemployment, ///
    absorb(state_fips year) cluster(state_fips)

* Plot coefficients
coefplot, vertical keep(et_*) ///
    yline(0) xline(5.5, lpattern(dash)) ///
    title("Event Study: Hit-and-Run Fatalities") ///
    xtitle("Years Since 0.08 BAC Law") ytitle("Coefficient")

# Event study with fixest
es_model <- feols(
  ln_hr_fatalities ~ i(event_time, ref = -1) + unemployment | state_fips + year,
  data = state_year %>% filter(event_time >= -5, event_time <= 10),
  cluster = ~state_fips
)

# Plot
iplot(es_model,
      main = "Event Study: Hit-and-Run Fatalities",
      xlab = "Years Since 0.08 BAC Law",
      ylab = "Coefficient")

# Create event time dummies
df = state_year.copy()
df['event_time_binned'] = df['event_time'].clip(-5, 10)

# Create dummies (excluding -1)
event_times = sorted([et for et in df['event_time_binned'].unique() if et != -1])
for et in event_times:
    col_name = f'et_{et}' if et >= 0 else f'et_m{abs(et)}'
    df[col_name] = (df['event_time_binned'] == et).astype(int)

# Run regression and extract coefficients
# ... (see full code in repository)

6. Publication-Ready Output

Event Study Plot

The key result is the event study plot showing dynamic treatment effects:

Event study coefficients with 95% confidence intervals. Reference period: t = -1.

Interpreting the Event Study

Pre-trends (t < 0): Coefficients should be close to zero and statistically insignificant. This supports the parallel trends assumption.
Treatment effects (t ≥ 0): Positive coefficients indicate hit-and-run fatalities increased after the 0.08 BAC law.
Pattern: The effect appears to grow over time, suggesting a persistent behavioral shift.

Creating Publication Tables

* Export regression table
esttab did_hr did_nhr using "table2.tex", ///
    cells(b(star fmt(3)) se(par fmt(3))) ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    stats(N r2, fmt(%9.0fc %9.3f) labels("Observations" "R-squared")) ///
    mtitles("Hit-Run" "Non-Hit-Run") ///
    title("Effect of 0.08 BAC Laws on Fatalities") ///
    replace

library(modelsummary)

# Create table
modelsummary(
  list("Hit-Run" = did_hr, "Non-Hit-Run" = did_nhr),
  stars = c('*' = .1, '**' = .05, '***' = .01),
  gof_map = c("nobs", "r.squared"),
  output = "table2.tex"
)

# Export coefficients for LaTeX
coef_df.to_csv("event_study_coefficients.csv", index=False)

# For full LaTeX table generation, see the
# generate_report.py script in the repository

Raw Data Visualization

Always visualize your raw data before running regressions:

Hit-and-run share of total fatalities, by treatment timing. Vertical dashed lines show adoption years.

Key Takeaways

What We Found

Consistent with French & Gumus (2024), we find that 0.08 BAC laws are associated with an increase in hit-and-run fatalities. The effect grows over time, suggesting a persistent behavioral response where drunk drivers who cause crashes flee the scene to avoid the stricter penalties.

Workflow Summary

Question + Identification: Clear causal question with credible research design
Data Collection: Multiple sources (FARS, APIS, FRED)
Aggregation: Crash-level → State-year panel
Merging: Combine fatalities + treatment + controls
Analysis: TWFE DiD with event study
Output: Publication-ready figures and tables

Full Code Repository

The complete replication code is available in the course materials at:

teaching/14.33/tutorial/worked-examples/bac-hit-and-run/

← Back to Tutorials