Research and Communication in Economics

Q&A

Common questions and coding issues from the class

This page collects real questions from students in the course (anonymized) along with explanations and fixes. If you're running into a problem, chances are someone else has hit the same thing.

Invalid Characters in Variable Names

Question

I'm trying to create event-study dummies using a foreach loop, but I keep getting syntax errors no matter what I try. Here's my code:

* Event dummies (k = -1 omitted as baseline)
foreach kk in -5 -4 -3 -2 0 1 2 3 4 5 {
    gen ev_`kk' = (D == 1 & kb == `kk')
}

I've tried putting everything on one line and splitting it across multiple lines, but I get invalid syntax r(198) either way. What am I doing wrong?

Answer

The loop structure and the braces are fine. The problem is the negative numbers that end up in the variable names.

When the loop reaches kk = -5, Stata substitutes the value into the variable name, so gen ev_`kk' becomes:

gen ev_-5 = (D == 1 & kb == -5)

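But ev_-5 is not a valid Stata variable name. Variable names can contain only letters, digits, and underscores, and cannot begin with a digit. Stata reads ev_-5 as "ev_ minus 5" and throws a syntax error. Because -5 is the first value in the list, the loop fails on its very first pass.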
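One way to check this yourself is to print each command as Stata sees it after macro substitution, instead of running it. A minimal sketch, with the list of values shortened for brevity:

```stata
* Print each command after macro substitution instead of running it
foreach kk in -5 -4 0 1 {
    display "gen ev_`kk' = (D == 1 & kb == `kk')"
}
```

The lines printed for the negative values contain names such as ev_-5, which makes the problem easy to spot. Typing set trace on before running the original loop gives similar information.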

The fix is to replace the minus sign with something else (like m) in the variable name, while keeping the actual negative value for the condition:

* Event dummies (k = -1 omitted as baseline)
foreach kk in -5 -4 -3 -2 0 1 2 3 4 5 {
    local vname = subinstr("`kk'", "-", "m", .)
    gen ev_`vname' = (D == 1 & kb == `kk')
}

This creates variables named ev_m5, ev_m4, ..., ev_0, ev_1, ..., ev_5. When you use these dummies later (e.g., in a regression), remember that ev_m5 means k = −5.
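Because the m is easy to forget, it can help to attach a variable label that records the original event time. The sketch below runs on a small made-up dataset so that it works on its own; the values of D and kb here are placeholders, not the course data:

```stata
* Toy data for illustration only: one observation per event time, all treated
clear
set obs 11
* Event time runs from -5 to 5
gen kb = _n - 6
gen D = 1

* Event dummies (k = -1 omitted as baseline)
foreach kk in -5 -4 -3 -2 0 1 2 3 4 5 {
    local vname = subinstr("`kk'", "-", "m", .)
    gen ev_`vname' = (D == 1 & kb == `kk')
    * Store the true event time in the variable label
    label variable ev_`vname' "Event time k = `kk'"
}

* List the new dummies together with their labels
describe ev_*
```

With the labels in place, describe (and any output that displays variable labels) shows the signed event time next to each ev_ variable.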

Variable Naming Rule

Stata variable names can contain only letters, digits, and underscores. They must begin with a letter or an underscore and can be at most 32 characters long. Any time your loop values might produce characters outside this set (hyphens, spaces, periods), you need to transform the value before using it in a variable name.
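The same approach extends to other illegal characters. As a sketch, suppose the loop values include decimals as well as negatives; the dose_ prefix, the values, and the letters m and p are arbitrary choices made for this illustration:

```stata
* Toy example: loop values containing both a minus sign and a period
clear
set obs 3
foreach val in -0.5 0 1.5 {
    * Replace "-" with "m" and "." with "p" to get a legal name
    local vname = subinstr("`val'", "-", "m", .)
    local vname = subinstr("`vname'", ".", "p", .)
    * The original value is still used on the right-hand side
    gen dose_`vname' = `val'
}
describe dose_*
```

This creates dose_m0p5, dose_0, and dose_1p5. Using a different replacement letter for each character keeps the names unambiguous, so you can still read the original value back from the name.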
