Programming Tips for Research
Running scripts from the command line, logging, and cross-platform considerations
This is an auxiliary module covering practical programming habits that make your research more reproducible and efficient. These skills aren't essential, but they'll save you time and prevent bugs.
Software Is Just Text Files
Here's something that's easy to miss when you're starting out: Stata code, R code, and Python code are all just plain text files. A .do file is a text file. A .R file is a text file. A .py file is a text file. You could open any of them in Notepad and read them.
Programs like the Stata visual editor or RStudio make running that code easier and prettier, but they aren't necessary. They're just convenient wrappers around the text. Since code is just text, you can run it directly from the command line:
stata -b do my_analysis.do # Run a Stata .do file
Rscript my_analysis.R # Run an R script
python3 my_analysis.py # Run a Python script
That's it. No GUI needed. Many researchers run their code this way in practice, especially for long-running analyses.
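To make the "just text" point concrete, here is a minimal Python sketch. It saves a one-line program as an ordinary text file, reads it back the way any editor would, and then runs it through the interpreter with no GUI involved. The file name `hello_analysis.py` and the helper `write_and_run` are made up for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def write_and_run(source):
    """Save `source` as a plain-text .py file, then run it from the command line.

    Returns the text read back from disk and whatever the script printed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "hello_analysis.py"  # hypothetical file name
        script.write_text(source, encoding="utf-8")

        # Any text editor would show exactly this text.
        text_on_disk = script.read_text(encoding="utf-8")

        # Same idea as typing `python3 hello_analysis.py` in a terminal;
        # sys.executable is the interpreter that is running this sketch.
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
    return text_on_disk, result.stdout


if __name__ == "__main__":
    text, output = write_and_run('print("result:", 2 + 2)\n')
    print("File contents:", text)
    print("Script output:", output)
```

The same holds for a .do or .R file: the editor only displays the text, and a separate program (stata, Rscript) executes it.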
I use VS Code as my text editor for everything. This lets me:
- See programs in different languages in one place. I can have a Stata .do file, an R script, and a Python script all open side by side.
- Use helpful extensions. VS Code has extensions for syntax highlighting, linters (which catch mistakes before you run the code), AI agents, and more.
- Use the integrated terminal. VS Code has a built-in command line, so I can run scripts directly from the terminal right next to the code I'm editing.
You can download VS Code for free at code.visualstudio.com. Other popular text editors include Sublime Text and Cursor (which is built on VS Code with extra AI features). Atom, once a common choice, was discontinued by GitHub in 2022.
You don't need to use any of these, but if you find yourself switching between the Stata editor, RStudio, and a terminal window, a single editor that handles all of them can simplify your workflow.
Unix vs. Windows: What You Need to Know
Most empirical economists work on either Mac/Linux (Unix-based) or Windows. They work differently, and it matters for your scripts.
Key Differences
| Feature | Unix (Mac/Linux) | Windows |
|---|---|---|
| Command Line | bash, zsh (built-in) | PowerShell, cmd.exe |
| Path Separator | / (forward slash) | \ (backslash) |
| Home Directory | ~ (shorthand) | %USERPROFILE% |
| Example Path | /Users/alice/projects/analysis | C:\Users\alice\projects\analysis |
| Line Endings | LF (line feed) | CRLF (carriage return + line feed) |
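The path and line-ending rows of the table can be checked from Python on any machine. The sketch below uses pathlib's "pure" path classes, which only manipulate path strings and never touch the disk, so the Windows example runs on a Mac and vice versa. The user name alice is the same placeholder used in the table.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("projects", "analysis")

# The "Pure" classes apply each platform's rules without needing that platform.
unix_path = PurePosixPath("/Users/alice").joinpath(*parts)
windows_path = PureWindowsPath("C:/Users/alice").joinpath(*parts)

print(unix_path)     # /Users/alice/projects/analysis
print(windows_path)  # C:\Users\alice\projects\analysis

# Windows paths can be written with forward slashes; as_posix() converts back.
print(windows_path.as_posix())  # C:/Users/alice/projects/analysis

# Line endings: the same two lines of text as bytes on each platform.
unix_text = b"line one\nline two\n"         # LF
windows_text = b"line one\r\nline two\r\n"  # CRLF

# splitlines() understands both conventions, so the content compares equal.
print(unix_text.splitlines() == windows_text.splitlines())  # True
```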
Writing Cross-Platform Code
If you want your code to work on both Unix and Windows, be careful about file paths. The safest approach:
- Always use forward slashes (/), even on Windows. Stata, R, and Python all accept them there.
- Avoid hardcoding paths. Instead, use relative paths or environment variables.
- Use language-specific path functions. Most languages have built-in functions to handle paths correctly.
* Stata: Use forward slashes or ~
* Good (works everywhere)
use "~/projects/data/mydata.dta"
* Also works
use "/Users/alice/projects/data/mydata.dta"
# R: Use file.path() for automatic path handling
my_file <- file.path("~", "projects", "data", "mydata.csv")
data <- read.csv(my_file)
# Or use forward slashes directly
data <- read.csv("~/projects/data/mydata.csv")
# Python: Use Path for cross-platform handling
from pathlib import Path
import pandas as pd
my_file = Path.home() / "projects" / "data" / "mydata.csv"
df = pd.read_csv(my_file)
# Or with forward slashes (pandas expands the leading ~)
df = pd.read_csv("~/projects/data/mydata.csv")
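The advice about avoiding hardcoded paths can be sketched as follows. This is one possible approach, not the only one: the environment variable name `PROJECT_ROOT` and the helpers `project_root` and `data_file` are hypothetical, chosen for illustration. Each coauthor would set the variable once on their own machine, and the script itself would contain no machine-specific path.

```python
import os
from pathlib import Path


def project_root():
    """Return the project folder without hardcoding a machine-specific path.

    PROJECT_ROOT is a hypothetical environment variable. If it is not set,
    fall back to the directory the script was launched from.
    """
    return Path(os.environ.get("PROJECT_ROOT", Path.cwd()))


def data_file(name):
    """Build the path to a file inside the project's data/ folder."""
    return project_root() / "data" / name


if __name__ == "__main__":
    # Prints a different absolute path on each machine, from the same code.
    print(data_file("mydata.csv"))
```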
Running Scripts from the Command Line
Running your analysis scripts from the terminal (not through an IDE) is important for reproducibility. This ensures your code works without relying on the IDE's environment.
Making Your Software Accessible
When you run a script from the terminal, your computer needs to know where to find stata, Rscript, or python. Check if they're already accessible:
which stata
which Rscript
which python3
If any of these print "not found" (or, in some shells, print nothing at all), the program is not on your PATH, the list of folders your system searches for executables. You need to add the program's folder to that list.
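The same check can be done from Python, which also makes it visible what PATH actually is: one long string of folders. The sketch below uses `shutil.which` from the standard library; the helper name `find_tools` is made up for illustration.

```python
import os
import shutil


def find_tools(names):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # PATH is a single string of folders, separated by ":" on Mac/Linux
    # and ";" on Windows. os.pathsep holds the right separator.
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        print("searched:", folder)

    for name, location in find_tools(["stata", "Rscript", "python3"]).items():
        print(f"{name}: {location or 'not found'}")
```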
Adding Software to Your PATH (Mac/Linux)
Edit your shell configuration file. On Mac, this is usually ~/.zshrc (newer Macs) or ~/.bash_profile (older). On Linux, it's ~/.bashrc.
nano ~/.zshrc
Add lines like these (adjust paths to where your software is actually installed):
export PATH="/Applications/Stata/StataIC.app/Contents/MacOS:$PATH"
export PATH="/usr/local/bin:$PATH" # R usually installs here
Save and exit (Ctrl+X, then Y, then Enter in nano). Reload your shell:
source ~/.zshrc
Now test again:
which stata
Running Your Scripts
Once your software is in your PATH, you can run scripts directly:
# Run a Stata script
stata -b do analysis.do
# Run an R script
Rscript analysis.R
# Run a Python script
python3 analysis.py
Master Scripts
Consider creating a master script that runs all your analysis scripts in order. This way, you can regenerate your entire analysis with a single command: stata -b do master.do or Rscript master.R. This is the gold standard for reproducibility.
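A master script can also be written in Python. The sketch below is one way to do it, assuming each program signals failure through a non-zero exit code (worth checking for your own setup, since not every tool does this reliably in batch mode). The function name `run_pipeline` is hypothetical, and the demo uses stand-in commands so that it runs anywhere.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command in order and return how many steps completed.

    check=True raises CalledProcessError on a non-zero exit code, so later
    steps never run on top of a failed earlier step.
    """
    completed = 0
    for command in steps:
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
        completed += 1
    return completed


if __name__ == "__main__":
    # Stand-in steps so the sketch runs on any machine. In a real project
    # these would be entries such as ["stata", "-b", "do", "master.do"]
    # or ["Rscript", "master.R"].
    demo_steps = [
        [sys.executable, "-c", "print('step 1: clean data')"],
        [sys.executable, "-c", "print('step 2: run regressions')"],
    ]
    print("Steps completed:", run_pipeline(demo_steps))
```

Stopping at the first failure is a deliberate choice here: a table built from a half-finished dataset is worse than no table.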
Logging for Reproducibility
A "log file" records everything your script outputs: results, error messages, and warnings. It's invaluable for debugging and for showing that your code ran as intended.
Why Log?
- Debugging: When something goes wrong, the log shows you exactly what happened.
- Reproducibility: You have a permanent record of every analysis run, including date/time and results.
- Collaboration: You can share the log with coauthors to show your analysis worked.
- Journal requirements: Some journals ask for logs to verify reproducibility.
Creating Log Files
Most statistical software can automatically create logs. Here's how:
* Start logging at the beginning of your script
log using "analysis.log", replace
* Your analysis code here
summarize age income
regress outcome x1 x2
* Close the log at the end
log close
# Using the logger package for clean logging
library(logger)
# Set up logging to both console and file
log_appender(appender_tee("analysis.log"))
# Your analysis code with logging
log_info("Starting analysis...")
log_info("Running regression...")
# ... rest of your code
log_info("Analysis complete.")
import logging
# Set up logging to both console and file
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('analysis.log'),
        logging.StreamHandler()
    ]
)
# Your analysis code with logging
logging.info("Starting analysis...")
logging.info("Running regression...")
# ... rest of your code
logging.info("Analysis complete.")
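For the debugging use case, it helps if the log also captures the full error when a run fails. The sketch below shows one way to do that in Python with `logger.exception`, which writes the message at ERROR level followed by the traceback. The logger name "analysis" and the helpers `make_logger` and `run_logged` are hypothetical names for illustration.

```python
import logging
import platform
import sys


def make_logger(log_path):
    """Return a logger that writes timestamped lines to `log_path`."""
    logger = logging.getLogger("analysis")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger


def run_logged(logger, step):
    """Run `step` (any function), logging a full traceback if it fails."""
    # Record the environment so a later reader knows what produced the run.
    logger.info("Python %s on %s", sys.version.split()[0], platform.system())
    try:
        step()
    except Exception:
        # Writes "Analysis failed" at ERROR level plus the traceback.
        logger.exception("Analysis failed")
        raise  # still stop the script; the log keeps the evidence
    logger.info("Analysis complete.")
```

A script would call something like `run_logged(make_logger("analysis.log"), main)`, where `main` stands for whatever function holds your analysis.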
Viewing the Log
After running your script, open the log file to see all output:
# View the log file
cat analysis.log
# Or open it in your editor
nano analysis.log
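Long logs are tedious to read top to bottom. As a rough sketch, the Python function below pulls out only the lines that mention a problem. It assumes the log marks problems with the words ERROR or WARNING, as Python's logging format above does; logs from other software flag errors differently, so the keywords would need adjusting. The function name `find_problems` is made up for illustration.

```python
from pathlib import Path


def find_problems(log_path, keywords=("ERROR", "WARNING")):
    """Return (line_number, text) for every log line containing a keyword."""
    problems = []
    lines = Path(log_path).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if any(keyword in line for keyword in keywords):
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    log_file = Path("analysis.log")
    if log_file.exists():
        for number, line in find_problems(log_file):
            print(f"{number}: {line}")
    else:
        print("No analysis.log in this folder yet.")
```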
Bash and Other Scripting Languages
In addition to Stata, R, and Python, you'll encounter "shell scripts" (bash on Mac/Linux, PowerShell on Windows). These are useful for automating tasks that don't fit neatly into your analysis workflow.
When to Use Bash/PowerShell
- Downloading files: Fetch data from the internet
- File management: Rename, move, or organize many files at once
- Orchestrating multiple scripts: Run your Stata, R, and Python scripts in sequence
A Simple Example
Here's a bash script that runs a mixed-language analysis pipeline:
#!/bin/bash
# Stop the pipeline if any step exits with an error
set -e
echo "Starting analysis at $(date)"
# Step 1: Clean raw data (Python is great for this)
python3 scripts/01_clean_data.py
# Step 2: Build analysis dataset (Stata for complex merges)
stata -b do scripts/02_build_dataset.do
# Step 3: Run regressions (R for modern DiD estimators)
Rscript scripts/03_analysis.R
# Step 4: Generate tables (Stata's estout is convenient)
stata -b do scripts/04_tables.do
echo "Done at $(date)"
Save this as master.sh, make it executable, and run it:
chmod +x master.sh
./master.sh
This runs scripts in multiple languages in sequence, using each language where it excels.
Don't Worry Too Much
You don't need to become a bash expert. For most empirical economics work, a simple master script in R or Stata is sufficient. Use bash only if it saves you significant time or makes your workflow cleaner.