Research and Communication in Economics

Programming Tips for Research

Running scripts from the command line, logging, and cross-platform considerations

This is an auxiliary module covering practical programming habits that make your research more reproducible and efficient. These skills aren't essential, but they'll save you time and prevent bugs.

Software Is Just Text Files

Here's something that's easy to miss when you're starting out: Stata code, R code, and Python code are all just plain text files. A .do file is a text file. A .R file is a text file. A .py file is a text file. You could open any of them in Notepad and read them.

Programs like the Stata visual editor or RStudio make running that code easier and prettier, but they aren't necessary. They're just convenient wrappers around the text. Since code is just text, you can run it directly from the command line:

stata -b do my_analysis.do    # Run a Stata .do file
Rscript my_analysis.R          # Run an R script
python3 my_analysis.py         # Run a Python script

That's it. No GUI needed. Many researchers run their code this way in practice, especially for long-running analyses.

I use VS Code as my text editor for everything. This lets me:

  • See programs in different languages in one place. I can have a Stata .do file, an R script, and a Python script all open side by side.
  • Use helpful extensions. VS Code has extensions for syntax highlighting, linters (which catch mistakes before you run the code), AI agents, and more.
  • Use the integrated terminal. VS Code has a built-in command line, so I can run scripts directly from the terminal right next to the code I'm editing.

You can download VS Code for free at code.visualstudio.com. Other popular text editors include Sublime Text and Cursor (which is built on VS Code with extra AI features). Atom, once a common choice, was discontinued by GitHub in December 2022.

You don't need to use any of these, but if you find yourself switching between the Stata editor, RStudio, and a terminal window, a single editor that handles all of them can simplify your workflow.

Unix vs. Windows: What You Need to Know

Most empirical economists work on either Mac/Linux (Unix-based) or Windows. They work differently, and it matters for your scripts.

Key Differences

Feature        | Unix (Mac/Linux)               | Windows
Command line   | bash, zsh (built-in)           | PowerShell, cmd.exe
Path separator | / (forward slash)              | \ (backslash)
Home directory | ~ (shorthand)                  | %USERPROFILE%
Example path   | /Users/alice/projects/analysis | C:\Users\alice\projects\analysis
Line endings   | LF (line feed)                 | CRLF (carriage return + line feed)
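
To make the path-separator row concrete, here is a small Python sketch using the standard library's "pure" path classes. They only manipulate path text and never touch the disk, so the example runs the same on any operating system. The paths are the example paths from the table.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate text, so this runs on any operating system.
unix_path = PurePosixPath("/Users/alice/projects/analysis")
windows_path = PureWindowsPath(r"C:\Users\alice\projects\analysis")

# Each flavour splits on its own separator but ends in the same folders.
print(unix_path.parts[-2:])
print(windows_path.parts[-2:])

# A Windows path written with forward slashes is understood as the same path.
same_path = PureWindowsPath("C:/Users/alice/projects/analysis")
print(same_path == windows_path)   # True
print(windows_path.as_posix())     # C:/Users/alice/projects/analysis
```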

Writing Cross-Platform Code

If you want your code to work on both Unix and Windows, be careful about file paths. The safest approach:

  • Always use forward slashes (/), even on Windows. Stata, R, and Python all accept them there.
  • Avoid hardcoding paths. Instead, use relative paths or environment variables.
  • Use language-specific path functions. Most languages have built-in functions to handle paths correctly.

* Stata: use forward slashes or ~
* Good: no username is hardcoded
use "~/projects/data/mydata.dta"

* Works, but only on a machine with this exact folder layout
use "/Users/alice/projects/data/mydata.dta"

# R: use file.path() for automatic path handling
my_file <- file.path("~", "projects", "data", "mydata.csv")
data <- read.csv(my_file)

# Or use forward slashes directly
data <- read.csv("~/projects/data/mydata.csv")

# Python: use Path for cross-platform handling
from pathlib import Path

import pandas as pd

my_file = Path.home() / "projects" / "data" / "mydata.csv"
df = pd.read_csv(my_file)

# Or with forward slashes
df = pd.read_csv("~/projects/data/mydata.csv")
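
The advice to avoid hardcoded paths by using relative paths or environment variables could look something like the sketch below. The variable name PROJECT_DATA and the fallback folder name "data" are assumptions made up for this illustration, not a convention of any tool; pick whatever names suit your project.

```python
import os
from pathlib import Path


def get_data_dir(env_var: str = "PROJECT_DATA") -> Path:
    """Return the data folder without hardcoding a machine-specific path.

    PROJECT_DATA is a hypothetical environment variable name chosen for
    this sketch. If it is unset, fall back to a "data" folder inside the
    current working directory (a relative location).
    """
    override = os.environ.get(env_var)
    if override:
        return Path(override).expanduser()
    return Path.cwd() / "data"


data_file = get_data_dir() / "mydata.csv"
print(data_file)
```

Each coauthor can then point PROJECT_DATA at their own copy of the data, and the script itself never needs editing.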

Running Scripts from the Command Line

Running your analysis scripts from the terminal (not through an IDE) is important for reproducibility. It confirms that your code runs from a clean session, without relying on objects, settings, or a working directory left over in the IDE.

Making Your Software Accessible

When you run a script from the terminal, your computer needs to know where to find stata, Rscript, or python. Check if they're already accessible:

which stata
which Rscript
which python3

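If a command prints nothing or reports "not found," that program is not on your PATH, the list of folders your system searches for executables, and you need to add it. (which is a Mac/Linux command; on Windows, the equivalent is where in cmd.exe or Get-Command in PowerShell.)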
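
If you want one check that behaves the same on Mac, Linux, and Windows, Python's standard library offers shutil.which, which searches your PATH much as the shell does. A minimal sketch follows; the helper name locate_tools is made up for this example.

```python
import shutil


def locate_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_tools(["stata", "Rscript", "python3"]).items():
    print(f"{name}: {path if path else 'not found on PATH'}")
```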

Adding Software to Your PATH (Mac/Linux)

Edit your shell configuration file. On Mac, this is usually ~/.zshrc (newer Macs) or ~/.bash_profile (older). On Linux, it's ~/.bashrc.

nano ~/.zshrc

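Add lines like these, adjusting the paths to where your software is actually installed. The Stata folder and executable names depend on your edition and version, so look inside the MacOS folder and use the name you find there if it is not stata: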

export PATH="/Applications/Stata/StataIC.app/Contents/MacOS:$PATH"
export PATH="/usr/local/bin:$PATH"  # R usually installs here

Save and exit (Ctrl+X, then Y, then Enter in nano). Reload your shell:

source ~/.zshrc

Now test again:

which stata

Running Your Scripts

Once your software is in your PATH, you can run scripts directly:

# Run a Stata script
stata -b do analysis.do

# Run an R script
Rscript analysis.R

# Run a Python script
python3 analysis.py

Master Scripts

Consider creating a master script that runs all your analysis scripts in order. This way, you can regenerate your entire analysis with a single command: stata -b do master.do or Rscript master.R. This is the gold standard for reproducibility.
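
A master script can also be written in Python. The sketch below is one possible shape, not a prescribed layout: run_pipeline is a made-up helper, and the two demo steps are placeholders that only print a message so the example runs anywhere.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command in order and stop at the first one that fails."""
    for command in steps:
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later steps never run on top of a failed earlier step.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical steps: replace these with your own scripts.
    demo_steps = [
        [sys.executable, "-c", "print('step 1: clean data')"],
        [sys.executable, "-c", "print('step 2: run analysis')"],
    ]
    run_pipeline(demo_steps)
```

In a real project each step would be a command like the ones shown earlier, for example ["Rscript", "analysis.R"] or ["stata", "-b", "do", "analysis.do"].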

Logging for Reproducibility

A "log file" records what your script outputs: results, error messages, and warnings. It's invaluable for debugging and for showing that your code ran as reported.

Why Log?

  • Debugging: When something goes wrong, the log shows you exactly what happened.
  • Reproducibility: You have a permanent record of every analysis run, including date/time and results.
  • Collaboration: You can share the log with coauthors to show your analysis worked.
  • Journal requirements: Some journals ask for logs to verify reproducibility.

Creating Log Files

Stata can write a log of everything it prints, while in R and Python you typically record messages through a logging package. Here's how in each language:

* Stata: start logging at the beginning of your script
log using "analysis.log", replace

* Your analysis code here
summarize age income
regress outcome x1 x2

* Close the log at the end
log close

# R: use the logger package for clean logging
library(logger)

# Set up logging to both console and file
log_appender(appender_tee("analysis.log"))

# Your analysis code with logging
log_info("Starting analysis...")
log_info("Running regression...")
# ... rest of your code

log_info("Analysis complete.")

# Python: use the standard logging module
import logging

# Set up logging to both console and file
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('analysis.log'),
        logging.StreamHandler()
    ]
)

# Your analysis code with logging
logging.info("Starting analysis...")
logging.info("Running regression...")
# ... rest of your code

logging.info("Analysis complete.")
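
For the debugging benefit to hold in Python, the log needs to capture failures too, and an uncaught exception is not written to the log file unless you log it yourself. One way to do that is sketched below. The wrapper name run_logged and the logger name "analysis" are made up for this example; the message format matches the Python example above.

```python
import logging


def run_logged(task, log_path="analysis.log"):
    """Run task() and write any failure, with its traceback, to the log."""
    logger = logging.getLogger("analysis")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    try:
        logger.info("Starting analysis...")
        result = task()
        logger.info("Analysis complete.")
        return result
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Analysis failed.")
        raise
    finally:
        # Detach and close the handler so repeated calls do not duplicate lines.
        logger.removeHandler(handler)
        handler.close()
```

You would call it as run_logged(main), where main is whatever function holds your analysis. The exception is re-raised after logging, so the script still stops with an error.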

Viewing the Log

After running your script, open the log file to see all output:

# View the log file
cat analysis.log

# Or open it in your editor
nano analysis.log
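
For long logs, it can be quicker to pull out only the problem lines than to read the whole file. The sketch below assumes the log was written with the "timestamp - LEVEL - message" format from the Python logging example above; find_problems is a made-up helper name, and logs in other formats would need a different rule.

```python
from pathlib import Path


def find_problems(log_path, levels=("WARNING", "ERROR")):
    """Return the log lines whose level field matches one of `levels`."""
    problems = []
    for line in Path(log_path).read_text().splitlines():
        # Lines look like "<timestamp> - <LEVEL> - <message>".
        fields = line.split(" - ", 2)
        if len(fields) == 3 and fields[1] in levels:
            problems.append(line)
    return problems
```

For example, find_problems("analysis.log") returns a list that is empty when the run produced no warnings or errors.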

Bash and Other Scripting Languages

In addition to Stata, R, and Python, you'll encounter "shell scripts" (bash on Mac/Linux, PowerShell on Windows). These are useful for automating tasks that don't fit neatly into your analysis workflow.

When to Use Bash/PowerShell

  • Downloading files: Fetch data from the internet
  • File management: Rename, move, or organize many files at once
  • Orchestrating multiple scripts: Run your Stata, R, and Python scripts in sequence
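
If you would rather stay in a language you already use, chores like these can also be scripted in Python. As a rough sketch of the file-management case, the helper below adds a prefix to every matching file in a folder. The function name add_prefix and the default "*.csv" pattern are assumptions chosen for illustration.

```python
from pathlib import Path


def add_prefix(folder, prefix, pattern="*.csv"):
    """Rename every file in `folder` matching `pattern` by adding `prefix`."""
    renamed = []
    # sorted() collects the matches first, so renaming does not disturb the loop.
    for path in sorted(Path(folder).glob(pattern)):
        if path.name.startswith(prefix):
            continue  # Already renamed; this makes reruns safe.
        target = path.with_name(prefix + path.name)
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling add_prefix("data", "raw_") would turn data/a.csv into data/raw_a.csv and leave files of other types alone.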

A Simple Example

Here's a bash script that runs a mixed-language analysis pipeline:

#!/bin/bash

# Stop at the first step that exits with an error
set -e

echo "Starting analysis at $(date)"

# Step 1: Clean raw data (Python is great for this)
python3 scripts/01_clean_data.py

# Step 2: Build analysis dataset (Stata for complex merges)
stata -b do scripts/02_build_dataset.do

# Step 3: Run regressions (R for modern DiD estimators)
Rscript scripts/03_analysis.R

# Step 4: Generate tables (Stata's estout is convenient)
stata -b do scripts/04_tables.do

echo "Done at $(date)"

Save this as master.sh, make it executable, and run it:

chmod +x master.sh
./master.sh

This runs scripts in multiple languages in sequence, using each language where it excels.

Don't Worry Too Much

You don't need to become a bash expert. For most empirical economics work, a simple master script in R or Stata is sufficient. Use bash only if it saves you significant time or makes your workflow cleaner.
