StatsPAI Is Now Open Source: A Unified Python Toolkit for Causal Inference and Applied Econometrics

If you've ever worked on an empirical research paper, you know the pain: OLS in Stata, causal forests in R, data wrangling in Python, then back to Stata for publication tables. Three languages, six packages, zero consistency.

Today, we're releasing StatsPAI, an open-source Python package built to end this fragmentation.

```bash
pip install statspai
```

One line. One language. 150+ econometric and causal inference methods at your fingertips.

Why We Built StatsPAI

The motivation is simple: empirical researchers deserve better tools.

For decades, applied econometrics has been split across ecosystems. Stata dominates classical regression. R leads in cutting-edge causal inference (DID, RD, causal forests). Python, despite being the most popular programming language in the world, has been a second-class citizen in this space — scattered across statsmodels, linearmodels, econml, dowhy, and dozens of smaller packages, each with its own API conventions, output formats, and documentation styles.

This fragmentation creates real costs:

Cognitive overhead: Researchers must learn and context-switch between multiple languages and API patterns
Reproducibility barriers: Multi-language pipelines are harder to share, review, and replicate
Lost productivity: Hours spent formatting tables manually, converting between output formats, and debugging cross-package incompatibilities
Accessibility gap: Junior researchers and students face a steep learning curve just to run standard causal inference methods

StatsPAI was born from a conviction: if Python can power self-driving cars and large language models, it should be able to run a proper difference-in-differences analysis with publication-ready output.

[Figure: Before vs. After StatsPAI]

What StatsPAI Offers

StatsPAI is not just another wrapper. It's a ground-up redesign of how econometric analysis should work in Python.

A Unified API Across 150+ Methods

Every method in StatsPAI — from basic OLS to neural causal models — returns a consistent result object with the same interface:

```python
import statspai as sp

# df is assumed to be a pandas DataFrame that is already loaded;
# covariates is assumed to hold the covariate columns for the forest

# Classical OLS
result = sp.ols("y ~ x1 + x2 + x3", data=df)
result.summary()
result.plot()
result.to_latex()

# Difference-in-Differences (Callaway & Sant'Anna)
result = sp.did_cs(y="outcome", g="group", t="time", data=df)
result.summary()
result.plot()
result.to_latex()

# Causal Forest
result = sp.causal_forest(y="outcome", t="treatment", x=covariates, data=df)
result.summary()
result.plot()
result.to_latex()
```

Same pattern. Same interface. No surprises.
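The following sketch shows what a shared result interface can look like in plain Python. The class `EstimationResult` and its fields are hypothetical illustrations, not StatsPAI's actual internals. The idea is that every estimator returns one object type, so `summary()` and `to_latex()` behave the same way regardless of which method produced the estimates.

```python
from dataclasses import dataclass, field


@dataclass
class EstimationResult:
    """Hypothetical common result type returned by every estimator."""
    method: str
    params: dict      # coefficient name -> point estimate
    std_errors: dict  # coefficient name -> standard error
    extras: dict = field(default_factory=dict)  # method-specific output

    def summary(self) -> str:
        # One plain-text layout shared by every method
        lines = [f"Method: {self.method}", f"{'term':<12}{'coef':>10}{'se':>10}"]
        for name, coef in self.params.items():
            lines.append(f"{name:<12}{coef:>10.3f}{self.std_errors[name]:>10.3f}")
        return "\n".join(lines)

    def to_latex(self) -> str:
        # One LaTeX layout shared by every method: estimate (SE)
        rows = [
            f"{name} & {coef:.3f} ({self.std_errors[name]:.3f}) \\\\"
            for name, coef in self.params.items()
        ]
        return "\\begin{tabular}{lc}\n" + "\n".join(rows) + "\n\\end{tabular}"


# Two different (hypothetical) estimators, one interface
ols_res = EstimationResult("OLS", {"x1": 0.52}, {"x1": 0.11})
did_res = EstimationResult("DID (CS)", {"ATT": 1.8}, {"ATT": 0.4})
for res in (ols_res, did_res):
    print(res.summary())
```

The downstream code (tables, plots, exports) only needs to understand one type. That is what makes the "same three calls after every estimator" pattern possible.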

Eight Major Module Categories

StatsPAI covers the complete empirical research toolkit:

1. Classical Econometrics — OLS, IV/2SLS, panel data (fixed/random effects), quantile regression, Tobit, Heckman selection models, dynamic panel GMM

2. Difference-in-Differences — Classic 2x2 DID, staggered adoption designs (Callaway & Sant'Anna, Sun & Abraham), Goodman-Bacon decomposition, parallel trends sensitivity analysis

3. Regression Discontinuity — Sharp and fuzzy RD with bias-corrected inference, McCrary density tests, optimal bandwidth selection, and rich visualization

4. Matching & Reweighting — Propensity score matching, Mahalanobis distance, coarsened exact matching, entropy balancing, IPW, and augmented IPW

5. Synthetic Control — Abadie-Diamond-Hainmueller SCM and Synthetic DID, with full placebo inference and donor pool diagnostics

6. Modern ML Causal Methods — Double/debiased machine learning, causal forests, meta-learners (S/T/X/R/DR-Learner), TMLE, and neural causal models (TARNet, CFRNet, DragonNet)

7. Robustness & Diagnostics — Specification curves, Oster coefficient stability bounds, Sensemakr sensitivity analysis, automated robustness reports, and heterogeneity/subgroup analysis

8. Publication-Ready Output — modelsummary(), outreg2(), balance tables, coefficient plots — export directly to Word, LaTeX, HTML, and Excel with unified formatting
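Two of the simplest estimators in this list fit in a few lines of code, which shows what the library automates at its core. The sketch below is plain Python on made-up numbers, not StatsPAI code. The classic 2×2 DID estimate is the change in the treated group's mean outcome minus the change in the control group's. The inverse probability weighting (IPW) estimate of the average treatment effect reweights each observation by the inverse of its propensity score. In this sketch the score is assumed to be known; in practice it is estimated.

```python
from statistics import mean


def did_2x2(rows):
    """Classic 2x2 difference-in-differences from (treated, post, y) tuples."""
    def cell(treated, post):
        # Mean outcome for one group in one period
        return mean(y for t, p, y in rows if t == treated and p == post)

    treated_change = cell(1, 1) - cell(1, 0)
    control_change = cell(0, 1) - cell(0, 0)
    return treated_change - control_change


def ipw_ate(y, t, e):
    """Horvitz-Thompson IPW estimate of the ATE given propensity scores e."""
    terms = [
        ti * yi / ei - (1 - ti) * yi / (1 - ei)
        for yi, ti, ei in zip(y, t, e)
    ]
    return mean(terms)


# Toy panel: (treated, post, outcome)
panel = [
    (1, 0, 10.0), (1, 0, 12.0), (1, 1, 18.0), (1, 1, 20.0),
    (0, 0, 8.0), (0, 0, 10.0), (0, 1, 11.0), (0, 1, 13.0),
]
print(did_2x2(panel))  # (19 - 11) - (12 - 9) = 5.0

# Toy cross-section with a constant propensity score of 0.5
print(ipw_ate([5.0, 3.0, 4.0, 2.0], [1, 0, 1, 0], [0.5] * 4))  # 2.0
```

With a constant propensity of 0.5, IPW reduces to the simple difference in means (4.5 - 2.5 = 2.0), which is a quick sanity check on the weighting formula.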

Publication-Ready from Day One

One of our strongest opinions: researchers should never manually format a regression table. StatsPAI generates publication-quality output by default:

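```python
# model1, model2, model3 and model are assumed to be fitted StatsPAI
# results (e.g. from sp.ols); df is the estimation DataFrame

# Multi-model comparison table, exported to Word
sp.modelsummary(
    [model1, model2, model3],
    stars=True,
    output="results_table.docx"
)

# Automated robustness battery
sp.robustness_report(
    base_model=model,
    data=df,
    output="robustness_check.html"
)
```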

No more copying numbers into Excel. No more manually aligning columns in LaTeX. The output is ready for submission.
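Much of what a multi-model table automates is mechanical but error-prone formatting. Estimates have to be aligned across models, standard errors placed under coefficients, and significance stars assigned. The functions `stars` and `latex_table` below are a hypothetical, stripped-down sketch of that step in plain Python, not StatsPAI's implementation. The sketch uses the conventional two-sided thresholds of p < 0.10, 0.05, and 0.01 and assumes a normal approximation for the p-values.

```python
import math


def stars(coef, se):
    """Significance stars from a two-sided normal-approximation p-value."""
    p = math.erfc(abs(coef / se) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


def latex_table(models, terms):
    """models: list of dicts mapping term -> (coef, se)."""
    cols = "l" + "c" * len(models)
    header = " & ".join(["Term"] + [f"({i + 1})" for i in range(len(models))])
    lines = [f"\\begin{{tabular}}{{{cols}}}", "\\hline", header + " \\\\", "\\hline"]
    for term in terms:
        est_row, se_row = [term], [""]
        for m in models:
            if term in m:
                coef, se = m[term]
                est_row.append(f"{coef:.3f}{stars(coef, se)}")
                se_row.append(f"({se:.3f})")
            else:
                # Term not included in this model: leave both cells blank
                est_row.append("")
                se_row.append("")
        lines.append(" & ".join(est_row) + " \\\\")
        lines.append(" & ".join(se_row) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


# Hypothetical estimates from two specifications
m1 = {"x1": (0.50, 0.10)}
m2 = {"x1": (0.45, 0.20), "x2": (0.10, 0.08)}
print(latex_table([m1, m2], ["x1", "x2"]))
```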

Our Mission: Python for the AI Era of Econometrics

StatsPAI is more than a package — it represents a thesis about where empirical research is heading.

The Convergence Is Happening

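The boundaries between "traditional econometrics" and "machine learning" are dissolving. Double machine learning uses flexible learners such as random forests or the lasso to estimate nuisance functions (the outcome and treatment models). It then estimates the treatment effect from an orthogonalized, cross-fitted moment condition. Causal forests adapt random forests to estimate heterogeneous treatment effects within the Neyman-Rubin potential outcomes framework. Neural causal models use deep learning architectures for counterfactual prediction.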

These methods don't belong to separate toolkits. They belong together.
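The simplest form of double machine learning is the partially linear model Y = θD + g(X) + ε, and it fits in a short program. The procedure has three steps. First, predict Y and D from X with some learner. Second, use cross-fitting so that each observation's prediction comes from a model trained on the other folds. Third, regress the Y residuals on the D residuals. To keep the example dependency-free, the nuisance learner here is a one-variable least-squares fit standing in for a random forest or lasso. The data are simulated with an assumed true θ of 2.0. This illustrates the technique only and is not StatsPAI's implementation.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of ys on xs; stand-in for any ML nuisance learner."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return lambda v: my + slope * (v - mx)


def dml_plm(y, d, x, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + e."""
    n = len(y)
    # Interleaved folds; fine here because the simulated rows are i.i.d.
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
        # Fit E[Y|X] and E[D|X] on the other folds only (cross-fitting)
        m_y = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        m_d = fit_line([x[i] for i in train_idx], [d[i] for i in train_idx])
        for i in test_idx:
            y_res[i] = y[i] - m_y(x[i])
            d_res[i] = d[i] - m_d(x[i])
    # Final stage: regress Y residuals on D residuals (no intercept)
    return sum(a * b for a, b in zip(d_res, y_res)) / sum(b * b for b in d_res)


# Simulated data with an assumed true effect of theta = 2.0
random.seed(0)
x = [random.uniform(0, 1) for _ in range(2000)]
d = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [2.0 * di + 1.5 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]
print(round(dml_plm(y, d, x), 2))
```

With 2,000 observations, the printed estimate should land close to 2.0. Swapping `fit_line` for a random forest or boosting model is exactly the point where machine learning enters an otherwise classical estimator.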

StatsPAI is built on the conviction that a researcher should be able to start with an OLS baseline, move to a DID design, add a causal forest for heterogeneity analysis, and generate a publication-ready robustness report — all within the same workflow, the same language, the same API.

Why Python?

Python is where the AI ecosystem lives. PyTorch, TensorFlow, Hugging Face Transformers, LangChain: the tools that define modern AI are used primarily through Python. As causal inference increasingly integrates with machine learning, the language of econometrics needs to be the language of AI.

StatsPAI makes this possible. You can use StatsPAI alongside any ML framework, in any Jupyter notebook, on any cloud platform. No Stata license fees. No R interoperability headaches. Just Python.

Open Source as a Public Good

We release StatsPAI under the MIT License — free to use, modify, and distribute. This is a deliberate choice.

The tools for rigorous empirical research should not be gated behind expensive software licenses. A PhD student in Nairobi should have access to the same econometric toolkit as a professor at Stanford. Open source is how we make that real.

We also believe open source produces better science. When your analysis code is a single pip install away from reproducibility, peer review becomes more meaningful. When methods are implemented in a transparent, inspectable codebase, methodological debates can focus on substance rather than implementation details.

Getting Started

Installation

```bash
pip install statspai
```

Requires Python 3.9+. Core dependencies (NumPy, SciPy, pandas, statsmodels, scikit-learn) are installed automatically.

Quick Example: A Complete DID Analysis in 10 Lines

```python
import statspai as sp
import pandas as pd

# Load data
df = pd.read_csv("policy_evaluation.csv")

# Run staggered DID (Callaway & Sant'Anna)
result = sp.did_cs(y="outcome", g="first_treat", t="year", data=df)

# View results
result.summary()
result.plot()

# Export to LaTeX
result.to_latex("did_results.tex")
```

That's it. No boilerplate. No configuration files. No separate data processing pipeline.

StatsPAI on GitHub

The full source code, documentation, and contribution guidelines are available on GitHub:

github.com/brycewang-stanford/StatsPAI

We welcome contributions — whether it's a bug fix, a new method implementation, or documentation improvements. Every pull request makes the toolkit better for the entire research community.

What's Next

This initial release is just the beginning. Our roadmap includes:

Interactive documentation with runnable examples for every method
Integration with CoPaper.AI for AI-assisted paper writing powered by StatsPAI's computation engine
GPU acceleration for large-scale causal forest and neural causal model estimation
Bayesian extensions including Bayesian causal forests and posterior predictive checks
Community-contributed methods — we're building the infrastructure for researchers to contribute their own implementations

Join the Community

StatsPAI is built by the team behind CoPaper.AI, emerging from Stanford's Rural Education Action Program (REAP). Our team has seen firsthand how methodological barriers limit research impact — especially in development economics, where rigorous causal evidence can directly inform policy decisions that affect millions of lives.

If you believe that better tools lead to better research, and better research leads to better policy — StatsPAI is for you.

```bash
pip install statspai
```

Star us on GitHub. Try it in your next project. File an issue if something doesn't work. Together, we can make Python the definitive language for empirical research in the AI era.

StatsPAI is developed by StatsPAI Inc. and maintained as an open-source project under the MIT License. For questions, feedback, or collaboration inquiries, visit our GitHub repository or reach out to the team.