StatsPAI: The Statistics Toolkit Built for the AI Agent Era
`pip install statspai` · `import statspai as sp`

A Python econometrics and causal inference toolkit designed for the AI Agent era. 150+ functions, unified API, MIT license.
StatsPAI's core positioning: a statistics package built for Agent-driven development. As AI coding tools — Claude Code, Codex, Cursor — increasingly participate in empirical research, the ecosystem needs a unified, stable, and API-standardized Python statistics package to handle computation. StatsPAI is built for exactly this scenario: designed for researchers, but optimized for agents.
Why Python
Stata and R have deep roots in empirical research, but they are both domain-specific languages. Python is a general-purpose language, and this gives it one decisive advantage: new algorithms ship faster in Python than in Stata or R.
In Stata, new methods wait for StataCorp's official releases or community-written .ado files. In R, new packages go through the CRAN review pipeline. In Python's ecosystem, when a new causal inference paper drops, AI coding tools can implement, test, and publish the algorithm to PyPI within days. The iteration speed of scikit-learn, PyTorch, and transformers has already proven this.
StatsPAI's goal is to bring this speed advantage to empirical research: leverage AI-assisted coding to rapidly implement frontier algorithms, so researchers and agents can use them immediately.
And Python naturally connects to data engineering, machine learning, and cloud computing. A researcher can clean data with pandas, run double machine learning with sp.dml(), train a custom model with PyTorch — all in the same environment, with no data export or language switching. That's an experience Stata and R simply cannot offer.
What StatsPAI Does
Anyone in empirical research knows Stata and R. Stata has built a highly unified econometric command system over 40 years — from reg to xtreg to ivregress, consistent syntax, smooth workflow. R's CRAN hosts dozens of causal inference packages — fixest, did, rdrobust, grf — cutting-edge methods with broad coverage, where new paper implementations often land first.
On the Python side, data science infrastructure has long been industry standard — pandas, scikit-learn, TensorFlow, PyTorch form the bedrock of AI. But for empirical research, there's never been a unified, convenient tool. statsmodels handles regression, linearmodels handles panels, EconML handles heterogeneous treatment effects, differences handles DID — each with its own API, its own output format. Running one paper requires installing a pile of packages and playing glue engineer.
StatsPAI takes the best of both worlds — Stata's command system and R's causal inference coverage — and unifies them in a single Python package:
```python
import statspai as sp

# Classical econometrics
sp.regress("wage ~ edu + exp", data=df, robust='hc1')
sp.ivreg("wage ~ (edu ~ parent_edu) + exp", data=df)
sp.panel(df, "wage ~ edu + exp", entity='id', time='year', model='fe')

# Causal inference
sp.did(df, y='wage', treat='policy', time='year', id='worker')
sp.rdrobust(df, y='score', x='running_var', c=0)
sp.synth(df, treat_unit=1, outcome='y', time='year', unit='id')

# ML causal
sp.dml(df, y='wage', treat='training', covariates=['age', 'edu'])
sp.causal_forest("y ~ treatment | x1 + x2 + x3", data=df)

# All results, same interface
result.summary()
result.plot()
result.to_docx()   # Word
result.to_latex()  # LaTeX
result.cite()      # BibTeX for the method

# Publication tables in one line
sp.modelsummary(r1, r2, r3, output='results.docx')
```

All functions share a unified API design. All results return a standardized CausalResult object. Publication-ready tables export directly to Word, Excel, LaTeX, and HTML. Stata users will find sp.regress(), sp.margins(), sp.test() familiar. R users will recognize sp.callaway_santanna() and sp.rdrobust().
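To make the "standardized result object" idea concrete, here is a minimal sketch of the pattern: every estimator returns the same shape of object, so downstream code (or an agent) never special-cases the method. The class name echoes the CausalResult mentioned above, but the fields and methods here are illustrative, not StatsPAI's actual class.

```python
from dataclasses import dataclass

@dataclass
class CausalResult:
    # Hypothetical unified result: same fields for every estimator
    method: str
    estimate: float
    std_error: float

    def summary(self) -> str:
        return f"{self.method}: estimate={self.estimate:.3f} (se={self.std_error:.3f})"

    def to_latex(self) -> str:
        # Every method renders through one template, so table-building
        # code never needs to know which estimator produced the result.
        return f"{self.method} & {self.estimate:.3f} & ({self.std_error:.3f}) \\\\"

r = CausalResult(method="DID", estimate=0.042, std_error=0.011)
print(r.summary())  # DID: estimate=0.042 (se=0.011)
```

The payoff of this design is that a table function like modelsummary only has to consume one interface, regardless of whether the inputs came from OLS, DID, or a causal forest.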
Method Coverage
150+ public functions. Here's an overview by category:
Classical Econometrics — regress, ivreg, panel, heckman, qreg, tobit, xtabond
DID Family — did, callaway_santanna, sun_abraham, bacon_decomposition, honest_did
Regression Discontinuity — rdrobust, rdplot, rddensity
Matching & Reweighting — match (PSM / Mahalanobis / CEM), ebalance
Synthetic Control — synth, sdid
ML Causal — dml, causal_forest, metalearner (S/T/X/R/DR), tmle, aipw, deepiv
Neural Causal — tarnet, cfrnet, dragonnet
Causal Discovery — notears, pc_algorithm
Policy Learning — policy_tree, policy_value
More — dose_response, multi_treatment, lee_bounds, manski_bounds, spillover, g_estimation, bunching, mc_panel, causal_impact, mediate, bartik, conformal_cate, bcf
Post-estimation — margins, test, lincom, oster_bounds, sensemakr, evalue, hausman_test, het_test, reset_test, vif
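For intuition about the simplest member of the DID family listed above: a 2x2 difference-in-differences estimate is just the change in the treated group's mean minus the change in the control group's mean. A plain-Python sketch (not StatsPAI's implementation, which adds panel structure, staggered timing, and proper inference):

```python
def did_2x2(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    """Textbook 2x2 difference-in-differences on four outcome samples."""
    def mean(xs):
        return sum(xs) / len(xs)
    # Treated group's change minus control group's change
    return (mean(y_treat_post) - mean(y_treat_pre)) - (
        mean(y_ctrl_post) - mean(y_ctrl_pre)
    )

att = did_2x2([10, 12], [15, 17], [9, 11], [11, 13])
print(att)  # 3.0 — treated rose by 5, control by 2
```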
Automated Robustness (one-click workflows that typically require manual assembly in Stata or R)
- spec_curve() — Specification curve / multiverse analysis
- robustness_report() — Automatically varies standard errors, winsorization, control variables, and subsamples
- subgroup_analysis() — Heterogeneity analysis + forest plots + Wald tests
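The core of specification-curve logic is simple: re-estimate the same coefficient under many reasonable specifications and inspect the distribution. The sketch below uses leave-one-out subsamples as the "specifications" for brevity; real specification curves also vary controls, standard errors, and sample filters. This is an illustrative stand-in for the idea, not spec_curve()'s implementation.

```python
from itertools import combinations

def slope(xs, ys):
    """OLS slope of y on x (with intercept), computed from moments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def spec_curve(xs, ys, drop_k=1):
    """One estimate per leave-k-out subsample; returns them sorted."""
    idx = range(len(xs))
    estimates = []
    for dropped in combinations(idx, drop_k):
        keep = [i for i in idx if i not in dropped]
        estimates.append(slope([xs[i] for i in keep], [ys[i] for i in keep]))
    return sorted(estimates)

ests = spec_curve([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.9], drop_k=1)
print(min(ests), max(ests))  # the robustness band across specifications
```

If the sorted estimates all share a sign and cluster tightly, the headline result is robust to these specification choices; a wide or sign-flipping band is the warning the curve is designed to surface.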
Tables & Plots — modelsummary, outreg2, sumstats, balance_table, tab, coefplot, binscatter
Designed for Agents
When an AI Agent conducts empirical research, the workflow has two layers: Skills orchestrate "what to do," and the statistics package executes "how to do it."
A Skill might specify: "Run Callaway-Sant'Anna staggered DID → test parallel trends → export Word table." Under the hood, the Agent needs to call concrete Python functions. If the underlying layer consists of seven or eight scattered packages, the Agent must juggle different APIs and adapt between formats. With StatsPAI, it's three lines of sp.xxx().
StatsPAI is designed with Agent invocation in mind:
- Unified function signatures — Agents don't need to memorize each package's unique parameter style
- Standardized result objects — All methods return the same type of object, enabling standardized downstream processing
- Built-in publication output — `.to_docx()` and `.to_latex()` work out of the box, no extra assembly required
- Method-level citations — `.cite()` returns BibTeX, making automated reference generation easy
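The method-level citation idea reduces to a registry mapping estimator names to BibTeX entries, which an agent can concatenate into a references file. The registry and cite() helper below are hypothetical, not StatsPAI's internals; the entry shown is the real Callaway & Sant'Anna (2021) reference.

```python
# Hypothetical method-to-BibTeX registry
_CITATIONS = {
    "callaway_santanna": (
        "@article{callaway2021difference,\n"
        "  title={Difference-in-Differences with multiple time periods},\n"
        "  author={Callaway, Brantly and Sant'Anna, Pedro H. C.},\n"
        "  journal={Journal of Econometrics},\n"
        "  year={2021}\n"
        "}"
    ),
}

def cite(method: str) -> str:
    """Return the BibTeX entry for a method, or raise if unregistered."""
    try:
        return _CITATIONS[method]
    except KeyError:
        raise ValueError(f"no citation registered for {method!r}")

print(cite("callaway_santanna").splitlines()[0])  # @article{callaway2021difference,
```

Because every entry is plain BibTeX text, an agent can collect the citations from all methods used in an analysis and write them straight into a .bib file.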
Timeline
- 2025.07 — StatsPAI v0.1.0 published on PyPI. Open source first, starting with classical econometrics and publication tables
- 2025.08 — StatsPAI Inc. incorporated. A sustainable entity to support the open-source project
- 2025.12 — [CoPaper.AI](https://copaper.ai) launches. An AI-assisted empirical research co-authoring platform built on StatsPAI
- 2026.04 — v0.3.1. Three major version iterations, 150+ functions, covering classical econometrics through frontier ML causal methods
What's Next
StatsPAI is iterating rapidly. The current 150+ common functions are just the starting point, with significant new features shipping weekly.
Near-term directions:
- Performance — Computational efficiency for large-scale panel data, benchmarking against Stata's compiled backend and R's fixest
- Method expansion — Survey design, spatial econometrics, structural estimation — closing the remaining gaps where Stata and R still lead
- Ecosystem integration — Deeper fusion with pandas, scikit-learn, and PyTorch, accelerating the convergence of AI and causal inference
- AI adaptation — Structured function documentation and standardized interfaces for Agents and Skills, lowering the barrier for AI invocation
- Rapid frontier implementation — Leveraging Python + AI coding speed advantages to implement new paper methods as soon as they're published
The long-term goal is to be to empirical research what scikit-learn is to machine learning and pandas is to data processing. That takes time, and it takes community. The ecosystem is in a period of rapid growth, with each version adding new methods, optimizing existing implementations, and refining API consistency. We'll maintain this pace.
Get Involved
StatsPAI is an open ecosystem, and all forms of participation are welcome:
- Use it — `pip install statspai`, try it in your next empirical analysis
- Feedback — Tell us what's broken or what could be better on [GitHub Issues](https://github.com/brycewang-stanford/statspai/issues)
- Suggest — What methods do you want most? What features? Let us know
- Contribute — Fork & PR, build with us
GitHub: [github.com/brycewang-stanford/statspai](https://github.com/brycewang-stanford/statspai)
CoPaper.AI: [copaper.ai](https://copaper.ai)
StatsPAI Inc. · Stanford REAP Program