Tags: StatsPAI · AI Agent · Python · Causal Inference · Econometrics

StatsPAI: The Statistics Toolkit Built for the AI Agent Era

2026-04-04 · Bryce Wang
```bash
pip install statspai
```

```python
import statspai as sp
```

A Python econometrics and causal inference toolkit designed for the AI Agent era. 150+ functions, unified API, MIT license.

StatsPAI's core positioning: a statistics package built for Agent-driven development. As AI coding tools — Claude Code, Codex, Cursor — increasingly participate in empirical research, the ecosystem needs a unified, stable, and API-standardized Python statistics package to handle computation. StatsPAI is built for exactly this scenario: designed for researchers, but optimized for agents.

Why Python

Stata and R have deep roots in empirical research, but they are both domain-specific languages. Python is a general-purpose language, and this gives it one decisive advantage: new algorithms ship faster in Python than in Stata or R.

In Stata, new methods wait for StataCorp's official releases or community-written .ado files. In R, new packages go through the CRAN review pipeline. In Python's ecosystem, when a new causal inference paper drops, AI coding tools can implement, test, and publish the algorithm to PyPI within days. The iteration speed of scikit-learn, PyTorch, and transformers has already proven this.

StatsPAI's goal is to bring this speed advantage to empirical research: leverage AI-assisted coding to rapidly implement frontier algorithms, so researchers and agents can use them immediately.

And Python naturally connects to data engineering, machine learning, and cloud computing. A researcher can clean data with pandas, run double machine learning with sp.dml(), train a custom model with PyTorch — all in the same environment, with no data export or language switching. That's an experience Stata and R simply cannot offer.

What StatsPAI Does

Anyone in empirical research knows Stata and R. Stata has built a highly unified econometric command system over 40 years — from reg to xtreg to ivregress, consistent syntax, smooth workflow. R's CRAN hosts dozens of causal inference packages — fixest, did, rdrobust, grf — cutting-edge methods with broad coverage, where new paper implementations often land first.

On the Python side, data science infrastructure has long been industry standard — pandas, scikit-learn, TensorFlow, PyTorch form the bedrock of AI. But for empirical research, there's never been a unified, convenient tool. statsmodels handles regression, linearmodels handles panels, EconML handles heterogeneous treatment effects, differences handles DID — each with its own API, its own output format. Running one paper requires installing a pile of packages and playing glue engineer.

StatsPAI takes the best of both worlds — Stata's command system and R's causal inference coverage — and unifies them in a single Python package:

```python
import statspai as sp

# Classical econometrics
sp.regress("wage ~ edu + exp", data=df, robust='hc1')
sp.ivreg("wage ~ (edu ~ parent_edu) + exp", data=df)
sp.panel(df, "wage ~ edu + exp", entity='id', time='year', model='fe')

# Causal inference
sp.did(df, y='wage', treat='policy', time='year', id='worker')
sp.rdrobust(df, y='score', x='running_var', c=0)
sp.synth(df, treat_unit=1, outcome='y', time='year', unit='id')

# ML causal
sp.dml(df, y='wage', treat='training', covariates=['age', 'edu'])
sp.causal_forest("y ~ treatment | x1 + x2 + x3", data=df)

# All results, same interface
result.summary()
result.plot()
result.to_docx()       # Word
result.to_latex()      # LaTeX
result.cite()          # BibTeX for the method

# Publication tables in one line
sp.modelsummary(r1, r2, r3, output='results.docx')
```

All functions share a unified API design. All results return a standardized CausalResult object. Publication-ready tables export directly to Word, Excel, LaTeX, and HTML. Stata users will find sp.regress(), sp.margins(), sp.test() familiar. R users will recognize sp.callaway_santanna() and sp.rdrobust().
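To make the "standardized result object" idea concrete, here is a minimal sketch in plain Python of what such a type could look like. This is an illustration only; the fields and methods of statspai's actual CausalResult class will differ.

```python
from dataclasses import dataclass

@dataclass
class CausalResult:
    # Hypothetical sketch of a standardized result object,
    # NOT statspai's actual class definition.
    method: str
    coef: dict   # parameter name -> point estimate
    se: dict     # parameter name -> standard error

    def summary(self) -> str:
        # Because every estimator returns the same shape, downstream
        # code (tables, plots, exports) only needs to be written once.
        rows = [f"{name:>10}  {self.coef[name]: .4f}  ({self.se[name]:.4f})"
                for name in self.coef]
        return f"[{self.method}]\n" + "\n".join(rows)

res = CausalResult(method="OLS",
                   coef={"edu": 0.08, "exp": 0.03},
                   se={"edu": 0.010, "exp": 0.005})
print(res.summary())
```

The payoff of this design is that `.to_docx()`, `.plot()`, and `sp.modelsummary()` can each be implemented once against the shared shape rather than once per estimator.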

Method Coverage

150+ public functions. Here's an overview by category:

Classical Econometrics: regress, ivreg, panel, heckman, qreg, tobit, xtabond

DID Family: did, callaway_santanna, sun_abraham, bacon_decomposition, honest_did

Regression Discontinuity: rdrobust, rdplot, rddensity

Matching & Reweighting: match (PSM / Mahalanobis / CEM), ebalance

Synthetic Control: synth, sdid

ML Causal: dml, causal_forest, metalearner (S/T/X/R/DR), tmle, aipw, deepiv

Neural Causal: tarnet, cfrnet, dragonnet

Causal Discovery: notears, pc_algorithm

Policy Learning: policy_tree, policy_value

More: dose_response, multi_treatment, lee_bounds, manski_bounds, spillover, g_estimation, bunching, mc_panel, causal_impact, mediate, bartik, conformal_cate, bcf

Post-estimation: margins, test, lincom, oster_bounds, sensemakr, evalue, hausman_test, het_test, reset_test, vif
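For readers new to these methods, the computation behind an entry like did is conceptually simple in the basic case. The textbook 2×2 difference-in-differences can be recovered by hand with numpy: regress the outcome on group, period, and their interaction, and the interaction coefficient is the treatment effect. This is a conceptual illustration on simulated data, not statspai code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
treated = rng.integers(0, 2, n)   # group indicator (1 = treated group)
post = rng.integers(0, 2, n)      # period indicator (1 = post-treatment)

# Simulated outcome with group and time effects; the true ATT is 2.0
y = 1.0 + 0.5 * treated + 1.5 * post + 2.0 * treated * post \
    + rng.normal(0, 0.1, n)

# OLS on [1, treated, post, treated*post]:
# the interaction coefficient beta[3] is the DID estimate
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(beta[3], 2))  # close to the simulated ATT of 2.0
```

Functions like callaway_santanna and sun_abraham exist because this simple interaction logic breaks down under staggered adoption, which is exactly where a dedicated package earns its keep.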

Automated Robustness (one-click solutions that neither Stata nor R offer)

  • spec_curve() — Specification curve / multiverse analysis
  • robustness_report() — Automatically varies standard errors, winsorization, control variables, subsamples
  • subgroup_analysis() — Heterogeneity analysis + forest plots + Wald tests
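The idea behind spec_curve() can be shown in miniature: re-estimate the same focal coefficient under every combination of optional controls and inspect how stable it is across specifications. The following is a conceptual numpy sketch on simulated data, not the statspai implementation:

```python
from itertools import chain, combinations
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)                            # focal regressor
c1, c2 = rng.normal(size=n), rng.normal(size=n)   # optional controls
y = 2.0 * x + 0.5 * c1 + rng.normal(size=n)       # true effect of x is 2.0

# Every subset of the optional controls: (), (c1,), (c2,), (c1, c2)
controls = {"c1": c1, "c2": c2}
subsets = chain.from_iterable(
    combinations(controls, r) for r in range(len(controls) + 1))

coefs = []
for subset in subsets:
    X = np.column_stack([np.ones(n), x] + [controls[name] for name in subset])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(beta[1])   # coefficient on x in this specification

print(len(coefs), [round(b, 2) for b in coefs])
```

A real specification curve also varies standard errors, samples, and outcome definitions, and plots the sorted estimates; the core loop over specifications is the same.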

Publication Tables: modelsummary, outreg2, sumstats, balance_table, tab, coefplot, binscatter

Designed for Agents

When an AI Agent conducts empirical research, the workflow has two layers: Skills orchestrate "what to do," and the statistics package executes "how to do it."

A Skill might specify: "Run Callaway-Sant'Anna staggered DID → test parallel trends → export Word table." Under the hood, the Agent needs to call concrete Python functions. If the underlying layer consists of seven or eight scattered packages, the Agent must juggle different APIs and adapt between formats. With StatsPAI, it's three lines of sp.xxx().

StatsPAI is designed with Agent invocation in mind:

  • Unified function signatures — Agents don't need to memorize each package's unique parameter style
  • Standardized result objects — All methods return the same type of object, enabling standardized downstream processing
  • Built-in publication output — .to_docx() and .to_latex() work out of the box, no extra assembly required
  • Method-level citations — .cite() returns BibTeX, making automated reference generation easy

As agents play an increasingly central role in academic research, the uniformity and reliability of the underlying statistics package will become ever more critical. This is StatsPAI's most fundamental long-term value.

Timeline

  • 2025.07 — StatsPAI v0.1.0 published on PyPI. Open source first, starting with classical econometrics and publication tables
  • 2025.08 — StatsPAI Inc. incorporated. A sustainable entity to support the open-source project
  • 2025.12 — [CoPaper.AI](https://copaper.ai) launches. An AI-assisted empirical research co-authoring platform built on StatsPAI
  • 2026.04 — v0.3.1. Three major version iterations, 150+ functions, covering classical econometrics through frontier ML causal methods

Package first, then company, then product. StatsPAI is and always will be MIT-licensed and open source.

What's Next

StatsPAI is iterating rapidly. The current 150+ common functions are just the starting point, with significant new features shipping weekly.

Near-term directions:

  • Performance — Computational efficiency for large-scale panel data, benchmarking against Stata's compiled backend and R's fixest
  • Method expansion — Survey design, spatial econometrics, structural estimation — closing the remaining gaps where Stata and R still lead
  • Ecosystem integration — Deeper fusion with pandas, scikit-learn, PyTorch, accelerating the convergence of AI and causal inference
  • AI adaptation — Structured function documentation and standardized interfaces for Agents and Skills, lowering the barrier for AI invocation
  • Rapid frontier implementation — Leveraging Python + AI coding speed advantages to implement new paper methods as soon as they're published

StatsPAI aims to become foundational infrastructure for Python-based empirical research — analogous to what scikit-learn is to machine learning and pandas is to data processing. This takes time, and it takes community. The ecosystem is in a period of rapid growth, with each version adding new methods, optimizing existing implementations, and refining API consistency. We'll maintain this pace.

Get Involved

StatsPAI is an open ecosystem, and all forms of participation are welcome:

  • Use it — pip install statspai, try it in your next empirical analysis
  • Feedback — Tell us what's broken or what could be better on [GitHub Issues](https://github.com/brycewang-stanford/statspai/issues)
  • Suggest — What methods do you want most? What features? Let us know
  • Contribute — Fork & PR, build with us

GitHub: [github.com/brycewang-stanford/statspai](https://github.com/brycewang-stanford/statspai)

CoPaper.AI: [copaper.ai](https://copaper.ai)

StatsPAI Inc. · Stanford REAP Program