Using StatsPAI for Research: From Data and a Question to an Audited Causal Estimate
Turn 'run a regression' into an agent-native workflow: detect design -> recommend estimator -> fit (handle) -> audit robustness -> sensitivity -> verifiable citations.
Start Here
What you should be able to do
Understand the agent-native causal workflow: detect -> recommend -> fit -> audit -> sensitivity -> cite.
Use a result handle (result_id) to chain an estimate into downstream tools without re-passing beta / sigma.
Know that audit enumerates the missing robustness checks, turning identification into a checklist.
Run at least one sensitivity analysis (e.g., E-value) to quantify robustness to unobserved confounding.
Keep citation discipline: cite only from a verifiable bib, never letting the model fabricate references.
Learning Path
Learning path: detect -> recommend -> fit -> audit -> cite
Read StatsPAI along this path: detect the design, recommend an estimator, fit to a handle, audit robustness and run sensitivity, then close with verifiable citations.
Step 1
Detect
Data shape -> design.
Step 2
Recommend
Design -> robust estimator.
Step 3
Fit
A fit returns a result handle.
Step 4
Audit
Enumerate missing checks and coverage.
Step 5
Cite
Cite only from a verifiable bib.
01 / Intuition
Core Intuition
Most people treat stats software as a set of commands to memorize. StatsPAI makes it a workflow: first ask what design this is, then what estimator to use.
Identification always precedes estimation. On the same panel, fitting ordinary two-way fixed effects (TWFE) to a staggered DiD can bias the estimate when treatment effects differ across cohorts or over time, so classify the design first.
The result handle is the key abstraction: fit once to get a result_id that downstream audit, sensitivity, and tables all reference, keeping the evidence chain intact.
Credibility does not come from printing a significance star; it comes from audit coverage, sensitivity analysis, and verifiable citations, consistent with the causal spirit of the whole course.
02 / Math
Formalizing the agent-native causal workflow
01 / Detect design
Map the data shape to a design: panel + staggered treatment -> DiD, a cutoff -> RD, an instrument -> IV, else selection on observables.
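The mapping above can be sketched as a small rule-based classifier. This is a minimal illustration, not StatsPAI's actual detector: the name detect_design is borrowed from the case study later in this page, its boolean parameters are hypothetical, and the returned labels reuse the design keys from the code cases.

```python
def detect_design(has_panel=False, staggered_treatment=False,
                  has_cutoff=False, has_instrument=False):
    """Map coarse data-shape flags to a design label (hypothetical rules)."""
    if has_panel and staggered_treatment:
        return "staggered_did"
    if has_cutoff:
        return "regression_discontinuity"
    if has_instrument:
        return "instrumental_variables"
    # Fallback when no design-based source of variation is available.
    return "selection_on_observables"

print(detect_design(has_panel=True, staggered_treatment=True))
print(detect_design(has_cutoff=True))
print(detect_design())
```

The rule order mirrors the sentence above; a real detector would inspect the data rather than take flags from the user.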
02 / Recommend estimator
Each design maps to a robust estimator, avoiding misuse (e.g., TWFE can be biased under staggered timing when treatment effects are heterogeneous).
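The TWFE caveat can be checked by hand on a toy panel. The sketch below is an invented 2-unit, 3-period example (not StatsPAI output): the untreated outcome is zero everywhere, every treated cell has a positive effect, and the TWFE coefficient is computed exactly by residualizing the treatment dummy on unit and time fixed effects.

```python
from fractions import Fraction

# Toy balanced panel: 2 units x 3 periods; Y(0) = 0 everywhere,
# so each observed outcome equals that cell's treatment effect.
D = {"A": [0, 1, 1],   # unit A is treated from period 2
     "B": [0, 0, 1]}   # unit B is treated from period 3
Y = {"A": [0, 1, 3],   # A's effect grows over time: 1, then 3
     "B": [0, 0, 1]}   # B's effect is 1 in its first treated period

units, T = list(D), 3
n_cells = len(units) * T

# True ATT: average effect over the treated cells.
treated = [(u, t) for u in units for t in range(T) if D[u][t] == 1]
true_att = Fraction(sum(Y[u][t] for u, t in treated), len(treated))

# TWFE via Frisch-Waugh: double-demean D (exact in a balanced panel),
# then regress Y on the residualized treatment.
unit_mean = {u: Fraction(sum(D[u]), T) for u in units}
time_mean = [Fraction(sum(D[u][t] for u in units), len(units)) for t in range(T)]
grand_mean = Fraction(sum(sum(D[u]) for u in units), n_cells)
D_res = {u: [D[u][t] - unit_mean[u] - time_mean[t] + grand_mean for t in range(T)]
         for u in units}

num = sum(D_res[u][t] * Y[u][t] for u in units for t in range(T))
den = sum(D_res[u][t] ** 2 for u in units for t in range(T))
twfe = num / den

# Implicit TWFE weight on each treated cell, keyed by (unit, period number).
weights = {(u, t + 1): D_res[u][t] / den for u, t in treated}

print("true ATT  =", float(true_att))
print("TWFE beta =", float(twfe))
print("weights   =", {k: float(w) for k, w in weights.items()})
```

In this example the true ATT is 5/3 while the TWFE coefficient is exactly 0, because the cell (A, period 3) receives a weight of -1/2: the already-treated unit A serves as a control for the later-treated unit B.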
03 / Estimand (ATT)
Most policy evaluations target the average treatment effect on the treated.
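In potential-outcomes notation the estimand can be written as follows. The symbols are assumptions introduced here: Y_i(1) and Y_i(0) are unit i's treated and untreated potential outcomes, D_i is the treatment indicator, and G_i is the period in which unit i is first treated. The second line is the group-time version used in staggered designs, following the notation of Callaway & Sant'Anna (2021).

```latex
\mathrm{ATT} = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid D_i = 1 \,\right]

\mathrm{ATT}(g, t) = \mathbb{E}\left[\, Y_{i,t}(g) - Y_{i,t}(0) \mid G_i = g \,\right]
```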
04 / Result handle
A fit returns a handle encapsulating coefficients, variance, and diagnostics for downstream tools.
05 / Audit coverage
Auditing turns identification into a checklist: the share of recommended checks that are done.
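One way to write the coverage share, with notation assumed here: let R be the set of recommended checks and D the set of checks already done.

```latex
\text{coverage} = \frac{\lvert \mathcal{D} \cap \mathcal{R} \rvert}{\lvert \mathcal{R} \rvert},
\qquad
\text{missing} = \mathcal{R} \setminus \mathcal{D}
```

With five recommended checks and two of them done, coverage is 2/5 = 40%, which matches Case 3 in the code section.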
06 / Sensitivity (E-value)
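The E-value is the minimum strength of association, on the risk-ratio scale, that an unobserved confounder would need with both the treatment and the outcome to fully explain away the observed effect; larger is more robust.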
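For an observed risk ratio RR, the E-value formula from VanderWeele & Ding (2017) is shown below; for a protective effect (RR < 1), the reciprocal is taken first.

```latex
E = RR^{*} + \sqrt{RR^{*}\,\left(RR^{*} - 1\right)},
\qquad
RR^{*} =
\begin{cases}
RR, & RR \ge 1 \\
1 / RR, & RR < 1
\end{cases}
```

For example, RR = 2 gives E = 2 + \sqrt{2} \approx 3.41, the value printed in Case 4.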
03 / Code
Code cases: the workflow from detection to audit
Use plain Python to simulate the StatsPAI workflow logic: detect a design, recommend an estimator, chain with a handle, audit coverage, E-value sensitivity, and citation discipline. Real projects use the StatsPAI MCP tools.
Case 1: recommend a robust estimator by design
Once the design is identified, have StatsPAI recommend a matching estimator to avoid method misuse.
RECOMMEND = {
    "staggered_did": "callaway_santanna",
    "regression_discontinuity": "rdrobust",
    "instrumental_variables": "ivreg",
    "selection_on_observables": "dml",
}
design = "staggered_did"
print(f"design={design} -> recommended estimator: {RECOMMEND[design]}")
print("reason: TWFE is biased under heterogeneous treatment timing")
Expected output
design=staggered_did -> recommended estimator: callaway_santanna
reason: TWFE is biased under heterogeneous treatment timing
How to read this code
- TWFE can be biased under staggered timing with heterogeneous effects, so callaway_santanna is recommended.
- Design-driven estimation is the first step of 'using the right method.'
Case 2: chain estimation and audit with a result handle
A fit returns a result_id that downstream audit references directly, without re-passing beta / sigma.
results = {}  # in-memory store: result_id -> fitted result
def fit(estimator, as_handle=True):
    # as_handle is kept for illustration; this simulation always returns a handle.
    rid = f"res_{len(results)+1}"
    results[rid] = {"estimator": estimator, "att": 0.073, "se": 0.021}
    return rid
def audit_result(result_id):
    # Look the estimate up by handle instead of re-passing beta / sigma.
    r = results[result_id]
    return {"id": result_id, "t_stat": round(r["att"] / r["se"], 2),
            "missing": ["pretrends_test", "honest_did", "sensitivity"]}
rid = fit("callaway_santanna", as_handle=True)
print("handle:", rid)
print("audit:", audit_result(rid))
Expected output
handle: res_1
audit: {'id': 'res_1', 't_stat': 3.48, 'missing': ['pretrends_test', 'honest_did', 'sensitivity']}
How to read this code
- The handle keeps the evidence chain intact across multi-step analysis.
- Audit reads the handle to report the t-stat and the still-missing checks.
Case 3: audit coverage turns identification into a checklist
List the recommended robustness checks; done / missing is immediately visible.
recommended = {"pretrends_test", "honest_did", "sensitivity", "placebo", "cluster_se"}
done = {"pretrends_test", "cluster_se"}
coverage = len(done) / len(recommended)
print(f"robustness coverage = {coverage:.0%}")
print("still missing:", sorted(recommended - done))
Expected output
robustness coverage = 40%
still missing: ['honest_did', 'placebo', 'sensitivity']
How to read this code
- Coverage = done / recommended turns identification from post-hoc defense into a pre-flight checklist.
- A clear gap tells you what to add next.
Case 4: E-value sensitivity analysis
The E-value measures how strong unobserved confounding must be to explain away the effect; larger is more robust.
import math
def e_value(rr):
    # For a protective effect (RR < 1), work with the reciprocal.
    if rr < 1:
        rr = 1 / rr
    return round(rr + math.sqrt(rr * (rr - 1)), 2)
for rr in (1.2, 1.5, 2.0):
    print(f"RR={rr} -> E-value = {e_value(rr)}")
Expected output
RR=1.2 -> E-value = 1.69
RR=1.5 -> E-value = 2.37
RR=2.0 -> E-value = 3.41
How to read this code
- The E-value grows with the effect size.
- It turns 'could there be an omitted variable?' into a reportable number.
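A point-estimate E-value says nothing about sampling uncertainty. VanderWeele & Ding (2017) also report an E-value for the confidence limit closest to the null, which is 1 when the interval already contains RR = 1. The sketch below illustrates that rule; the function names and the example intervals are made up for illustration.

```python
import math

def e_value(rr):
    """E-value for a risk ratio; protective effects use the reciprocal."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

def e_value_ci(rr, lo, hi):
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lo <= 1 <= hi:
        return 1.0  # interval includes the null: no confounding needed
    bound = lo if rr > 1 else hi
    return e_value(bound)

# Illustrative inputs only, not estimates from any real study.
print(round(e_value_ci(1.5, 1.1, 2.05), 2))  # harmful effect, CI excludes 1
print(round(e_value_ci(1.5, 0.9, 2.5), 2))   # CI includes 1
print(round(e_value_ci(0.5, 0.4, 0.8), 2))   # protective effect
```

Reporting both numbers shows how much confounding would be needed to move the estimate to the null and how much would be needed just to make the interval include it.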
Case 5: citation discipline — reject fabricated references
All citations must come from a verifiable bib; any key not in the library is rejected.
VERIFIED_BIB = {
    "callaway2021": "Callaway & Sant'Anna (2021), J. Econometrics",
    "goodmanbacon2021": "Goodman-Bacon (2021), J. Econometrics",
}
def cite(keys):
    # Split requested keys into verified entries and unknown (possibly invented) ones.
    out, bad = [], []
    for k in keys:
        (out if k in VERIFIED_BIB else bad).append(k)
    return {"cited": [VERIFIED_BIB[k] for k in out], "rejected_invented": bad}
print(cite(["callaway2021", "smith2099"]))
Expected output
{'cited': ["Callaway & Sant'Anna (2021), J. Econometrics"], 'rejected_invented': ['smith2099']}
How to read this code
- The model's biggest danger is not 'looking machine-like' but fabricating plausible citations.
- A BibTeX file (paper.bib as the single source of truth) is the only trusted source of citations.
04 / Case
Case: evaluate a staggered policy rollout with the StatsPAI workflow
- Question: a policy rolled out across regions at different times; estimate its average effect on firm investment from a region-year panel.
- detect_design classifies it as staggered DiD; recommend suggests callaway_santanna because, under staggered timing, TWFE can put negative weights on some treated observations, which biases the estimate when effects are heterogeneous.
- The fit returns a result handle; audit_result flags missing pretrends, honest DiD, and placebo checks and reports coverage.
- After adding sensitivity (E-value / honest DiD), report the estimate + interval + audit coverage + sensitivity + limitations; all citations go through bibtex, with no fabrication.
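The steps of this case can be strung together in a small in-memory sketch. Everything here is hypothetical scaffolding rather than the StatsPAI API: the function names, the list of recommended checks, the 80% reporting threshold, and the normal-approximation interval are assumptions, and the ATT and standard error are the toy values from Case 2.

```python
# Hypothetical evidence chain keyed by result handle; illustrative only.
RECOMMENDED_CHECKS = ("pretrends_test", "honest_did", "placebo",
                      "sensitivity", "cluster_se")
store = {}

def fit(design, estimator, att, se):
    """Register an estimate and return its handle (result_id)."""
    rid = f"res_{len(store) + 1}"
    store[rid] = {"design": design, "estimator": estimator,
                  "att": att, "se": se, "checks": set()}
    return rid

def record_check(rid, name):
    """Mark a recommended robustness check as done for this handle."""
    if name not in RECOMMENDED_CHECKS:
        raise ValueError(f"unknown check: {name}")
    store[rid]["checks"].add(name)

def audit(rid):
    """Report coverage and the checks still missing."""
    done = store[rid]["checks"]
    missing = [c for c in RECOMMENDED_CHECKS if c not in done]
    return {"coverage": len(done) / len(RECOMMENDED_CHECKS), "missing": missing}

def report(rid, min_coverage=0.8):
    """Refuse to report until coverage reaches the assumed threshold."""
    a = audit(rid)
    if a["coverage"] < min_coverage:
        raise RuntimeError(f"not reportable, missing: {a['missing']}")
    r = store[rid]
    lo, hi = r["att"] - 1.96 * r["se"], r["att"] + 1.96 * r["se"]
    return (f"{r['estimator']}: ATT={r['att']:.3f} "
            f"[{lo:.3f}, {hi:.3f}], coverage={a['coverage']:.0%}")

rid = fit("staggered_did", "callaway_santanna", att=0.073, se=0.021)
print(audit(rid))  # nothing done yet, so every check is listed as missing
for check in ("pretrends_test", "cluster_se", "honest_did", "sensitivity"):
    record_check(rid, check)
print(report(rid))
```

Because every downstream call takes only the handle, the estimate, its checks, and the final report line stay tied to the same fitted result.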
05 / Causal
Bridge to causal inference: collapse the whole course into one auditable pipeline
StatsPAI unifies the designs from the first three weeks (DiD / IV / RD / synthetic control / DML) into one agent-native pipeline: identification first, estimation in the middle, audit and sensitivity after, citations to close. It is the concrete form of 'using AI for causal inference.'
01 / Design drives estimation (design -> estimator)
Classify the design first, then let StatsPAI recommend a matching estimator to avoid method misuse.
02 / Unbroken evidence chain (handle -> downstream)
Use result_id to chain estimation, audit, sensitivity, tables, and citations into a rerunnable evidence chain.
03 / Audit is identification (audit -> checklist)
Turn identification assumptions into an audit checklist with quantifiable coverage and fillable gaps.
04 / Verifiable citations (claim -> citation)
Every conclusion maps to verifiable references and code, with no fabrication.
Three red lines: (1) identification comes from the design; tools only swap the functional form and cannot defend your assumptions for you; (2) an estimate without audit and sensitivity analysis is not a conclusion; (3) citations go only through a verifiable bib, and the model must not fabricate references.
06 / Risks
Common Pitfalls
References
- StatsPAI. https://www.statspai.com
- Callaway & Sant'Anna (2021), Difference-in-Differences with Multiple Time Periods, J. Econometrics. https://doi.org/10.1016/j.jeconom.2020.12.001
- VanderWeele & Ding (2017), Sensitivity Analysis in Observational Research: Introducing the E-Value, Annals of Internal Medicine. https://doi.org/10.7326/M16-2607
- Athey & Imbens (2017), The State of Applied Econometrics: Causality and Policy Evaluation, JEP. https://doi.org/10.1257/jep.31.2.3
- Model Context Protocol. https://modelcontextprotocol.io