Causal Inference and Potential Outcomes
The potential-outcomes framework defines a causal effect as the difference between the same unit under treated and untreated worlds; the central problem is that one world is always missing.
Mechanism Lab
Animation: why one unit is missing one counterfactual
The animation draws treated and untreated potential-outcome tracks for each unit, then shows treatment assignment lighting up only one track before moving through estimands, bias, overlap, and adjustment.
Step 1 of 5, Two worlds: every unit has a treated and an untreated potential outcome, Y_i(1) and Y_i(0).
01 / Intuition
Core Intuition
Prediction asks for E[Y|X], the expected outcome given covariates. Causal inference asks what would happen to Y if the same unit had its treatment D switched from 0 to 1.
Each unit has two potential outcomes, Y_i(1) and Y_i(0), but we observe only Y_i = D_i Y_i(1) + (1-D_i)Y_i(0). This is the fundamental missing-data problem of causal inference.
Identification does not come automatically from a richer regression. It comes from defensible design assumptions: SUTVA, exchangeability, overlap, and a clearly stated estimand.
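To make the missing-data view concrete, here is a minimal sketch with a made-up five-unit table. The values and the column names `Y0_seen` and `Y1_seen` are illustrative assumptions, not data from any study. It applies Y_i = D_i Y_i(1) + (1-D_i)Y_i(0) and then hides the potential outcome that treatment assignment did not reveal.

```python
import pandas as pd

# Hypothetical five-unit "science table": both potential outcomes are
# visible only because the data are made up.
science = pd.DataFrame({
    "Y0": [3.0, 5.0, 4.0, 6.0, 2.0],
    "Y1": [4.0, 7.0, 4.0, 9.0, 3.0],
    "D": [1, 0, 1, 0, 1],
})

# Observed outcome: treatment assignment reveals exactly one potential outcome.
science["Y"] = science["D"] * science["Y1"] + (1 - science["D"]) * science["Y0"]

# What an analyst actually sees: the unrevealed potential outcome is missing.
observed = science[["D", "Y"]].copy()
observed["Y0_seen"] = science["Y0"].where(science["D"] == 0)
observed["Y1_seen"] = science["Y1"].where(science["D"] == 1)

missing_per_unit = observed[["Y0_seen", "Y1_seen"]].isna().sum(axis=1).tolist()
print(observed)
print("missing potential outcomes per unit:", missing_per_unit)
```

Every row has exactly one missing cell, which is why a unit-level effect cannot be computed from observed data alone.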
02 / Math
From missing counterfactuals to identifiable estimators
01 / Unit-level effect
For unit i, the treatment effect is the difference between its treated and untreated potential outcomes. The definition is model-free, but the two outcomes cannot be observed together.
tau_i = Y_i(1) - Y_i(0)
Y_i = D_i Y_i(1) + (1-D_i)Y_i(0)
02 / SUTVA
The stable unit treatment value assumption rules out multiple hidden treatment versions and interference across units.
Y_i(d) does not depend on D_j for j != i
03 / Estimands
ATE, ATT, and CATE answer different questions and should not be interpreted interchangeably.
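A small simulation can show how far the three estimands drift apart when effects are heterogeneous and treatment take-up depends on a covariate. This is only a sketch under assumed numbers: the binary covariate `x`, the effect sizes, and the take-up rates are all hypothetical, and the true potential outcomes are used directly, which is possible only in simulated data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.binomial(1, 0.5, size=n)     # hypothetical binary covariate
tau = 1.0 + 2.0 * x                  # unit-level effect: 1 if x=0, 3 if x=1
y0 = rng.normal(size=n)
y1 = y0 + tau
p_treat = np.where(x == 1, 0.8, 0.2) # x=1 units take treatment more often
d = rng.binomial(1, p_treat)

effect = y1 - y0
ate = effect.mean()                                   # whole population
att = effect[d == 1].mean()                           # treated units only
cate = {v: effect[x == v].mean() for v in (0, 1)}     # by covariate value

print({"ATE": round(ate, 2), "ATT": round(att, 2),
       "CATE(0)": round(cate[0], 2), "CATE(1)": round(cate[1], 2)})
```

In this design the population values are ATE = 2.0 and ATT = 2.6, because 80 percent of treated units have x = 1 and therefore the larger effect. Reporting one as if it were the other would misstate the answer to the question being asked.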
ATE = E[Y(1)-Y(0)]
ATT = E[Y(1)-Y(0) | D=1]
CATE(x) = E[Y(1)-Y(0) | X=x]
04 / Naive-difference bias
The observed mean difference decomposes into ATT plus selection bias: the treated and control groups may differ even under no treatment.
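The decomposition can be checked numerically before reading the formula. The sketch below assumes a hypothetical confounder called `ability` that raises both the untreated outcome and the chance of opting in; the constant effect of 2 is also an assumption. Because the data are simulated, the selection-bias term can be computed from `y0` directly, which real data never allow.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

ability = rng.normal(size=n)                       # hypothetical confounder
y0 = 1.0 + ability + rng.normal(size=n)
y1 = y0 + 2.0                                      # assumed constant effect
d = rng.binomial(1, 1 / (1 + np.exp(-ability)))    # high-ability units opt in

y = d * y1 + (1 - d) * y0
naive = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print({"naive": round(naive, 3), "att": round(att, 3),
       "selection_bias": round(selection_bias, 3),
       "att_plus_bias": round(att + selection_bias, 3)})
```

The naive difference equals ATT plus selection bias up to floating-point error, and the bias is positive here because treated units would have scored higher even without treatment.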
E[Y|D=1]-E[Y|D=0]
= ATT + {E[Y(0)|D=1]-E[Y(0)|D=0]}
05 / Randomization or exchangeability
If treatment is independent of potential outcomes, group means can replace counterfactual means. Observational studies usually require conditional independence given X.
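As an illustration, the sketch below applies two assignment rules to the same simulated population: a coin flip that ignores everything, and self-selection driven by a covariate `x`. The data-generating numbers and the helper name `diff_in_means` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

x = rng.normal(size=n)
y0 = 1.0 + x + rng.normal(size=n)
y1 = y0 + 2.0
true_ate = (y1 - y0).mean()

def diff_in_means(d):
    """Observed treated-minus-control mean difference under assignment d."""
    y = d * y1 + (1 - d) * y0
    return y[d == 1].mean() - y[d == 0].mean()

d_random = rng.binomial(1, 0.5, size=n)               # independent of (y0, y1)
d_selected = rng.binomial(1, 1 / (1 + np.exp(-x)))    # depends on x

est_random = diff_in_means(d_random)
est_selected = diff_in_means(d_selected)

print({"true_ate": round(true_ate, 3),
       "random_assignment": round(est_random, 3),
       "self_selection": round(est_selected, 3)})
```

Only the randomized assignment lets the raw group means stand in for the counterfactual means. Under self-selection the same comparison is biased upward, and it would need adjustment for `x` to be credible.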
(Y(1),Y(0)) independent D
or (Y(1),Y(0)) independent D | X
06 / Overlap
Each covariate region where we want causal inference must contain both treated and control units.
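With a discrete covariate, overlap can be inspected by counting treated and control units in each stratum. The following sketch uses a hypothetical four-level covariate in which level "d" is always treated, so no control units exist there; the level names and treatment probabilities are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 2_000

# Hypothetical covariate with four levels; level "d" is treated with probability 1.
level = rng.choice(["a", "b", "c", "d"], size=n)
p_treat = pd.Series(level).map({"a": 0.2, "b": 0.5, "c": 0.8, "d": 1.0}).to_numpy()
d = rng.binomial(1, p_treat)

# Count control (0) and treated (1) units per stratum.
counts = pd.crosstab(pd.Series(level, name="level"), pd.Series(d, name="D"))
counts = counts.reindex(columns=[0, 1], fill_value=0)
counts["share_treated"] = counts[1] / (counts[0] + counts[1])
counts["overlap_ok"] = (counts[0] > 0) & (counts[1] > 0)

print(counts)
```

Stratum "d" fails the check: its untreated mean cannot be estimated from the data, so any effect reported for that region would rest on extrapolation rather than comparison.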
0 < P(D=1|X=x) < 1
07 / Standardization
Under conditional exchangeability and overlap, recover the potential-outcome means by estimating the conditional means mu_d(X) = E[Y | D=d, X] within levels of X and then averaging them over the marginal distribution of X.
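For a discrete covariate the calculation reduces to a weighted average of stratum means. The numbers below are hypothetical stratum summaries chosen to keep the arithmetic easy to follow by hand; they do not come from the simulation in the code section.

```python
import pandas as pd

# Hypothetical stratum summaries for a binary covariate X.
strata = pd.DataFrame({
    "x": [0, 1],
    "share": [0.6, 0.4],            # P(X = x) in the full sample
    "mean_y_treated": [5.0, 9.0],   # estimate of E[Y | D=1, X=x]
    "mean_y_control": [4.0, 6.0],   # estimate of E[Y | D=0, X=x]
})

# Average each arm's stratum means over the same X distribution.
ey1 = (strata["share"] * strata["mean_y_treated"]).sum()   # 0.6*5 + 0.4*9
ey0 = (strata["share"] * strata["mean_y_control"]).sum()   # 0.6*4 + 0.4*6
ate_standardized = ey1 - ey0

print({"E[Y(1)]": round(ey1, 3), "E[Y(0)]": round(ey0, 3),
       "ATE": round(ate_standardized, 3)})
```

The key design choice is that both arms are averaged with the same weights, the full-sample shares of X, rather than with each arm's own covariate mix.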
E[Y(d)] = E_X{ E[Y | D=d, X] }
ATE = E_X{ mu_1(X) - mu_0(X) }
08 / IPW
Alternatively, use the propensity score e(X)=P(D=1|X) to reweight observed units into representative treated and untreated worlds.
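The reweighting idea can be sketched in a setting where the propensity score is known by construction. That is an assumption made here for clarity; observational studies have to estimate e(X), which adds model risk. The covariate, effect size, and treatment probabilities below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

x = rng.binomial(1, 0.5, size=n)
e = np.where(x == 1, 0.8, 0.2)          # true propensity score, known here
d = rng.binomial(1, e)
y0 = 1.0 + 2.0 * x + rng.normal(size=n)
y1 = y0 + 1.0                           # assumed constant effect of 1
y = d * y1 + (1 - d) * y0

naive = y[d == 1].mean() - y[d == 0].mean()

# Each treated unit stands in for 1/e units, each control for 1/(1-e) units.
ipw = np.mean(d * y / e) - np.mean((1 - d) * y / (1 - e))

print({"naive": round(naive, 3), "ipw": round(ipw, 3)})
```

The naive difference overstates the effect because treated units are mostly x = 1, while the weighted contrast lands near the assumed effect of 1. Units with propensities close to 0 or 1 receive very large weights, which is one reason overlap matters for this estimator.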
ATE = E[D Y/e(X)] - E[(1-D)Y/(1-e(X))]
03 / Code
Python code: simulate potential outcomes, selection bias, and adjustment
The example builds a simulation with true potential outcomes and observed data. Real studies never observe both `Y0` and `Y1`; they are kept here only to validate the identification formulas.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression, LinearRegression
rng = np.random.default_rng(42)
n = 5000
# Covariates that affect both treatment selection and outcomes.
x1 = rng.normal(size=n)
x2 = rng.binomial(1, 0.45, size=n)
X = np.column_stack([x1, x2])
# Potential outcomes. In real data, one of these is missing for each unit.
y0 = 2.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(scale=1.0, size=n)
tau = 1.0 + 0.4 * x2
y1 = y0 + tau
# Observational treatment assignment: higher x1 units select into treatment.
logit_e = -0.2 + 1.1 * x1 + 0.6 * x2
e = 1 / (1 + np.exp(-logit_e))
d = rng.binomial(1, e)
y = d * y1 + (1 - d) * y0
df = pd.DataFrame({"Y": y, "D": d, "x1": x1, "x2": x2, "Y0": y0, "Y1": y1})
true_ate = np.mean(y1 - y0)
naive = df.loc[df.D == 1, "Y"].mean() - df.loc[df.D == 0, "Y"].mean()
# Outcome-standardization / g-computation.
mu0_model = LinearRegression().fit(df.loc[df.D == 0, ["x1", "x2"]], df.loc[df.D == 0, "Y"])
mu1_model = LinearRegression().fit(df.loc[df.D == 1, ["x1", "x2"]], df.loc[df.D == 1, "Y"])
mu0_hat = mu0_model.predict(df[["x1", "x2"]])
mu1_hat = mu1_model.predict(df[["x1", "x2"]])
standardized_ate = np.mean(mu1_hat - mu0_hat)
# IPW using an estimated propensity score.
ps_model = LogisticRegression(max_iter=2000).fit(df[["x1", "x2"]], df["D"])
e_hat = ps_model.predict_proba(df[["x1", "x2"]])[:, 1]
e_hat = np.clip(e_hat, 0.02, 0.98)
ipw_ate = np.mean(df["D"] * df["Y"] / e_hat) - np.mean((1 - df["D"]) * df["Y"] / (1 - e_hat))
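# Balance check: inverse-probability-weighted covariate means by arm.
# With well-behaved weights, both arms should resemble the full sample.
weights = np.where(df.D == 1, 1 / e_hat, 1 / (1 - e_hat))
balance = df.assign(weight=weights).groupby("D")[["x1", "x2", "weight"]].apply(
    lambda g: pd.Series({
        "x1_weighted_mean": np.average(g["x1"], weights=g["weight"]),
        "x2_weighted_mean": np.average(g["x2"], weights=g["weight"]),
    })
)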
print({
"true_ate": round(true_ate, 3),
"naive_difference": round(naive, 3),
"standardized_ate": round(standardized_ate, 3),
"ipw_ate": round(ipw_ate, 3),
})
print(balance)
04 / Case
Case: causal effect of an online course on project scores
- Question: does joining the StatsPAI online bootcamp improve later project scores? A raw comparison is not credible because students who opt in may already have stronger preparation, more time, or higher motivation.
- Potential outcomes: Y_i(1) is the score if the same student joins the bootcamp, and Y_i(0) is the score if the same student does not. The data reveal only one of them.
- If seats are randomly assigned, exchangeability comes from design. If enrollment is self-selected, the analysis must argue that X is sufficient and must check overlap.
- A credible report states whether the estimand is ATE, ATT, or CATE, then shows sample flow, baseline balance, overlap plots, adjustment models, sensitivity analysis, and remaining selection channels.
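One item on that checklist, baseline balance, is often summarized with standardized mean differences. The sketch below uses simulated stand-in data for the bootcamp case: the covariates `prior_score` and `weekly_hours`, the enrollment rule, and the helper `standardized_mean_difference` are all hypothetical. The pooled-standard-deviation formula is one common convention among several.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 4_000

# Hypothetical baseline covariates for the bootcamp example.
prior_score = rng.normal(70, 10, size=n)
weekly_hours = rng.gamma(shape=4.0, scale=2.0, size=n)
# Assumed self-selection: better-prepared students enroll more often.
p_enroll = 1 / (1 + np.exp(-(prior_score - 70) / 10))
enrolled = rng.binomial(1, p_enroll)

df = pd.DataFrame({"prior_score": prior_score,
                   "weekly_hours": weekly_hours,
                   "enrolled": enrolled})

def standardized_mean_difference(frame, column, group="enrolled"):
    """Difference in group means divided by the pooled standard deviation."""
    treated = frame.loc[frame[group] == 1, column]
    control = frame.loc[frame[group] == 0, column]
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

smd = {c: standardized_mean_difference(df, c)
       for c in ["prior_score", "weekly_hours"]}
print({k: round(v, 3) for k, v in smd.items()})
```

A widely used rule of thumb treats absolute values above roughly 0.1 as a sign of imbalance worth addressing. Here `prior_score` is far out of balance by construction while `weekly_hours` is not, so a raw comparison of project scores would mix the bootcamp effect with prior preparation.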
05 / Risks
Common Pitfalls
References
- Rubin (1974), Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. https://doi.org/10.1037/h0037350
- Holland (1986), Statistics and Causal Inference. https://doi.org/10.1080/01621459.1986.10478354
- Hernan and Robins, Causal Inference: What If. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
- Imbens and Rubin, Causal Inference. https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB