Propensity Score Matching
PSM compresses high-dimensional covariates into a one-dimensional treatment probability, then compares treated units with similar controls on common support.
Mechanism Lab
Animation: how PSM forms matched pairs on common support
The animation projects treated and control units onto the propensity-score axis, then reveals overlap, nearest neighbors, caliper screening, and balance diagnostics.
Step 1 / 5
Raw scores
Estimate each unit’s treatment probability and project covariates onto one score axis.
e(X) = P(D=1|X)
01 / Intuition
Core Intuition
The propensity score e(X) is the probability of treatment conditional on covariates. It is a summary of selection into treatment, not the treatment effect.
Matching aims to balance pre-treatment covariates, not to make outcomes look similar.
PSM still requires conditional ignorability: confounders that affect both treatment and outcomes must be observed in X.
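A small simulation can make the last point concrete. The sketch below is illustrative only: the binary covariates x and u, the assignment probabilities, and the true effect of 2.0 are all assumptions chosen for the example, and the true propensity score is used instead of an estimated one. Because the covariates are discrete, units with the same score form a cell, so comparing treated units with controls in the same cell amounts to exact matching on the score.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.binomial(1, 0.5, size=n)   # observed confounder
u = rng.binomial(1, 0.5, size=n)   # confounder the analyst may fail to observe
p_treat = 0.2 + 0.3 * x + 0.3 * u  # true P(D=1 | x, u), assumed for illustration
d = rng.binomial(1, p_treat)
y = 2.0 * d + 1.0 * x + 1.5 * u + rng.normal(size=n)  # true effect is 2.0


def att_within_score_cells(score, d, y):
    """ATT from comparing each treated unit with the mean control outcome at the same score."""
    total, n_treated = 0.0, 0
    for s in np.unique(score):
        is_t = (score == s) & (d == 1)
        is_c = (score == s) & (d == 0)
        total += float((y[is_t] - y[is_c].mean()).sum())
        n_treated += int(is_t.sum())
    return total / n_treated


score_x_only = np.round(0.35 + 0.3 * x, 2)  # P(D=1 | x), averaging over u
score_x_and_u = np.round(p_treat, 2)        # P(D=1 | x, u)

naive = float(y[d == 1].mean() - y[d == 0].mean())
att_x_only = att_within_score_cells(score_x_only, d, y)
att_x_and_u = att_within_score_cells(score_x_and_u, d, y)
print({"naive": naive, "x_only": att_x_only, "x_and_u": att_x_and_u})
```

Under these assumptions the naive gap and the x-only estimate both overstate the effect, while the estimate built on the score that includes both confounders lands close to 2.0. Matching on a score that omits a confounder balances only what the score has seen.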
02 / Math
From ignorability to a one-dimensional matching estimator
01 / Potential outcomes
Let D indicate treatment and Y(1), Y(0) be potential outcomes. PSM starts from selection-on-observables and overlap.
(Y(1), Y(0)) independent of D | X
0 < P(D=1|X) < 1
02 / Propensity score
The propensity score is the conditional treatment probability. It compresses X while preserving the assignment probability.
e(X) = P(D=1 | X)
03 / Balancing proof
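As a numerical companion to the balancing argument in this step, the sketch below checks the property exactly on a tiny discrete population. The covariate values "a" to "d", their probabilities, and the scores e(x) are hypothetical numbers chosen so that two covariate values share each score. Exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction as F

# Hypothetical population: P(X = x) and e(x) = P(D = 1 | X = x).
p_x = {"a": F(1, 10), "b": F(3, 10), "c": F(2, 10), "d": F(4, 10)}
e = {"a": F(1, 4), "b": F(1, 4), "c": F(3, 5), "d": F(3, 5)}


def dist_given_score(p, treated=None):
    """Distribution of X in the stratum e(X) = p, optionally restricted to D = treated."""
    def weight(x):
        if treated is None:
            return p_x[x]
        return p_x[x] * (e[x] if treated == 1 else 1 - e[x])

    stratum = [x for x in p_x if e[x] == p]
    total = sum(weight(x) for x in stratum)
    return {x: weight(x) / total for x in stratum}


for p in sorted(set(e.values())):
    overall = dist_given_score(p)
    # Within a score stratum, treated and control units share the same X distribution.
    assert dist_given_score(p, treated=1) == overall
    assert dist_given_score(p, treated=0) == overall
    print(p, overall)
```

The treatment probability is constant within a stratum, so it cancels from the numerator and denominator. That is the same cancellation the algebra in this step performs.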
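For discrete X and any x with e(x)=p, Bayes' rule shows that the distribution of X among treated units at score p equals the distribution of X among all units at that score. The first factor in the numerator is P(D=1 | X=x) = e(x) = p, and the denominator is also p because P(D=1 | e(X)=p) = E[e(X) | e(X)=p] = p.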
P(X=x | D=1, e(X)=p)
= P(D=1 | X=x, e(X)=p) P(X=x | e(X)=p) / P(D=1 | e(X)=p)
= p P(X=x | e(X)=p) / p
= P(X=x | e(X)=p)
04 / Ignorability transfer
If potential outcomes are independent of treatment given X, and e(X) balances X between treated and control units, then conditioning on the scalar e(X) alone is sufficient for ignorability.
(Y(1), Y(0)) independent of D | e(X)
05 / ATT estimator
For each treated unit i, find nearby controls J(i), build its untreated counterfactual with weights w_ij, then average across treated units.
tau_ATT_hat = (1/N_T) sum_{i:D_i=1} [Y_i - sum_{j:D_j=0} w_ij Y_j]
w_ij >= 0, sum_j w_ij = 1
06 / Balance diagnostics
After matching, inspect standardized mean differences rather than reporting only the matching algorithm.
SMD_k = (mean(X_k|D=1) - mean(X_k|D=0, matched)) / sqrt((s_Tk^2 + s_Ck^2)/2)
where s_Tk^2 and s_Ck^2 are the sample variances of X_k among treated units and matched controls.
03 / Code
Python code: propensity scores, nearest-neighbor matching, and balance table
This runnable-style skeleton estimates treatment probabilities, restricts common support, performs nearest-neighbor matching, and reports ATT plus balance.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# df is assumed to be an existing pandas DataFrame with columns:
# outcome, treated, age, baseline_score, income, school_size
covariates = ["age", "baseline_score", "income", "school_size"]
X = df[covariates]
D = df["treated"].astype(int)

# Step 1: estimate propensity scores with a logistic regression of treatment on covariates.
ps_model = LogisticRegression(max_iter=2000)
ps_model.fit(X, D)
df = df.copy()
df["pscore"] = ps_model.predict_proba(X)[:, 1]

# Step 2: common support. Keep treated and control observations whose scores overlap.
treat = df[df["treated"] == 1].copy()
control = df[df["treated"] == 0].copy()
lower = max(treat["pscore"].min(), control["pscore"].min())
upper = min(treat["pscore"].max(), control["pscore"].max())
support = df[df["pscore"].between(lower, upper)].copy()
treat = support[support["treated"] == 1].copy()
control = support[support["treated"] == 0].copy()

# Step 3: 1-nearest-neighbor matching on the score, with replacement
# (one control can serve as the match for several treated units).
matcher = NearestNeighbors(n_neighbors=1, metric="euclidean")
matcher.fit(control[["pscore"]])
distance, index = matcher.kneighbors(treat[["pscore"]])
matched_control = control.iloc[index[:, 0]].copy()
matched_control.index = treat.index  # align rows so the subtraction below is pairwise

# Step 4: ATT as the mean treated-minus-matched-control outcome difference.
att = (treat["outcome"] - matched_control["outcome"]).mean()
print({"ATT": att, "matched_pairs": len(treat), "max_distance": float(distance.max())})


def standardized_mean_difference(left, right, columns):
    """Return one SMD per covariate, scaled by the average of the two group variances."""
    rows = []
    for col in columns:
        pooled_sd = np.sqrt((left[col].var() + right[col].var()) / 2)
        rows.append({
            "covariate": col,
            "smd": (left[col].mean() - right[col].mean()) / pooled_sd,
        })
    return pd.DataFrame(rows)


# Step 5: balance table before matching (full sample) and after matching.
before = standardized_mean_difference(
    df[df["treated"] == 1],
    df[df["treated"] == 0],
    covariates,
)
after = standardized_mean_difference(treat, matched_control, covariates)
print(before.assign(stage="before"))
print(after.assign(stage="after"))
04 / Case
Case: comparable controls for an education support program
- Question: does joining an education support program improve later test scores?
- Participation depends on age, baseline scores, family income, and school size, so raw comparisons mix treatment effects with selection bias.
- The PSM workflow estimates participation probabilities, drops observations outside common support, then matches treated students to similar nonparticipants.
- A credible report includes the ATT, balance plots before and after matching, common-support diagnostics, caliper sensitivity, and a discussion of unobserved confounding.
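The caliper sensitivity check in the reporting list above could be sketched as follows. This is a minimal illustration under stated assumptions: `att_with_caliper` is a hypothetical helper, the caliper is measured on the raw score scale, matching is 1-nearest-neighbor with replacement, and the six observations are made-up numbers rather than program data. The full treated-by-control distance matrix is fine for a demonstration but would be replaced by a neighbor search on large samples.

```python
import numpy as np


def att_with_caliper(score, treated, outcome, caliper):
    """1-nearest-neighbor matching with replacement. Treated units whose nearest
    control is farther than `caliper` on the score are dropped.
    Returns (ATT over retained treated units, number retained)."""
    score, treated, outcome = map(np.asarray, (score, treated, outcome))
    c_score, c_out = score[treated == 0], outcome[treated == 0]
    t_score, t_out = score[treated == 1], outcome[treated == 1]
    gaps = np.abs(t_score[:, None] - c_score[None, :])  # treated x control distances
    nearest = gaps.argmin(axis=1)
    keep = gaps[np.arange(len(t_score)), nearest] <= caliper
    if not keep.any():
        return float("nan"), 0
    att = float(np.mean(t_out[keep] - c_out[nearest[keep]]))
    return att, int(keep.sum())


score = [0.30, 0.60, 0.90, 0.28, 0.55, 0.70]
treated = [1, 1, 1, 0, 0, 0]
outcome = [10.0, 13.0, 20.0, 8.0, 10.0, 12.0]

for caliper in (0.03, 0.10, 0.25):
    print(caliper, att_with_caliper(score, treated, outcome, caliper))
```

In this toy data the estimate moves as the caliper widens and poorly matched treated units enter the comparison. Reporting the estimate next to the number of retained treated units shows how much of the result rests on distant matches, and that a tight caliper changes the population the ATT describes.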
05 / Risks
Common Pitfalls
References
- Rosenbaum and Rubin (1983), The Central Role of the Propensity Score. https://doi.org/10.1093/biomet/70.1.41
- Austin (2011), An Introduction to Propensity Score Methods. https://doi.org/10.1080/00273171.2011.568786
- Imbens and Rubin (2015), Causal Inference. https://www.cambridge.org/core/books/causal-inference/71126BE90C58F1A431FE9B2DD07938AB