Propensity Score Matching

PSM compresses high-dimensional covariates into a one-dimensional treatment probability, then compares treated units with similar controls on common support.

Mechanism Lab

Animation: how PSM forms matched pairs on common support

The animation projects treated and control units onto the propensity-score axis, then reveals overlap, nearest neighbors, caliper screening, and balance diagnostics.

Raw scores

Estimate each unit’s treatment probability and project covariates onto one score axis.

e(X)=P(D=1|X)

01 / Intuition

Core Intuition

The propensity score e(X) is the probability of treatment conditional on covariates. It is a summary of selection into treatment, not the treatment effect.

Matching aims to balance pre-treatment covariates, not to make outcomes look similar.

PSM still requires conditional ignorability: confounders that affect both treatment and outcomes must be observed in X.

02 / Math

From ignorability to a one-dimensional matching estimator

01 / Potential outcomes

Let D indicate treatment and Y(1), Y(0) be potential outcomes. PSM starts from selection-on-observables and overlap.

(Y(1), Y(0)) independent of D | X
0 < P(D=1|X) < 1

02 / Propensity score

The propensity score is the conditional treatment probability. It compresses X into a single number that retains everything X says about treatment assignment.

e(X) = P(D=1 | X)

03 / Balancing proof

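For discrete X and any x with e(x)=p, Bayes' rule shows that the distribution of X among treated units at score p equals its distribution among all units at score p. The derivation uses P(D=1 | X=x, e(X)=p) = e(x) = p and, by iterated expectations, P(D=1 | e(X)=p) = p.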

P(X=x | D=1, e(X)=p)
= P(D=1 | X=x, e(X)=p) P(X=x | e(X)=p) / P(D=1 | e(X)=p)
= p P(X=x | e(X)=p) / p
= P(X=x | e(X)=p)
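
The identity can be checked numerically on a small discrete example. The sketch below uses a made-up population of four covariate cells; the cell names, the probabilities, and the helper dist_given_score are assumptions for illustration only. Exact fractions avoid rounding noise.

```python
from fractions import Fraction as F

# Hypothetical population: four covariate cells with P(X=x) and e(x).
# Cells "a" and "b" share the score 1/4; "c" and "d" share 3/4.
cells = {
    "a": {"p_x": F(1, 10), "e": F(1, 4)},
    "b": {"p_x": F(3, 10), "e": F(1, 4)},
    "c": {"p_x": F(2, 10), "e": F(3, 4)},
    "d": {"p_x": F(4, 10), "e": F(3, 4)},
}

def dist_given_score(score, treated_only):
    """Distribution of X within a score stratum, optionally among treated only."""
    weights = {}
    for name, cell in cells.items():
        if cell["e"] != score:
            continue
        # P(X=x, D=1) = P(X=x) * e(x); P(X=x) alone for the whole stratum.
        weights[name] = cell["p_x"] * (cell["e"] if treated_only else 1)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

for score in (F(1, 4), F(3, 4)):
    whole_stratum = dist_given_score(score, treated_only=False)
    treated = dist_given_score(score, treated_only=True)
    print(score, whole_stratum == treated)
```

Within each score stratum the common factor e(x)=p cancels, which is exactly the cancellation in the third line of the derivation.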

04 / Ignorability transfer

If treatment is independent of potential outcomes given X, and e(X) balances X, then treatment is also independent of potential outcomes given e(X) alone. Conditioning on the scalar score is sufficient.

(Y(1), Y(0)) independent of D | e(X)
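
The transfer step can be written out with iterated expectations. The following is a sketch of the standard argument using only the symbols defined above: the first equality holds because e(X) is a function of X, the second uses ignorability given X.

```latex
\begin{aligned}
P\bigl(D=1 \mid Y(1), Y(0), e(X)\bigr)
  &= E\bigl[\,P(D=1 \mid Y(1), Y(0), X) \,\big|\, Y(1), Y(0), e(X)\bigr] \\
  &= E\bigl[\,P(D=1 \mid X) \,\big|\, Y(1), Y(0), e(X)\bigr] \\
  &= E\bigl[\,e(X) \,\big|\, Y(1), Y(0), e(X)\bigr] = e(X).
\end{aligned}
```

The result does not depend on Y(1) or Y(0), so within a score stratum the treatment probability is the same whatever the potential outcomes are.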

05 / ATT estimator

For each treated unit i, find nearby controls J(i), build its untreated counterfactual with weights w_ij that are positive only for j in J(i), then average over the N_T treated units.

tau_ATT_hat = (1/N_T) sum_{i:D_i=1} [Y_i - sum_{j:D_j=0} w_ij Y_j]
w_ij >= 0, sum_j w_ij = 1
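
A tiny worked instance may make the weighting concrete. The outcomes and weights below are invented for illustration; the first treated unit uses a single nearest neighbor and the second averages two controls.

```python
import numpy as np

y_treated = np.array([10.0, 12.0])        # Y_i for the N_T = 2 treated units
y_control = np.array([7.0, 8.0, 9.0])     # Y_j for three controls

# Row i holds w_ij: unit 0 uses one neighbor, unit 1 averages two.
weights = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.5],
])
assert np.allclose(weights.sum(axis=1), 1.0)   # each row sums to one

counterfactual = weights @ y_control           # [7.0, 8.5]
att_hat = (y_treated - counterfactual).mean()
print(att_hat)  # 3.25
```

One-to-one nearest-neighbor matching is the special case where every row of the weight matrix has a single entry equal to one.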

06 / Balance diagnostics

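After matching, report standardized mean differences for each covariate k, not just the matching algorithm. Here s_Tk^2 and s_Ck^2 are the sample variances of X_k among treated units and matched controls.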

SMD_k = (mean(X_k|D=1) - mean(X_k|D=0, matched)) / sqrt((s_Tk^2 + s_Ck^2)/2)
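
As a quick numeric check of the formula, the sketch below computes one SMD by hand. The function name smd and the covariate values are hypothetical; ddof=1 is chosen so the variances match the pandas default used elsewhere on this page.

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardized mean difference with the averaged-variance denominator."""
    x_t = np.asarray(x_treated, dtype=float)
    x_c = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd

x_treated = [1.0, 2.0, 3.0, 4.0]             # made-up covariate values
x_matched_controls = [2.0, 3.0, 4.0, 5.0]
value = smd(x_treated, x_matched_controls)
print(round(value, 3))  # -0.775
```

The means differ by one unit and both groups have variance 5/3, so the SMD is -1 / sqrt(5/3), roughly three quarters of a standard deviation.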

03 / Code

Python code: propensity scores, nearest-neighbor matching, and balance table

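This skeleton assumes a pandas DataFrame df with the columns listed in the comment below. It estimates treatment probabilities, restricts the sample to common support, performs 1-nearest-neighbor matching with replacement, and reports the ATT plus a balance table.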

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# df columns:
# outcome, treated, age, baseline_score, income, school_size
covariates = ["age", "baseline_score", "income", "school_size"]
X = df[covariates]
D = df["treated"].astype(int)

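# Note: scikit-learn's LogisticRegression applies L2 regularization by default,
# so coefficients are shrunk and sensitive to covariate scale (e.g. income).
ps_model = LogisticRegression(max_iter=2000)
ps_model.fit(X, D)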
df = df.copy()
df["pscore"] = ps_model.predict_proba(X)[:, 1]

# Common support: keep treated and control observations whose scores overlap.
treat = df[df["treated"] == 1].copy()
control = df[df["treated"] == 0].copy()
lower = max(treat["pscore"].min(), control["pscore"].min())
upper = min(treat["pscore"].max(), control["pscore"].max())
support = df[df["pscore"].between(lower, upper)].copy()

treat = support[support["treated"] == 1].copy()
control = support[support["treated"] == 0].copy()

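# 1-nearest-neighbor matching on the score, with replacement:
# the same control can serve as the match for several treated units.
matcher = NearestNeighbors(n_neighbors=1, metric="euclidean")
matcher.fit(control[["pscore"]])
distance, index = matcher.kneighbors(treat[["pscore"]])

# Row k of matched_control is the match for row k of treat; reuse the treated
# index so the outcome subtraction below aligns pair by pair.
matched_control = control.iloc[index[:, 0]].copy()
matched_control.index = treat.index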

att = (treat["outcome"] - matched_control["outcome"]).mean()
print({"ATT": att, "matched_pairs": len(treat), "max_distance": float(distance.max())})

def standardized_mean_difference(left, right, columns):
    """Return one SMD per covariate, using the average of the two group variances."""
    rows = []
    for col in columns:
        pooled_sd = np.sqrt((left[col].var() + right[col].var()) / 2)
        rows.append({
            "covariate": col,
            "smd": (left[col].mean() - right[col].mean()) / pooled_sd,
        })
    return pd.DataFrame(rows)

before = standardized_mean_difference(
    df[df["treated"] == 1],
    df[df["treated"] == 0],
    covariates,
)
after = standardized_mean_difference(treat, matched_control, covariates)
print(before.assign(stage="before"))
print(after.assign(stage="after"))
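
The skeleton above keeps every nearest neighbor, however far away it is. Caliper screening drops treated units whose closest control is farther than a chosen tolerance. The sketch below is one possible way to add that step on a scalar score; the function name nearest_within_caliper, the toy scores, and the caliper of 0.05 are assumptions for illustration, not recommended values.

```python
import numpy as np

def nearest_within_caliper(treated_scores, control_scores, caliper):
    """1-NN matching with replacement on a scalar score, flagging far matches."""
    t = np.asarray(treated_scores, dtype=float)
    c = np.asarray(control_scores, dtype=float)
    gaps = np.abs(t[:, None] - c[None, :])     # |e_i - e_j| for every pair
    nearest = gaps.argmin(axis=1)              # closest control per treated unit
    distance = gaps[np.arange(len(t)), nearest]
    keep = distance <= caliper                 # caliper screening
    return nearest, distance, keep

nearest, distance, keep = nearest_within_caliper(
    treated_scores=[0.30, 0.52, 0.90],
    control_scores=[0.28, 0.50, 0.60],
    caliper=0.05,
)
print(nearest, keep)
```

Here the third treated unit (score 0.90) has no control within 0.05 and would be excluded, so the estimate then applies only to the treated units that found a match. Re-running with several caliper widths is one way to produce the caliper sensitivity check mentioned in the case below.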

04 / Case

Case: comparable controls for an education support program

Question: does joining an education support program improve later test scores?
Participation depends on age, baseline scores, family income, and school size, so raw comparisons mix treatment effects with selection bias.
The PSM workflow estimates participation probabilities, drops observations outside common support, then matches treated students to similar nonparticipants.
A credible report includes the ATT, balance plots before and after matching, common-support diagnostics, caliper sensitivity, and a discussion of unobserved confounding.

05 / Risks

Common Pitfalls

Treating the propensity model as the outcome model and skipping balance checks.

Forcing matches outside common support and extrapolating from incomparable controls.

Assuming PSM solves unobserved confounding; it only adjusts for observed covariates included in X.
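
A small simulation can illustrate the last pitfall. The sketch below is an assumption-laden toy: the data-generating process, the coefficients, and the helper stratified_att are invented, and exact stratification on discrete covariates stands in for score matching. The true effect is 2 by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_effect = 2.0

x = rng.integers(0, 2, n)                      # observed covariate
u = rng.integers(0, 2, n)                      # confounder that may be unobserved
d = rng.random(n) < 0.2 + 0.3 * x + 0.4 * u    # treatment depends on both
y = true_effect * d + x + 3.0 * u + rng.normal(size=n)

def stratified_att(strata):
    """ATT from within-stratum mean differences, weighted by treated counts."""
    total, n_treated = 0.0, 0
    for s in np.unique(strata):
        t = (strata == s) & d
        c = (strata == s) & ~d
        total += t.sum() * (y[t].mean() - y[c].mean())
        n_treated += t.sum()
    return total / n_treated

att_x_only = stratified_att(x)             # U omitted from the conditioning set
att_x_and_u = stratified_att(2 * x + u)    # U observed and conditioned on
print(round(att_x_only, 2), round(att_x_and_u, 2))
```

Conditioning on X alone leaves the estimate well above 2 because treated units have more U=1 cases within every X stratum. Adding U to the conditioning set brings it back close to the true effect.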