Propensity Score Matching

PSM compresses high-dimensional covariates into a one-dimensional treatment probability, then compares treated units with similar controls on common support.

Mechanism Lab

Animation: how PSM forms matched pairs on common support

The animation projects treated and control units onto the propensity-score axis, then reveals overlap, nearest neighbors, caliper screening, and balance diagnostics.

Raw scores

Estimate each unit’s treatment probability and project covariates onto one score axis.

e(X)=P(D=1|X)

01 / Intuition

Core Intuition

The propensity score e(X) is the probability of treatment conditional on covariates. It is a summary of selection into treatment, not the treatment effect.

Matching aims to balance pre-treatment covariates, not to make outcomes look similar.

PSM still requires conditional ignorability: confounders that affect both treatment and outcomes must be observed in X.

02 / Math

From ignorability to a one-dimensional matching estimator

01 / Potential outcomes

Let D indicate treatment and Y(1), Y(0) be potential outcomes. PSM starts from selection-on-observables and overlap.

(Y(1), Y(0)) independent of D | X
0 < P(D=1|X) < 1

02 / Propensity score

The propensity score is the conditional treatment probability. It compresses X into a scalar while preserving the assignment probability: P(D=1 | e(X)) = e(X).

e(X) = P(D=1 | X)

03 / Balancing proof

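For discrete X and any x with e(x)=p, Bayes' rule shows that the distribution of X among treated units at score p equals the distribution of X in the whole score stratum. The numerator uses P(D=1 | X=x, e(X)=p) = e(x) = p, and the denominator uses the law of iterated expectations, P(D=1 | e(X)=p) = E[e(X) | e(X)=p] = p.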

P(X=x | D=1, e(X)=p)
= P(D=1 | X=x, e(X)=p) P(X=x | e(X)=p) / P(D=1 | e(X)=p)
= p P(X=x | e(X)=p) / p
= P(X=x | e(X)=p)
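
The balancing identity can be checked by exact arithmetic on a small example. The sketch below is illustrative only: the four covariate values, their probabilities, and the scores are made-up numbers, and the helper names x_given_score and x_given_treated_and_score are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical discrete covariate with four values (all numbers are made up).
p_x = {0: F(1, 10), 1: F(2, 10), 2: F(3, 10), 3: F(4, 10)}  # P(X = x)
e_x = {0: F(1, 4), 1: F(1, 4), 2: F(3, 5), 3: F(3, 5)}      # e(x) = P(D = 1 | X = x)


def x_given_score(p):
    """P(X = x | e(X) = p) for every x in the score stratum."""
    stratum = [x for x in p_x if e_x[x] == p]
    total = sum(p_x[x] for x in stratum)  # P(e(X) = p)
    return {x: p_x[x] / total for x in stratum}


def x_given_treated_and_score(p):
    """P(X = x | D = 1, e(X) = p), built from the joint P(D = 1, X = x)."""
    stratum = [x for x in p_x if e_x[x] == p]
    joint = {x: e_x[x] * p_x[x] for x in stratum}  # P(D = 1, X = x)
    total = sum(joint.values())                    # P(D = 1, e(X) = p)
    return {x: joint[x] / total for x in stratum}


# Within each score stratum the two distributions coincide.
for p in sorted(set(e_x.values())):
    print(p, x_given_score(p), x_given_treated_and_score(p))
```

Because the score is constant within a stratum, it cancels from numerator and denominator, which is the same cancellation as in the derivation above.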

04 / Ignorability transfer

If treatment is independent of potential outcomes given X, and e(X) balances X, then conditioning on the scalar e(X) alone is sufficient for ignorability.

(Y(1), Y(0)) independent of D | e(X)

05 / ATT estimator

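For each treated unit i, find a set of nearby controls J(i), build its untreated counterfactual as a weighted average of their outcomes with weights w_ij (zero for controls outside J(i)), then average across treated units. N_T is the number of treated units.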

tau_ATT_hat = (1/N_T) sum_{i:D_i=1} [Y_i - sum_{j:D_j=0} w_ij Y_j]
w_ij >= 0, sum_j w_ij = 1
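
As a minimal sketch of this estimator, the weights can be stored as an N_T x N_C matrix with one row per treated unit. The function name att_from_weights and all numbers below are hypothetical and only illustrate the formula.

```python
import numpy as np


def att_from_weights(y_treated, y_control, weights):
    """ATT = mean over treated i of [Y_i - sum_j w_ij * Y_j]."""
    y_treated = np.asarray(y_treated, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (y_treated.size, y_control.size):
        raise ValueError("weights must have shape (n_treated, n_control)")
    if (weights < 0).any() or not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("each row of weights must be non-negative and sum to 1")
    counterfactual = weights @ y_control  # estimated Y(0) for each treated unit
    return float(np.mean(y_treated - counterfactual))


# Made-up outcomes: two treated units, three controls.
y_t = [10.0, 12.0]
y_c = [8.0, 9.0, 11.0]
w = [
    [1.0, 0.0, 0.0],  # treated unit 0: 1:1 match with control 0
    [0.0, 0.5, 0.5],  # treated unit 1: equal-weight average of controls 1 and 2
]
att = att_from_weights(y_t, y_c, w)
print(att)
```

One-to-one nearest-neighbor matching is the special case in which each row has a single weight equal to 1.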

06 / Balance diagnostics

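After matching, report standardized mean differences for every covariate k; naming the matching algorithm alone does not show that balance was achieved. Here s_Tk^2 and s_Ck^2 are the sample variances of X_k in the treated and control groups.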

SMD_k = (mean(X_k|D=1) - mean(X_k|D=0, matched)) / sqrt((s_Tk^2 + s_Ck^2)/2)

03 / Code

Python code: propensity scores, nearest-neighbor matching, and balance table

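This skeleton assumes a pandas DataFrame df with the columns listed in the first comment. It estimates treatment probabilities, restricts the sample to common support, performs 1:1 nearest-neighbor matching with replacement (no caliper), and reports the ATT plus a balance table.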

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# df columns:
# outcome, treated, age, baseline_score, income, school_size
covariates = ["age", "baseline_score", "income", "school_size"]
X = df[covariates]
D = df["treated"].astype(int)

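# scikit-learn's LogisticRegression applies L2 regularization by default (C=1.0),
# which shrinks coefficients; consider standardizing covariates or raising C.
ps_model = LogisticRegression(max_iter=2000)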
ps_model.fit(X, D)
df = df.copy()
df["pscore"] = ps_model.predict_proba(X)[:, 1]

# Common support: keep treated and control observations whose scores overlap.
treat = df[df["treated"] == 1].copy()
control = df[df["treated"] == 0].copy()
lower = max(treat["pscore"].min(), control["pscore"].min())
upper = min(treat["pscore"].max(), control["pscore"].max())
support = df[df["pscore"].between(lower, upper)].copy()

treat = support[support["treated"] == 1].copy()
control = support[support["treated"] == 0].copy()

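# 1:1 nearest-neighbor matching on the score, with replacement:
# the same control can be matched to several treated units.
matcher = NearestNeighbors(n_neighbors=1, metric="euclidean")
matcher.fit(control[["pscore"]])
distance, index = matcher.kneighbors(treat[["pscore"]])

# Reindex matched controls to the treated units so the subtraction below is pairwise.
matched_control = control.iloc[index[:, 0]].copy()
matched_control.index = treat.index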

att = (treat["outcome"] - matched_control["outcome"]).mean()
print({"ATT": att, "matched_pairs": len(treat), "max_distance": float(distance.max())})

def standardized_mean_difference(left, right, columns):
    """Return one SMD per covariate: difference in means over the pooled standard deviation."""
    rows = []
    for col in columns:
        pooled_sd = np.sqrt((left[col].var() + right[col].var()) / 2)
        rows.append({
            "covariate": col,
            "smd": (left[col].mean() - right[col].mean()) / pooled_sd,
        })
    return pd.DataFrame(rows)

before = standardized_mean_difference(
    df[df["treated"] == 1],
    df[df["treated"] == 0],
    covariates,
)
after = standardized_mean_difference(treat, matched_control, covariates)
print(before.assign(stage="before"))
print(after.assign(stage="after"))
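
The skeleton above accepts every nearest neighbor, however distant. Caliper screening can be sketched as below. This is an illustration under assumptions, not part of the original workflow: the function name match_with_caliper, the choice to measure distance on the logit of the score, the toy data, and the caliper values are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def match_with_caliper(treat, control, caliper):
    """1:1 nearest-neighbor matching (with replacement) on logit(pscore).

    Treated units whose nearest control is farther than `caliper` are dropped.
    """
    def logit(p):
        return np.log(p / (1 - p))

    t = logit(treat["pscore"].to_numpy()).reshape(-1, 1)
    c = logit(control["pscore"].to_numpy()).reshape(-1, 1)
    distance, index = NearestNeighbors(n_neighbors=1).fit(c).kneighbors(t)
    keep = distance[:, 0] <= caliper
    return treat.iloc[keep], control.iloc[index[keep, 0]]


# Made-up scores and outcomes, for illustration only.
toy_treat = pd.DataFrame({"pscore": [0.30, 0.60, 0.90], "outcome": [5.0, 7.0, 9.0]})
toy_control = pd.DataFrame({"pscore": [0.28, 0.55, 0.40], "outcome": [4.0, 6.0, 5.0]})

# Caliper sensitivity: how the matched sample and the ATT move with the caliper.
results = {}
for caliper in (0.1, 0.25, 2.5):
    m_t, m_c = match_with_caliper(toy_treat, toy_control, caliper)
    att = float(np.mean(m_t["outcome"].to_numpy() - m_c["outcome"].to_numpy()))
    results[caliper] = (len(m_t), att)
    print({"caliper": caliper, "matched_pairs": len(m_t), "ATT": att})
```

A tighter caliper trades sample size for match quality, so the estimand shifts toward treated units that have close controls; reporting the number of dropped units alongside the ATT makes that trade-off visible.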

04 / Case

Case: comparable controls for an education support program

  • Question: does joining an education support program improve later test scores?
  • Participation depends on age, baseline scores, family income, and school size, so raw comparisons mix treatment effects with selection bias.
  • The PSM workflow estimates participation probabilities, drops observations outside common support, then matches treated students to similar nonparticipants.
  • A credible report includes the ATT, balance plots before and after matching, common-support diagnostics, caliper sensitivity, and a discussion of unobserved confounding.

05 / Risks

Common Pitfalls

  • Treating the propensity model as the outcome model and skipping balance checks.
  • Forcing matches outside common support and extrapolating from incomparable controls.
  • Assuming PSM solves unobserved confounding; it only adjusts for observed covariates included in X.
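
The last pitfall can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the data-generating process, the coefficients, the true effect of 1.0, and the helper name psm_att. The confounder u affects both treatment and outcome; the first estimate omits it from the propensity model, the second includes it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)  # observed confounder
u = rng.normal(size=n)  # confounder the analyst may not observe
d = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x + 0.8 * u))))
y = 1.0 * d + x + u + rng.normal(size=n)  # true treatment effect is 1.0


def psm_att(features):
    """ATT from 1:1 nearest-neighbor matching (with replacement) on the score."""
    # Large C makes the default L2 penalty negligible.
    model = LogisticRegression(C=1e6, max_iter=1000).fit(features, d)
    score = model.predict_proba(features)[:, 1]
    treated = score[d == 1].reshape(-1, 1)
    controls = score[d == 0].reshape(-1, 1)
    _, idx = NearestNeighbors(n_neighbors=1).fit(controls).kneighbors(treated)
    return float(np.mean(y[d == 1] - y[d == 0][idx[:, 0]]))


att_x_only = psm_att(x.reshape(-1, 1))            # u omitted from X
att_x_and_u = psm_att(np.column_stack([x, u]))    # u included in X
print({"true_ATT": 1.0, "x_only": att_x_only, "x_and_u": att_x_and_u})
```

In this setup the estimate that omits u overstates the effect, because treated units have systematically higher u than their matched controls, while the estimate that includes u lands close to 1.0. Matching on x alone can still produce good balance on x, so balance tables do not detect this problem.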
