Causal Discovery: Learning Causal Graphs from Data (PC, LiNGAM, NOTEARS) and Its Limits
When you are unsure which variable affects which, causal-discovery algorithms try to infer a DAG from observational data. They are good at generating hypotheses, but their assumptions are strong and social-science use needs care.
Schematic
The principle at a glance
Start Here
What you should be able to do
Know the goal: estimate a DAG / CPDAG from data, not a single effect.
Understand three families: constraint-based (PC), functional causal models (LiNGAM), continuous optimization (NOTEARS).
Know the key assumptions: causal sufficiency (no unobserved common cause), faithfulness, and (for LiNGAM) non-Gaussian noise.
Understand that observational data often identifies only a Markov equivalence class (CPDAG), leaving some edge directions undetermined.
Treat causal discovery as a hypothesis generator, then test with design, experiments, or expert knowledge.
Learning Path
Learning path: from correlation structure to a candidate causal graph
Follow this path: estimate the skeleton, orient as far as possible, recognize the equivalence-class limit, and hand the graph to research design as a hypothesis.
Step 1
Data
Observe multivariate data; the target is structure, not one effect.
X_1..X_p
Step 2
Skeleton
Remove indirect/spurious edges with conditional-independence tests.
X_i⟂X_j|S
Step 3
Orient
Orient with v-structures, non-Gaussianity, or time order.
v-structure
Step 4
Equivalence
Recognize that observation often identifies only a CPDAG.
CPDAG
Step 5
Validate
Treat the graph as a hypothesis; test key edges with design.
design test
01 / Intuition
Core Intuition
Causal discovery answers what the graph looks like, not how large a single edge is. It infers direction from correlation structure plus assumptions.
Constraint-based PC uses conditional-independence tests to remove edges and then orient them, but with observation alone it often identifies only a Markov equivalence class (CPDAG).
Recovering unique directions needs extra structural assumptions: LiNGAM exploits the asymmetry of linear models with non-Gaussian noise. NOTEARS is different in kind: it adds no identifying assumption by itself, but uses a differentiable acyclicity constraint to turn the combinatorial search over DAGs into continuous optimization.
02 / Math
From a joint distribution to a testable causal graph
01 / Graph and factorization
A causal graph is a DAG, and the joint distribution factorizes over parents. Discovery estimates this structure from data.
P(X_1,...,X_p) = prod_j P(X_j | pa(X_j))
02 / Constraint-based (PC) and equivalence classes
Use conditional-independence tests: if X_i and X_j are independent given some set S, remove the edge; then orient v-structures (colliders). Observation usually identifies only a CPDAG.
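The v-structure step can be illustrated with a small simulation. The sketch below assumes a linear-Gaussian collider X→Z←Y (the variable names and coefficients are illustrative, not taken from the text) and uses residual-based partial correlation as a stand-in for a formal conditional-independence test.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing each on z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=n)
Y = rng.normal(size=n)            # independent of X
Z = X + Y + rng.normal(size=n)    # collider: X -> Z <- Y

marginal = np.corrcoef(X, Y)[0, 1]
conditional = partial_corr(X, Y, Z)
print("corr(X,Y)      =", round(marginal, 3))
print("pcorr(X,Y | Z) =", round(conditional, 3))
```

X and Y are marginally uncorrelated but become negatively dependent once Z is conditioned on (the population partial correlation in this setup is −0.5). A chain or fork shows the opposite pattern, which is why an unshielded triple X–Z–Y whose separating set excludes Z can be oriented as a collider.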
X_i ⟂ X_j | S => remove edge i–j
03 / Faithfulness and causal sufficiency
PC depends on faithfulness (the data's independencies are exactly those implied by the graph), causal sufficiency (no unobserved common cause), and valid independence tests. If any fails, the result can be wrong.
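A faithfulness violation is easy to construct. In the sketch below the coefficients are chosen by hand so that the direct path and the indirect path cancel exactly (an illustrative assumption): X has a direct edge into Z, yet X and Z are marginally uncorrelated, so a marginal independence test would wrongly delete the edge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=n)
Y = 1.0 * X + rng.normal(size=n)             # X -> Y
Z = 1.0 * Y - 1.0 * X + rng.normal(size=n)   # Y -> Z and X -> Z; the two paths cancel

# Marginal association vanishes although the edge X -> Z exists.
marginal = np.corrcoef(X, Z)[0, 1]

# Regressing Z on both parents recovers the nonzero direct coefficient.
design = np.column_stack([X, Y, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, Z, rcond=None)
direct_x = coef[0]

print("corr(X,Z)                 =", round(marginal, 3))
print("direct coefficient X -> Z =", round(direct_x, 3))
```

Exact cancellation is a knife-edge case, but near-cancellation in finite samples has the same practical effect on a constraint-based search.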
faithfulness + causal sufficiency + valid CI tests
04 / LiNGAM: orientation via non-Gaussianity
A linear non-Gaussian acyclic model X=BX+e with independent, non-Gaussian components e. Non-Gaussianity breaks the symmetry, making the causal order and coefficient matrix B uniquely identifiable (via ICA-type methods).
X = B X + e, e non-Gaussian and independent
05 / NOTEARS: continuous optimization
Encode acyclicity as a smooth constraint h(W)=0, replacing combinatorial search over DAGs with continuous optimization solvable by gradient methods.
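To make the constraint concrete, the sketch below evaluates h(W) = tr(exp(W∘W)) − p on two small weighted adjacency matrices. The helper names `trace_expm` and `h`, the example matrices, and the truncated power series are illustrative assumptions; a fuller implementation would more likely call a library matrix exponential such as `scipy.linalg.expm`.

```python
import numpy as np

def trace_expm(a, terms=30):
    """Approximate tr(exp(a)) with a truncated power series (adequate for small matrices)."""
    power = np.eye(a.shape[0])
    total = np.trace(power)
    for k in range(1, terms):
        power = power @ a / k      # power now equals a^k / k!
        total += np.trace(power)
    return total

def h(W):
    """Acyclicity measure h(W) = tr(exp(W * W)) - p, zero exactly when W has no directed cycle."""
    p = W.shape[0]
    return trace_expm(W * W) - p   # W * W is the elementwise (Hadamard) square

# Weighted adjacency matrices: entry [i, j] != 0 means an edge i -> j.
W_dag = np.array([[0.0, 1.2, 0.0],
                  [0.0, 0.0, 0.9],
                  [0.0, 0.0, 0.0]])   # chain 0 -> 1 -> 2
W_cyc = W_dag.copy()
W_cyc[2, 0] = 0.5                     # adds 2 -> 0, closing a cycle

h_dag = h(W_dag)
h_cyc = h(W_cyc)
print("h(DAG)    =", h_dag)
print("h(cyclic) =", h_cyc)
```

The acyclic matrix gives h = 0 and the cyclic one gives h > 0. Because h is smooth in W, a gradient-based solver can push it toward zero while minimizing the loss.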
min_W loss(W) s.t. h(W) = tr(exp(W∘W)) − p = 0
03 / Code
Code cases: conditional independence, equivalence classes, and non-Gaussian orientation
Small simulations show how conditional independence removes a seemingly-present edge, why direction is unidentified under Gaussian observation, and how non-Gaussianity helps orient.
Case 1: conditional independence removes an indirect edge
In a chain X→Y→Z, X and Z are correlated but independent given Y, so there is no direct X–Z edge.
import numpy as np
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=n)
Y = 1.2 * X + rng.normal(size=n)
Z = 0.9 * Y + rng.normal(size=n)
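def pcorr(x, y, z):
    # Partial correlation of x and y given z: correlate the residuals
    # left after regressing each of x and y linearly on z.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]
print("corr(X,Z) =", round(np.corrcoef(X, Z)[0,1], 3))
print("pcorr(X,Z | Y) =", round(pcorr(X, Z, Y), 3))
Expected output (approximate; exact digits depend on the random draw)
corr(X,Z) ≈ 0.63 (population value 1.08 / sqrt(2.9764) ≈ 0.626)
pcorr(X,Z | Y) ≈ 0.00
How to read this code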
- X and Z are marginally correlated, suggesting a direct link.
- Conditioning on the mediator Y makes them independent, so there is no direct edge.
- PC uses exactly this kind of conditional independence to recover the skeleton.
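A partial correlation near zero still has to be turned into a keep-or-remove decision. The sketch below does this with Fisher's z-test on the same chain X→Y→Z. The function names, the significance level `alpha = 0.01`, and the brute-force loop over the three pairs are illustrative assumptions, not the PC algorithm's full search order.

```python
import math
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing each on z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for H0: (partial) correlation = 0, via Fisher's z-transform."""
    z = math.atanh(r) * math.sqrt(n - cond_size - 3)
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))

rng = np.random.default_rng(0)
n = 4000
data = {}
data["X"] = rng.normal(size=n)
data["Y"] = 1.2 * data["X"] + rng.normal(size=n)
data["Z"] = 0.9 * data["Y"] + rng.normal(size=n)

alpha = 0.01
skeleton = set()
results = {}
for a, b, c in [("X", "Y", "Z"), ("Y", "Z", "X"), ("X", "Z", "Y")]:
    # Keep edge a-b only if a and b are dependent both marginally and given c.
    r0 = np.corrcoef(data[a], data[b])[0, 1]
    r1 = partial_corr(data[a], data[b], data[c])
    p0 = fisher_z_pvalue(r0, n, 0)
    p1 = fisher_z_pvalue(r1, n, 1)
    results[(a, b)] = (r1, p1)
    if p0 < alpha and p1 < alpha:
        skeleton.add((a, b))
    print(f"{a}-{b}: p(marginal)={p0:.3g}, p(given {c})={p1:.3g}")
print("skeleton:", sorted(skeleton))
```

The edges X–Y and Y–Z survive every test. X–Z is removed whenever its conditional p-value exceeds `alpha`, which under the null still fails to happen in roughly a fraction `alpha` of samples; this is one reason discovered skeletons depend on sample size and the chosen threshold.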
Case 2: direction is unidentified under Gaussian observation
X→Y and X←Y give the same correlation structure; Gaussian observation alone cannot tell them apart.
import numpy as np
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(size=n) # X -> Y
b_xy = np.polyfit(X, Y, 1)[0]
b_yx = np.polyfit(Y, X, 1)[0]
print("slope Y~X =", round(b_xy, 3), " slope X~Y =", round(b_yx, 3))
print("corr identical both ways:", round(np.corrcoef(X, Y)[0,1], 3))
Expected output (approximate; exact digits depend on the random draw)
slope Y~X ≈ 0.80 slope X~Y ≈ 0.49
corr identical both ways: ≈ 0.62
How to read this code
- Both directions fit the data; the correlation structure is symmetric.
- So with linear-Gaussian data, observation identifies only the Markov equivalence class.
- Orientation needs extra information: experiments, time order, or non-Gaussianity.
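The symmetry can also be checked at the level of likelihood. The sketch below fits the two linear-Gaussian models X→Y and Y→X by maximum likelihood and compares their maximized log-likelihoods; the helper names are illustrative assumptions.

```python
import numpy as np

def gaussian_loglik(sample):
    """Maximized Gaussian log-likelihood of a sample, using the MLE mean and variance."""
    n = sample.size
    var = np.mean((sample - sample.mean()) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * var) + 1)

def loglik_direction(cause, effect):
    """Log-likelihood of the model cause -> effect: p(cause) * p(effect | cause)."""
    resid = effect - np.polyval(np.polyfit(cause, effect, 1), cause)
    return gaussian_loglik(cause) + gaussian_loglik(resid)

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(size=n)   # true direction X -> Y

ll_xy = loglik_direction(X, Y)
ll_yx = loglik_direction(Y, X)
print("loglik X->Y:", ll_xy)
print("loglik Y->X:", ll_yx)
```

The two values agree up to floating-point error, because var(X)·var(Y|X) and var(Y)·var(X|Y) both equal var(X)·var(Y)·(1 − r²). A Gaussian likelihood score therefore cannot prefer one direction over the other.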
Case 3: non-Gaussianity breaks the symmetry and orients
LiNGAM intuition: in the correct direction the residual is independent of the cause, but not in the reverse direction.
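import numpy as np
rng = np.random.default_rng(2)
n = 5000
X = rng.laplace(size=n) # non-Gaussian cause
Y = 0.9 * X + rng.laplace(size=n) # X -> Y
res_fwd = Y - np.polyval(np.polyfit(X, Y, 1), X) # regress Y on X
res_bwd = X - np.polyval(np.polyfit(Y, X, 1), Y) # regress X on Y
def sq_corr(res, reg):
    # OLS residuals are uncorrelated with the regressor by construction,
    # so look for dependence beyond correlation via the squared values.
    return abs(np.corrcoef(res**2, reg**2)[0, 1])
print("forward |corr(res^2, X^2)| =", round(sq_corr(res_fwd, X), 3))
print("backward|corr(res^2, Y^2)| =", round(sq_corr(res_bwd, Y), 3))
Expected output (approximate; exact digits depend on the random draw)
forward |corr(res^2, X^2)| ≈ 0.0 (within sampling noise)
backward|corr(res^2, Y^2)| ≈ 0.4
How to read this code
- An OLS residual is always linearly uncorrelated with its regressor, in both directions, so plain correlation cannot reveal the direction; the check has to look at higher-order dependence.
- In the correct direction X→Y, the residual is independent of X, so even the squared values are uncorrelated (≈ 0).
- In the reverse direction the residual mixes both noise terms and stays dependent on Y, which shows up as a clearly nonzero correlation between the squares.
- This is the intuition LiNGAM uses to orient via non-Gaussian asymmetry.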
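Because an OLS residual is linearly uncorrelated with its regressor by construction, a direction check needs a measure of dependence beyond correlation. The sketch below uses the correlation between the squared residual and the squared regressor (an illustrative choice, not the estimator LiNGAM itself uses) and repeats the two regressions under Gaussian and Laplace noise to show that the asymmetry appears only in the non-Gaussian case.

```python
import numpy as np

def sq_dependence(cause, effect):
    """|corr| between the squared OLS residual of effect ~ cause and the squared regressor."""
    resid = effect - np.polyval(np.polyfit(cause, effect, 1), cause)
    return abs(np.corrcoef(resid ** 2, cause ** 2)[0, 1])

rng = np.random.default_rng(3)
n = 20000
scores = {}
for name, draw in [("gaussian", rng.normal), ("laplace", rng.laplace)]:
    X = draw(size=n)
    Y = 0.9 * X + draw(size=n)        # true direction X -> Y in both settings
    forward = sq_dependence(X, Y)     # regress Y on X
    backward = sq_dependence(Y, X)    # regress X on Y
    scores[name] = (forward, backward)
    print(f"{name:9s} forward={forward:.3f}  backward={backward:.3f}")
```

With Gaussian noise both directions score near zero, so nothing distinguishes them. With Laplace noise only the backward regression leaves a residual that depends on its regressor.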
04 / Case
Case: discovering a candidate graph among development indicators, then testing it
- Question: the causal structure among education, health, income, and infrastructure indicators is unknown, and we want testable hypotheses first.
- Use PC to recover the skeleton and orientable parts into a CPDAG; use LiNGAM or time order to orient further where possible.
- Treat the discovered graph as a hypothesis and test key edges with quasi-experiments, instruments, or institutional knowledge rather than reporting it as causal truth.
- A credible report states the assumptions used (sufficiency, faithfulness), sensitivity to unobserved common causes, edges left unoriented in the equivalence class, and how design will validate them.
05 / Causal
Plugging into estimation: discovered graph → adjustment set → design
Causal discovery does not give an effect directly; it gives a candidate graph. Its most useful product is what to control for (a back-door adjustment set), which then feeds the estimation methods you already know.
01 / Discovery → back-door adjustment set
Read an adjustment set satisfying the back-door criterion from the candidate DAG, avoiding conditioning on colliders or mediators.
adjust by Z s.t. back-door(Z) holds
02 / Adjustment set → DML / matching
Use the adjustment set as controls X in DML / AIPW / matching to estimate the treatment effect.
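The sketch below shows why the choice of controls matters, using plain OLS adjustment as a simple stand-in for DML / AIPW / matching. The data-generating process (confounder Z, treatment T, outcome Y, collider C, and a true effect of 1.0) is a hypothetical example, not taken from the text.

```python
import numpy as np

def ols_coef_on_first(y, *regressors):
    """OLS coefficient on the first regressor, fitted with an intercept."""
    design = np.column_stack(list(regressors) + [np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

rng = np.random.default_rng(0)
n = 20000
Z = rng.normal(size=n)                        # confounder: Z -> T and Z -> Y
T = 0.8 * Z + rng.normal(size=n)              # treatment
Y = 1.0 * T + 1.5 * Z + rng.normal(size=n)    # true effect of T on Y is 1.0
C = T + Y + rng.normal(size=n)                # collider: T -> C <- Y

naive = ols_coef_on_first(Y, T)               # back-door path through Z left open
backdoor = ols_coef_on_first(Y, T, Z)         # valid back-door adjustment set {Z}
with_collider = ols_coef_on_first(Y, T, Z, C) # conditioning on the collider adds bias

print("no controls      :", round(naive, 3))
print("adjust for Z     :", round(backdoor, 3))
print("adjust for Z and C:", round(with_collider, 3))
```

Only the estimate that adjusts for Z alone lands near the true effect. Omitting Z inflates it, and adding the collider C drives it toward zero in this setup, which is why the adjustment set should be read off the graph rather than chosen by including every available covariate.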
03 / Unoriented edges → fill with design
For key edges left unoriented in the equivalence class, add identification with quasi-experiments, IV, or time order.
04 / Discovery is exploration, not confirmation
Running discovery and estimation on the same data gives over-optimistic results, because the graph was already selected to fit that sample; separate exploration from confirmation (sample splitting / pre-registration).
Three red lines: (1) causal sufficiency is nearly untestable, and unobserved common causes distort discovered edges; (2) observation often identifies only an equivalence class, so do not treat a CPDAG as the unique DAG; (3) keep discovery and confirmation separate to avoid self-validation on the same data.
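The third red line can be illustrated with pure noise. In the sketch below no feature is related to the outcome, yet picking the strongest association on one half of the sample produces an apparently sizeable "edge" that shrinks on the held-out half. The sample sizes and the use of absolute correlation as the selection rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 500
features = rng.normal(size=(n, p))   # pure noise: no feature is related to y
y = rng.normal(size=n)

explore = np.arange(n) < n // 2      # first half: used to "discover" an edge
confirm = ~explore                   # second half: held out for confirmation

def abs_corrs(rows):
    """|corr(feature_j, y)| for every feature, using only the selected rows."""
    f = features[rows] - features[rows].mean(axis=0)
    t = y[rows] - y[rows].mean()
    return np.abs(f.T @ t) / (np.linalg.norm(f, axis=0) * np.linalg.norm(t))

best = int(np.argmax(abs_corrs(explore)))   # strongest association in the exploration half
r_explore = abs_corrs(explore)[best]
r_confirm = abs_corrs(confirm)[best]

print("selected feature:", best)
print("|corr| on exploration half :", round(r_explore, 3))
print("|corr| on confirmation half:", round(r_confirm, 3))
```

The exploration-half value is the maximum over 500 noisy estimates and is therefore biased upward; the confirmation-half value is an honest estimate for the one edge that was selected.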
06 / Risks
Common Pitfalls
References
- Spirtes, Glymour, and Scheines (2000), Causation, Prediction, and Search, MIT Press. https://mitpress.mit.edu/9780262527927/causation-prediction-and-search/
- Shimizu et al. (2006), A Linear Non-Gaussian Acyclic Model for Causal Discovery (LiNGAM), JMLR. https://www.jmlr.org/papers/v7/shimizu06a.html
- Zheng et al. (2018), DAGs with NO TEARS: Continuous Optimization for Structure Learning, NeurIPS. https://arxiv.org/abs/1803.01422
- Peters, Janzing, and Schölkopf (2017), Elements of Causal Inference, MIT Press. https://mitpress.mit.edu/9780262037310/elements-of-causal-inference/