
Causal Discovery: Learning Causal Graphs from Data (PC, LiNGAM, NOTEARS) and Its Limits

When you are unsure which variable affects which, causal-discovery algorithms try to infer a DAG from observational data. They are good at generating hypotheses, but their assumptions are strong and social-science use needs care.

Causal discovery tries to learn a causal graph from data. Three common families: constraint-based using conditional independence (PC), function-based using non-Gaussianity/nonlinearity (LiNGAM), and continuous optimization for finding DAGs (NOTEARS). They help generate hypotheses and screen variables but rely on strong assumptions (causal sufficiency, faithfulness) and cannot replace institutional knowledge and research design.

Schematic

The principle at a glance

Causal discovery infers a graph from data: conditional-independence tests remove edges to get a skeleton, then v-structures/non-Gaussianity/time order orient it; observation often identifies only an equivalence class (CPDAG). Treat it as a hypothesis generator; test key edges with design.

Start Here

What you should be able to do

Know the goal: estimate a DAG / CPDAG from data, not a single effect.

Understand three families: constraint-based (PC), functional causal models (LiNGAM), continuous optimization (NOTEARS).

Know the key assumptions: causal sufficiency (no unobserved common cause), faithfulness, and (for LiNGAM) non-Gaussian noise.

Understand that observational data often identifies only a Markov equivalence class (CPDAG), leaving some edge directions undetermined.

Treat causal discovery as a hypothesis generator, then test with design, experiments, or expert knowledge.

Learning Path

Learning path: from correlation structure to a candidate causal graph

Follow this path: estimate the skeleton, orient as far as possible, recognize the equivalence-class limit, and hand the graph to research design as a hypothesis.

Step 1: Data
Observe multivariate data; the target is structure, not one effect. (X_1..X_p)

Step 2: Skeleton
Remove indirect/spurious edges with conditional-independence tests. (X_i⟂X_j|S)

Step 3: Orient
Orient with v-structures, non-Gaussianity, or time order. (v-structure)

Step 4: Equivalence
Recognize that observation often identifies only a CPDAG. (CPDAG)

Step 5: Validate
Treat the graph as a hypothesis; test key edges with design. (design test)

01 / Intuition

Core Intuition

Causal discovery answers what the graph looks like, not how large a single edge is. It infers direction from correlation structure plus assumptions.

Constraint-based PC uses conditional-independence tests to remove edges and then orient them, but with observation alone it often identifies only a Markov equivalence class (CPDAG).
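To make the equivalence-class idea concrete, here is a minimal sketch that checks whether two small DAGs are Markov equivalent using the standard criterion (same skeleton and same v-structures). The function names and the three-node example graphs are illustrative choices, not part of any library.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (parent, child)."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Colliders a -> c <- b whose endpoints a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    found = set()
    for c, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def markov_equivalent(e1, e2):
    """Same skeleton and same v-structures."""
    return skeleton(e1) == skeleton(e2) and v_structures(e1) == v_structures(e2)

chain_fwd = {("X", "Y"), ("Y", "Z")}   # X -> Y -> Z
chain_bwd = {("Y", "X"), ("Z", "Y")}   # X <- Y <- Z
fork      = {("Y", "X"), ("Y", "Z")}   # X <- Y -> Z
collider  = {("X", "Y"), ("Z", "Y")}   # X -> Y <- Z

print(markov_equivalent(chain_fwd, chain_bwd))  # True
print(markov_equivalent(chain_fwd, fork))       # True
print(markov_equivalent(chain_fwd, collider))   # False
```

The two chains and the fork share one equivalence class, so observational data cannot separate them; only the collider is distinguishable.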

Recovering unique directions needs extra structural assumptions: LiNGAM exploits the asymmetry of linear models with non-Gaussian noise; NOTEARS uses a differentiable acyclicity constraint to turn combinatorial search into continuous optimization.

02 / Math

From a joint distribution to a testable causal graph

01 / Graph and factorization

A causal graph is a DAG, and the joint distribution factorizes over parents. Discovery estimates this structure from data.

P(X_1,...,X_p) = prod_j P(X_j | pa(X_j))
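As a rough illustration of the factorization, the sketch below samples from a small linear-Gaussian DAG in topological order and checks numerically that the sum of the conditional log-densities equals the joint Gaussian log-density. The three-variable graph, its coefficients, and the unit noise variances are made-up assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical DAG: X1 -> X2, X1 -> X3, X2 -> X3.
# B[j, i] is the coefficient of X_i in the equation for X_j.
B = np.array([[0.0,  0.0, 0.0],
              [0.8,  0.0, 0.0],
              [0.5, -0.7, 0.0]])
p = B.shape[0]

# Ancestral sampling: columns are already in topological order,
# so each variable is drawn given its (already sampled) parents.
X = np.zeros((n, p))
for j in range(p):
    X[:, j] = X @ B[j] + rng.normal(size=n)

def log_normal(x, mean):
    """Log-density of N(mean, 1) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi) + (x - mean) ** 2)

# Factorized form: sum_j log P(X_j | pa(X_j)).
log_fact = sum(log_normal(X[:, j], X @ B[j]) for j in range(p))

# Joint form: multivariate normal with covariance implied by the DAG.
A = np.linalg.inv(np.eye(p) - B)
Sigma = A @ A.T
quad = np.einsum("ni,ij,nj->n", X, np.linalg.inv(Sigma), X)
log_joint = -0.5 * (quad + np.log(np.linalg.det(Sigma)) + p * np.log(2 * np.pi))

print(np.allclose(log_fact, log_joint))  # True
```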

02 / Constraint-based (PC) and equivalence classes

Use conditional-independence tests: if X_i and X_j are independent given some set S, remove the edge; then orient v-structures (colliders). Observation usually identifies only a CPDAG.

X_i ⟂ X_j | S  =>  remove edge i–j
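The edge-removal rule above can be sketched as a simplified PC-style skeleton search. This is a minimal illustration, assuming roughly Gaussian data, a Fisher z-test on partial correlations, and a significance level `alpha` chosen for the example; the function names are hypothetical and this is not the API of any causal-discovery package.

```python
import math
from itertools import combinations
import numpy as np

def partial_corr(C, i, j, S):
    """Partial correlation of i and j given S, from a correlation matrix C."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(C[np.ix_(idx, idx)])
    return -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])

def ci_pvalue(C, n, i, j, S):
    """Two-sided p-value of the Fisher z-test of X_i independent of X_j given S."""
    r = max(min(partial_corr(C, i, j, S), 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(S) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))

def pc_skeleton(C, n, alpha=0.01):
    """Start from the complete graph; drop i-j when some S makes them independent."""
    p = C.shape[0]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepset = {}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                for S in combinations(sorted(adj[i] - {j}), level):
                    if ci_pvalue(C, n, i, j, S) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(S)
                        break
        level += 1
    edges = {tuple(sorted((i, j))) for i in adj for j in adj[i]}
    return edges, sepset

def orient_colliders(edges, sepset):
    """Unshielded triples i - k - j with k outside sepset(i, j) become i -> k <- j."""
    nodes = {v for e in edges for v in e}
    nbrs = {v: {b if a == v else a for a, b in edges if v in (a, b)} for v in nodes}
    out = set()
    for k in nodes:
        for i, j in combinations(sorted(nbrs[k]), 2):
            if (i, j) not in edges and k not in sepset.get(frozenset((i, j)), set()):
                out.add((i, k, j))
    return out

# Demo on simulated chain data X -> Y -> Z (columns 0, 1, 2).
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=n)
Y = 1.2 * X + rng.normal(size=n)
Z = 0.9 * Y + rng.normal(size=n)
C = np.corrcoef(np.column_stack([X, Y, Z]), rowvar=False)
edges, sepset = pc_skeleton(C, n)
print("skeleton:", sorted(edges))
print("colliders:", orient_colliders(edges, sepset))
```

On chain data the 0-2 edge should usually be removed with separating set {1} and no collider should be found, although any finite-sample test can err at roughly the rate set by `alpha`.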

03 / Faithfulness and causal sufficiency

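PC depends on three things: faithfulness (every conditional independence in the data corresponds to a d-separation in the graph, so none arises from paths that happen to cancel exactly), causal sufficiency (no unobserved common cause of any two measured variables), and valid conditional-independence tests. If any of them fails, the recovered graph can be wrong.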

faithfulness + causal sufficiency + valid CI tests
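A faithfulness violation is easy to construct by hand. The sketch below, with made-up coefficients, picks the direct effect X→Z so that it exactly cancels the indirect path X→Y→Z; the population covariance between X and Z is then zero even though the edge exists, so a marginal independence test would wrongly remove it.

```python
import numpy as np

a, c = 0.8, 0.5          # X -> Y and Y -> Z (illustrative values)
b = -a * c               # X -> Z, chosen to cancel the indirect path exactly

# B[j, i] is the coefficient of variable i in the equation for variable j
# (order: X, Y, Z), with independent unit-variance noise.
B = np.array([[0.0, 0.0, 0.0],
              [a,   0.0, 0.0],
              [b,   c,   0.0]])
A = np.linalg.inv(np.eye(3) - B)
Sigma = A @ A.T          # population covariance implied by the model

print("direct edge X -> Z coefficient:", B[2, 0])
print("population cov(X, Z):", Sigma[0, 2])   # zero up to floating-point error
```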

04 / LiNGAM: orientation via non-Gaussianity

LiNGAM assumes a linear, non-Gaussian, acyclic model X = BX + e in which the components of e are mutually independent and non-Gaussian. Non-Gaussianity breaks the symmetry between cause and effect, so the causal order and the coefficient matrix B are uniquely identifiable (for example via ICA-type methods).

X = B X + e,  e non-Gaussian and independent
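A short derivation for the two-variable case may help show where the asymmetry comes from. It uses scalar notation (cause X, effect Y, coefficient b, noise e, all zero-mean), which is an illustrative simplification of the matrix model above.

```latex
\begin{align*}
Y &= bX + e, \qquad X \perp e,\quad b \neq 0,\\
c &= \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(Y)}
   = \frac{b\,\sigma_X^2}{b^2\sigma_X^2 + \sigma_e^2}
   \quad\text{(slope of the reverse regression of $X$ on $Y$)},\\
r &= X - cY = (1 - cb)\,X - c\,e
   = \frac{\sigma_e^2\,X - b\,\sigma_X^2\,e}{b^2\sigma_X^2 + \sigma_e^2}.
\end{align*}
```

The reverse residual r and the regressor Y are both mixtures of the same two independent sources X and e, with nonzero weights on each. They are always uncorrelated, since Cov(r, Y) = 0, but by the Darmois-Skitovich theorem they can be fully independent only if X and e are Gaussian. In the forward regression the residual is just e, which is independent of X by assumption.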

05 / NOTEARS: continuous optimization

Encode acyclicity as a smooth constraint h(W)=0, replacing combinatorial search over DAGs with continuous optimization solvable by gradient methods.

min_W loss(W)  s.t.  h(W)=tr(exp(W∘W))−p=0
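The acyclicity function can be evaluated directly on small matrices. The sketch below assumes the convention that W[i, j] is the weight of edge i→j, and uses a truncated power series as a NumPy-only stand-in for a proper matrix exponential (in practice `scipy.linalg.expm` would be the more robust choice). The example matrices are made up.

```python
import numpy as np

def matrix_exp(A, terms=30):
    """Truncated power series for exp(A); adequate for small, modest-norm matrices."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def h(W):
    """NOTEARS acyclicity function: zero exactly when W has no directed cycle."""
    return np.trace(matrix_exp(W * W)) - W.shape[0]

def grad_h(W):
    """Gradient of h with respect to W: exp(W∘W)^T ∘ 2W."""
    return matrix_exp(W * W).T * 2 * W

W_dag = np.array([[0.0, 1.5,  0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0,  0.0]])   # 0 -> 1 -> 2, acyclic
W_cyc = W_dag.copy()
W_cyc[2, 0] = 0.7                       # adds 2 -> 0, closing a cycle

print("h(DAG)    =", h(W_dag))   # 0.0
print("h(cyclic) =", h(W_cyc))   # strictly positive
```

Because h is smooth and has the closed-form gradient shown, it can be handed to a gradient-based constrained optimizer, which is the point of the reformulation.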

03 / Code

Code cases: conditional independence, equivalence classes, and non-Gaussian orientation

Small simulations show how conditional independence removes a seemingly-present edge, why direction is unidentified under Gaussian observation, and how non-Gaussianity helps orient.

Case 1: conditional independence removes an indirect edge

In a chain X→Y→Z, X and Z are correlated but independent given Y, so there is no direct X–Z edge.

import numpy as np
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=n)
Y = 1.2 * X + rng.normal(size=n)
Z = 0.9 * Y + rng.normal(size=n)
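
def pcorr(x, y, z):
    # Partial correlation of x and y given z: regress each on z (with intercept),
    # then correlate the two residual vectors.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]
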
print("corr(X,Z)      =", round(np.corrcoef(X, Z)[0,1], 3))
print("pcorr(X,Z | Y) =", round(pcorr(X, Z, Y), 3))

Expected output (approximate; exact digits depend on the random draw)

corr(X,Z)      ≈ 0.63
pcorr(X,Z | Y) ≈ 0.0 (within sampling noise of a few hundredths)

How to read this code

X and Z are marginally correlated, suggesting a direct link.
Conditioning on the mediator Y makes them independent, so there is no direct edge.
PC uses exactly this kind of conditional independence to recover the skeleton.

Case 2: direction is unidentified under Gaussian observation

X→Y and X←Y give the same correlation structure; Gaussian observation alone cannot tell them apart.

import numpy as np
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(size=n)  # X -> Y
b_xy = np.polyfit(X, Y, 1)[0]
b_yx = np.polyfit(Y, X, 1)[0]
print("slope Y~X =", round(b_xy, 3), " slope X~Y =", round(b_yx, 3))
print("corr identical both ways:", round(np.corrcoef(X, Y)[0,1], 3))

Expected output (approximate; exact digits depend on the random draw)

slope Y~X ≈ 0.80  slope X~Y ≈ 0.49
corr identical both ways: ≈ 0.62

How to read this code

Both directions fit the data; the correlation structure is symmetric.
So Gaussian observation identifies only the Markov equivalence class.
Orientation needs extra information: experiments, time order, or non-Gaussianity.
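One way to see the non-identifiability more sharply is to compare maximized Gaussian log-likelihoods for the two directions. The sketch below is a minimal illustration with hypothetical helper names: it fits X→Y and Y→X by least squares and shows the two likelihoods coincide, so no likelihood-based score can prefer one direction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(size=n)   # true direction: X -> Y

def gauss_loglik(resid):
    """Maximized Gaussian log-likelihood of a residual vector (MLE variance)."""
    var = np.mean(resid ** 2)
    return -0.5 * len(resid) * (np.log(2 * np.pi * var) + 1)

def residual(target, regressor):
    """Residual of a least-squares line fit of target on regressor."""
    return target - np.polyval(np.polyfit(regressor, target, 1), regressor)

# Model X -> Y: marginal of X times conditional of Y given X.
ll_fwd = gauss_loglik(X - X.mean()) + gauss_loglik(residual(Y, X))
# Model Y -> X: marginal of Y times conditional of X given Y.
ll_bwd = gauss_loglik(Y - Y.mean()) + gauss_loglik(residual(X, Y))

print("log-lik X -> Y:", round(ll_fwd, 4))
print("log-lik Y -> X:", round(ll_bwd, 4))
print("difference    :", abs(ll_fwd - ll_bwd))   # zero up to floating-point error
```

Both factorizations describe the same fitted bivariate Gaussian, which is why the totals agree regardless of the sample drawn.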

Case 3: non-Gaussianity breaks the symmetry and orients

LiNGAM intuition: in the correct direction the residual is independent of the cause, but not in the reverse direction.

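import numpy as np
rng = np.random.default_rng(2)
n = 5000
X = rng.laplace(size=n)              # non-Gaussian cause
Y = 0.9 * X + rng.laplace(size=n)    # X -> Y
res_fwd = Y - np.polyval(np.polyfit(X, Y, 1), X)   # regress Y on X
res_bwd = X - np.polyval(np.polyfit(Y, X, 1), Y)   # regress X on Y

# OLS residuals are always linearly uncorrelated with the regressor,
# so compare squared values to pick up higher-order dependence.
def sqcorr(res, v):
    return abs(np.corrcoef(res**2, v**2)[0, 1])

print("forward  |corr(res^2, X^2)| =", round(sqcorr(res_fwd, X), 3))
print("backward |corr(res^2, Y^2)| =", round(sqcorr(res_bwd, Y), 3))

Expected output (approximate; exact digits depend on the random draw)

forward  |corr(res^2, X^2)| ≈ 0.0
backward |corr(res^2, Y^2)| ≈ 0.4

How to read this code

In the correct direction X→Y, the residual is independent of X, so even the squared terms are nearly uncorrelated.
In the reverse direction the residual is linearly uncorrelated with Y by construction, but it is still dependent on Y, and the squared terms expose that dependence.
This is the intuition LiNGAM uses to orient via non-Gaussian asymmetry; practical implementations rely on ICA or dedicated independence measures rather than this simple check.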

04 / Case

Case: discovering a candidate graph among development indicators, then testing it

Question: the causal structure among education, health, income, and infrastructure indicators is unknown, and we want testable hypotheses first.
Use PC to recover the skeleton and orientable parts into a CPDAG; use LiNGAM or time order to orient further where possible.
Treat the discovered graph as a hypothesis and test key edges with quasi-experiments, instruments, or institutional knowledge rather than reporting it as causal truth.
A credible report states the assumptions used (sufficiency, faithfulness), sensitivity to unobserved common causes, edges left unoriented in the equivalence class, and how design will validate them.

05 / Causal

Plugging into estimation: discovered graph → adjustment set → design

Causal discovery does not give an effect directly; it gives a candidate graph. Its most useful product is what to control for (a back-door adjustment set), which then feeds the estimation methods you already know.

01 / Discovery → back-door adjustment set

Read an adjustment set satisfying the back-door criterion from the candidate DAG, avoiding conditioning on colliders or mediators.

adjust by Z s.t. back-door(Z) holds
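A small simulation can illustrate why the choice of adjustment set matters. The graph here is a made-up example (confounder Z, treatment X, outcome Y with a true effect of 1.0, and a collider C caused by both X and Y), and plain least squares stands in for whatever estimator is used downstream.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
Z = rng.normal(size=n)                        # confounder: Z -> X, Z -> Y
X = Z + rng.normal(size=n)                    # treatment
Y = 1.0 * X + 1.5 * Z + rng.normal(size=n)    # outcome; true effect of X is 1.0
C = X + Y + rng.normal(size=n)                # collider: X -> C <- Y

def coef_on_x(*controls):
    """OLS coefficient on X when regressing Y on X plus the given controls."""
    design = np.column_stack([np.ones(n), X, *controls])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return beta[1]

b_naive = coef_on_x()          # back-door path through Z left open
b_adjusted = coef_on_x(Z)      # {Z} satisfies the back-door criterion
b_collider = coef_on_x(Z, C)   # also conditions on the collider C

print("no adjustment      :", round(b_naive, 2))     # about 1.75 (confounded)
print("adjust for Z       :", round(b_adjusted, 2))  # about 1.0 (true effect)
print("adjust for Z and C :", round(b_collider, 2))  # about 0.0 (collider bias)
```

Only the set read off the graph as back-door admissible recovers the true coefficient; adding the collider is worse than adding nothing.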

02 / Adjustment set → DML / matching

Use the adjustment set as controls X in DML / AIPW / matching to estimate the treatment effect.

03 / Unoriented edges → fill with design

For key edges left unoriented in the equivalence class, add identification with quasi-experiments, IV, or time order.

04 / Discovery is exploration, not confirmation

Doing discovery and estimation on the same data is over-optimistic; separate exploration from confirmation (sample splitting / pre-registration).
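The over-optimism is easy to reproduce on pure noise. In the sketch below, simple correlation screening stands in for graph discovery (an assumption made to keep the example short): the strongest candidate is chosen on an exploration half and then re-estimated on a held-out confirmation half.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 50
data = rng.normal(size=(n, p))     # pure-noise candidates: no true effects
y = rng.normal(size=n)

# Split once, before looking at any relationship.
perm = rng.permutation(n)
explore, confirm = perm[: n // 2], perm[n // 2 :]

def abs_corr(rows):
    """Absolute correlation of each candidate column with y on the given rows."""
    return np.array([abs(np.corrcoef(data[rows, j], y[rows])[0, 1])
                     for j in range(p)])

best = int(np.argmax(abs_corr(explore)))   # "discovered" on the exploration half
print("selected column:", best)
print("|corr| on exploration half :", round(abs_corr(explore)[best], 3))
print("|corr| on confirmation half:", round(abs_corr(confirm)[best], 3))
```

The exploration-half value is the maximum over 50 noisy estimates and is therefore inflated; the confirmation-half value is an honest estimate for the one pre-selected candidate and will typically be much closer to zero.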

Three red lines: (1) causal sufficiency is nearly untestable, and unobserved common causes distort discovered edges; (2) observation often identifies only an equivalence class, so do not treat a CPDAG as the unique DAG; (3) keep discovery and confirmation separate to avoid self-validation on the same data.

06 / Risks

Common Pitfalls

Treating a discovered DAG as established causal truth rather than a hypothesis to test.

Ignoring causal sufficiency: with unobserved common causes, discovered edges may be entirely confounding.

Over-interpreting directions of edges that are inherently unorientable in the CPDAG.

Unstable conditional-independence tests in small samples, causing systematic edge/orientation errors.

Using discovery to mine the effect you want, treating exploration as confirmation.
