There are actually two sides to what is referred to as causal inference: (a) inferring a causal graph from the data, and (b) given a graph and data, measuring the causal effect of one variable on another.

The broad idea in (a) is to start with a fully connected graph and eliminate edges between nodes that test as independent, either marginally or conditionally on other nodes. This gives you an undirected graph, which can then be oriented by several methods (identifying V-structures, comparing the residuals of regressing X on Y vs Y on X).
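A toy sketch of that edge-elimination step, assuming linear Gaussian data so that (conditional) independence can be checked with (partial) correlation — the variable names and the regress-out-and-correlate test are my own choices, not from any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# True graph: X -> Y -> Z, so X and Z are dependent, but independent given Y.
X = rng.normal(size=n)
Y = 2.0 * X + rng.normal(size=n)
Z = -1.5 * Y + rng.normal(size=n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def partial_corr(a, b, c):
    # Correlation of residuals after regressing c out of both a and b:
    # near zero when a and b are independent given c.
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return corr(ra, rb)

# Start fully connected; an edge survives only if no independence is found.
print(abs(corr(X, Z)))             # large: X and Z are marginally dependent
print(abs(partial_corr(X, Z, Y)))  # near zero: drop the X--Z edge
```

This is the skeleton step of PC-style algorithms; real implementations search over conditioning sets of growing size and use proper hypothesis tests rather than an eyeballed threshold.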

The theory in (b) actually generalizes instrumental variables: it lays out the graphical configurations under which the causal effect of one variable on another is identifiable, and how to compute that effect.
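A minimal sketch of one such configuration, backdoor adjustment, assuming a linear model with a single observed confounder W (the coefficients and names here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# Confounded graph: W -> X, W -> Y, and X -> Y with true causal effect 2.0.
W = rng.normal(size=n)
X = 1.5 * W + rng.normal(size=n)
Y = 2.0 * X + 3.0 * W + rng.normal(size=n)

# Naive regression of Y on X is biased by the open backdoor path X <- W -> Y.
naive = np.polyfit(X, Y, 1)[0]

# W satisfies the backdoor criterion, so adjusting for it (here, including
# it as a regressor) recovers the causal coefficient on X.
A = np.column_stack([X, W, np.ones(n)])
adjusted = np.linalg.lstsq(A, Y, rcond=None)[0][0]

print(naive)     # inflated well above 2.0
print(adjusted)  # close to the true effect 2.0
```

In the general (nonparametric) case the same idea is the adjustment formula P(y|do(x)) = sum_w P(y|x,w) P(w), and do-calculus tells you which sets W are valid to adjust on.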

A great reference: https://www.amazon.com/Causality-Reasoning-Inference-Judea-P...

A nice introduction: https://www.youtube.com/watch?v=RPgvfSeQB8A