Graphical models: An overview

Graphic modeling

Graphical models are intended to provide a concise description of possibly complex relationships between a set of variables. In addition, properties can be read directly from the description key. The basic idea is that each variable is represented by a node in the graph. Any pair of nodes can be connected by an edge. For most types of graphs, a missing edge represents some form of independence between a pair of variables. Because independence can be either negligible or dependent on some or all of the other variables, a variety of graph types is necessary.

A particularly important distinction is between directed and undirected edges. In the former case, the arrow indicates the direction of dependence from the explanatory variable to the response. If, however, the two variables are to be interpreted as equal, then the edge between them is undirected, or cyclic dependencies are allowed. For example, systolic and diastolic blood pressure are usually treated on an equal basis because they are two aspects of the same phenomenon, namely the blood pressure wave.

Graphical models were developed by Darroch et al. (1980) and Wermuth (1976) as special subclasses of logit models for contingency tables and multivariate Gaussian distributions that can be interpreted in terms of conditional independence and can be represented by undirected graphs. See Multivariate analysis: overview ; Multivariate analysis: discrete variables (logistic models) .

The first extension concerned problems in which variables can be arranged in a sequence so that each variable is considered as a response to all variables to its right in the sequence. This leads to a representation by an oriented acyclic graph. Then there are joint response models with graphs containing blocks of undirected components, the blocks are again connected in a directed and acyclic manner. Many recent and ongoing studies deal with more specialized situations, such as models for mixed dependencies, for event history data, for time series data, and for simultaneous dependencies in multivariate Gaussian distributions. The latter correspond, for example, to independence graphs with directed cyclic components.