A note on the relationship between marginal probabilities and conditionals.
I occasionally come across people who define their models by a “Gibbs” sampling algorithm, without explicitly specifying a joint distribution. That is, to model a bivariate distribution on $x$ and $y$, these people do not define the joint distribution $p(x, y)$; instead, they show me a pair of conditional distributions $p(x \mid y)$ and $p(y \mid x)$, and tell me that’s their probabilistic model. This always makes me feel very uncomfortable, but I couldn’t tell exactly why it’s a bad idea, so I gave it some thought this time. I could not find a compelling reason to be against this practice, but in the process I learned some interesting facts about marginal and conditional probabilities; I was so entertained that I wanted to write down my thought process here.
I started thinking about whether a bivariate random variable $(x, y)$ is fully specified by $p(x \mid y)$ and $p(y \mid x)$. In other words, if you are given $p(x \mid y)$ and $p(y \mid x)$, how much do you know about $p(x, y)$? If there were any hole, I thought that could make a strong argument. I started by coming up with a somewhat atrocious pair of conditional distributions which I thought to be a counter-example:
$$p(x \mid y) = \mathbb{1}[x = y], \qquad p(y \mid x) = \mathbb{1}[y = x],$$

where $\mathbb{1}[\cdot]$ is an indicator function. That is, given $y$, $x = y$ with probability 1; and also, given $x$, $y = x$ with probability 1. This pair of conditional probabilities gives you absolutely no information about the marginal distribution of $x$ or of $y$; $p(x)$ could be a normal distribution, a Cauchy, or whatever distribution you name. So this is an example where specifying conditional distributions is not enough.
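A quick simulation (my own sketch, not from the original argument) makes this concrete: if you run Gibbs sampling with these indicator conditionals, each update just copies the other coordinate, so the chain never leaves its initial point and the "marginal" you observe is entirely determined by how you initialize.

```python
import random

def gibbs_indicator(x0, steps):
    """Gibbs sampling with p(x|y) = 1[x = y] and p(y|x) = 1[y = x].

    Each conditional is a point mass, so each update just copies the
    other coordinate; the chain stays at (x0, x0) forever.
    """
    x = y = x0
    trace = []
    for _ in range(steps):
        x = y  # "sample" x | y: x = y with probability 1
        y = x  # "sample" y | x: y = x with probability 1
        trace.append((x, y))
    return trace

# Whatever distribution the initial point is drawn from becomes the
# observed "marginal": the conditionals impose no constraint on it.
trace = gibbs_indicator(random.gauss(0.0, 1.0), 100)
```

Initialize with a normal draw and you see a normal marginal; initialize with a Cauchy draw and you see a Cauchy one.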
I was not satisfied with this example, however, because one could argue that nobody would use such a stupid distribution to specify their model. Then I came up with a better-looking example:
$$p(x \mid y) = \mathcal{N}(x; y, 1), \qquad p(y \mid x) = \mathcal{N}(y; x, 1),$$

where $\mathcal{N}(\cdot\,; \mu, \sigma)$ is a normal distribution with mean $\mu$ and standard deviation $\sigma$. This may look somewhat legitimate if you don’t pay enough attention, but it does not specify a well-defined probability distribution either; if you run a Gibbs sampling algorithm with these conditional distributions, for example, your samples will follow a random walk and will drift anywhere in $\mathbb{R}^2$. So some pairs of conditional distributions do not specify a non-degenerate marginal distribution, and therefore care should be taken if you specify your model with only conditionals.
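You can see the degeneracy empirically with a small simulation (again my own sketch): Gibbs updates with these two unit-variance normal conditionals reduce to a Gaussian random walk, so the spread of the chain grows without bound with chain length instead of settling into a stationary distribution.

```python
import random
import statistics

def gibbs_normal_chain(steps, seed=None):
    """Run Gibbs sampling with p(x|y) = N(x; y, 1) and p(y|x) = N(y; x, 1)
    and return the final x. The updates form a Gaussian random walk."""
    rng = random.Random(seed)
    x = y = 0.0
    for _ in range(steps):
        x = rng.gauss(y, 1.0)  # x | y ~ N(y, 1)
        y = rng.gauss(x, 1.0)  # y | x ~ N(x, 1)
    return x

# The variance of x across independent chains grows roughly linearly
# with chain length (about one unit per conditional update) -- the
# signature of a random walk, not of a stationary distribution.
var_short = statistics.variance(gibbs_normal_chain(10, seed=s) for s in range(500))
var_long = statistics.variance(gibbs_normal_chain(200, seed=s) for s in range(500))
```

For a well-defined model, both variance estimates would approach the same stationary value; here `var_long` keeps growing past `var_short`.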
However, I still felt uncomfortable, because some may still argue that my counter-examples do not satisfy some necessary regularity conditions, while theirs do. Then it occurred to me that Gibbs sampling works under fairly weak regularity conditions; since you can recover your distribution up to arbitrary precision with the Gibbs sampling algorithm, the conditional distributions should actually be enough to specify the marginal distribution. So this time, I wrote down the conditional distributions as follows:

$$p(x \mid y) = \frac{p(x, y)}{p(y)}, \qquad p(y \mid x) = \frac{p(x, y)}{p(x)}.$$
Then, I realized that

$$\frac{p(x \mid y)}{p(y \mid x)} = \frac{p(x)}{p(y)}.$$
If you integrate both sides in $x$, since $\int p(x) \, dx = 1$, you get

$$\int \frac{p(x \mid y)}{p(y \mid x)} \, dx = \frac{1}{p(y)};$$
in other words,

$$p(y) = \left( \int \frac{p(x \mid y)}{p(y \mid x)} \, dx \right)^{-1}.$$
This implies that you can recover the marginal distribution $p(y)$ from the conditional distributions. You can also get the marginal distribution on $x$ by $p(x) = \int p(x \mid y) \, p(y) \, dy$, so a pair of conditional distributions actually fully specifies the model, as long as the above integration can be done with probability 1. Going back to the previous example where $p(x \mid y) = \mathcal{N}(x; y, 1)$ and $p(y \mid x) = \mathcal{N}(y; x, 1)$, the ratio of these conditional densities is the constant $1$, which is not integrable over the real line; that is why this pair of conditional distributions does not define a non-degenerate marginal distribution.
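The recovery formula can be checked numerically on a concrete example (my own sketch; the choice of a standard bivariate normal with correlation $\rho$ is an illustration, not from the original argument). Both conditionals are then known in closed form, $x \mid y \sim \mathcal{N}(\rho y, 1 - \rho^2)$ and $y \mid x \sim \mathcal{N}(\rho x, 1 - \rho^2)$, and integrating their ratio over $x$ and inverting should recover the standard normal marginal $p(y)$.

```python
import math

RHO = 0.7
S = math.sqrt(1.0 - RHO**2)  # conditional standard deviation

def normal_pdf(z, mu, sigma):
    return math.exp(-((z - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def p_x_given_y(x, y):
    return normal_pdf(x, RHO * y, S)  # x | y ~ N(rho * y, 1 - rho^2)

def p_y_given_x(y, x):
    return normal_pdf(y, RHO * x, S)  # y | x ~ N(rho * x, 1 - rho^2)

def recovered_marginal(y, lo=-12.0, hi=12.0, n=4000):
    """p(y) = 1 / \\int p(x|y) / p(y|x) dx, via the trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * p_x_given_y(x, y) / p_y_given_x(y, x)
    return 1.0 / (total * h)

# Compare the recovered marginal against the true marginal N(y; 0, 1).
for y in (-1.5, 0.0, 2.0):
    print(y, recovered_marginal(y), normal_pdf(y, 0.0, 1.0))
```

The two columns agree to high precision, confirming that for this pair of conditionals the ratio is integrable and the marginal is pinned down.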
In general, it may not be easy to check whether $p(x \mid y) / p(y \mid x)$ is integrable with probability 1. However, I feel like this is not a strong enough argument yet… I feel like there have to be more constraints on conditional probabilities, but I don’t know them yet. On the other hand, the ratio of two conditional densities seems pretty interesting! I wonder whether it turns out to be useful somewhere else.