What makes an image different from random noise? The short answer is that they have different statistics. But how, and why? Scientists and engineers have come up with many competing theories, among which those based on the tail of a distribution are particularly enlightening.
Consider Gaussian noise first. As we mentioned in class, the tail of Gaussian noise is governed by an exponential function (the density decays as exp(-x^2/(2\sigma^2))) no matter what new coordinate system is used, assuming an orthogonal transform, because an orthogonal transform of Gaussian noise is again Gaussian. This rate of decay implies that the tail (the large coefficients, which we call exceptions) makes up only a negligible portion of the samples (e.g., about 99.7% of samples lie within 3\sigma of the mean for a Gaussian r.v. with variance \sigma^2, so only about 0.3% are exceptions). By contrast, if we take the wavelet transform of a natural image, we often find that the distribution of its high-frequency band coefficients has a much heavier tail; i.e., there is a significant portion of exceptions. This observation is the basis for modeling those coefficients by generalized Laplacian/Gaussian or even power-law (scale-invariant) distributions. The different characteristics of the tails help explain the effectiveness of simple nonlinear operations such as thresholding. With a proper threshold, thresholding can knock down most noise components but leave the exceptions alone.
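The contrast in tails can be checked numerically. The following is a minimal sketch, not an experiment on real image data: it assumes a unit-variance Laplacian as a stand-in for heavy-tailed high-frequency wavelet coefficients, and the helper names (tail_fraction, hard_threshold, laplacian_sample) are hypothetical ones introduced only for this illustration. It compares the fraction of samples beyond 3\sigma for the two distributions; in theory this is about 0.27% for the Gaussian and exp(-3\sqrt{2}), about 1.4%, for the Laplacian.

```python
import math
import random


def tail_fraction(samples, sigma, k=3.0):
    """Fraction of samples whose magnitude exceeds k * sigma."""
    return sum(abs(x) > k * sigma for x in samples) / len(samples)


def hard_threshold(samples, t):
    """Zero every sample with magnitude at most t; keep the rest unchanged."""
    return [x if abs(x) > t else 0.0 for x in samples]


def laplacian_sample(rng, sigma):
    """Draw one zero-mean Laplacian sample with standard deviation sigma."""
    b = sigma / math.sqrt(2.0)  # a Laplacian with scale b has variance 2 * b**2
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200_000
sigma = 1.0

# Gaussian noise versus an assumed heavy-tailed stand-in for wavelet coefficients.
gaussian = [rng.gauss(0.0, sigma) for _ in range(n)]
laplacian = [laplacian_sample(rng, sigma) for _ in range(n)]

gauss_tail = tail_fraction(gaussian, sigma)
laplace_tail = tail_fraction(laplacian, sigma)
print(f"Gaussian  fraction beyond 3 sigma: {gauss_tail:.4%}")
print(f"Laplacian fraction beyond 3 sigma: {laplace_tail:.4%}")

# Hard thresholding at 3 sigma zeroes nearly all Gaussian samples,
# while a visibly larger share of the heavy-tailed samples survives.
gauss_kept = sum(x != 0.0 for x in hard_threshold(gaussian, 3.0 * sigma))
laplace_kept = sum(x != 0.0 for x in hard_threshold(laplacian, 3.0 * sigma))
print(f"Samples surviving the threshold: {gauss_kept} Gaussian, {laplace_kept} Laplacian")
```

Both sample sets have the same variance, so the difference in surviving samples comes from the shape of the tail alone. The threshold of 3\sigma is one illustrative choice; in a denoising setting the threshold would be tied to the noise level rather than to the signal.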
The cause of heavy tails in natural image statistics is a much deeper issue than the observation of the phenomenon itself. The physical laws of the natural world have produced diverse structures across many scales. Self-similarity and self-organizing principles have been proposed to characterize a wide range of physical phenomena, from the form of mountains and coastlines to synchronization in fireflies and lasers. Since natural images are projections of the natural world onto an imaging plane, it is plausible to attribute the source of heavy tails to various geometric and physical principles. However, a complete description of the statistical laws behind natural images remains elusive.