9.1 Distribution Objects

A distribution object represents a probability distribution over a common domain, such as the real numbers, integers, or a set of symbols. Distribution object constructors correspond to distribution families, such as the family of normal distributions.

A distribution object, or a value of type dist, has a density function (a pdf) and a procedure to generate random samples. An ordered distribution object, or a value of type ordered-dist, additionally has a cumulative distribution function (a cdf), and its generalized inverse (an inverse cdf).

The following example creates an ordered distribution object representing a normal distribution with mean 2 and standard deviation 5, computes an approximation of the probability of the half-open interval (1/2,1], and computes another approximation from random samples:

> (define d (normal-dist 2 5))
> (real-dist-prob d 0.5 1.0)
0.038651712749849576
> (define xs (sample d 10000))
> (fl (/ (count (λ (x) (and (1/2 . < . x) (x . <= . 1))) xs)
         (length xs)))
0.0391
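
As a cross-check on the two values above, the interval probability can be worked out by hand from the standard normal cdf Φ. The Φ values below are rounded table values rather than library output, so the result is only expected to agree to a few digits:

```latex
\[
P\left(\tfrac{1}{2} < X \le 1\right)
  = \Phi\!\left(\frac{1 - 2}{5}\right) - \Phi\!\left(\frac{\tfrac{1}{2} - 2}{5}\right)
  = \Phi(-0.2) - \Phi(-0.3)
  \approx 0.42074 - 0.38209
  = 0.03865
\]
```

This is consistent with both the value from real-dist-prob and the sample-based estimate 0.0391.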

This plots the pdf and a kernel density estimate of the pdf from random samples:

> (plot (list (function (distribution-pdf d) #:color 0 #:style 'dot)
              (density xs))
        #:x-label "x" #:y-label "density of N(2,5)")

There are also higher-order distributions, which take other distributions as constructor arguments. For example, the truncated distribution family returns a distribution like its distribution argument, but sets probability outside an interval to 0 and renormalizes the probabilities within the interval:

> (define d-trunc (truncated-dist d -inf.0 5))
> (real-dist-prob d-trunc 5 6)
0.0
> (real-dist-prob d-trunc 0.5 1.0)
0.0532578419490049
> (plot (list (function (distribution-pdf d-trunc) #:color 0 #:style 'dot)
              (density (sample d-trunc 1000)))
        #:x-label "x" #:y-label "density of T(N(2,5),-∞,5)")
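
The renormalization can be checked by hand. The sketch below assumes the usual definition of truncation to (-∞, 5], writes F for the cdf of N(2,5), and uses rounded table values for Φ:

```latex
\[
P_{\mathrm{trunc}}(a < X \le b)
  = \frac{F(\min(b, 5)) - F(\min(a, 5))}{F(5)},
\qquad
F(5) = \Phi\!\left(\frac{5 - 2}{5}\right) = \Phi(0.6) \approx 0.72575
\]
\[
P_{\mathrm{trunc}}\left(\tfrac{1}{2} < X \le 1\right)
  \approx \frac{0.03865}{0.72575}
  \approx 0.05326,
\qquad
P_{\mathrm{trunc}}(5 < X \le 6) = \frac{F(5) - F(5)}{F(5)} = 0
\]
```

Both results agree with the outputs shown for d-trunc.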

Because real distributions’ cdfs represent the probability P[X ≤ x], they are right-continuous (i.e. continuous from the right):

> (define d (geometric-dist 0.4))
> (plot (for/list ([i  (in-range -1 7)])
          (define i+1-ε (flprev (+ i 1.0)))
          (list (lines (list (vector i (cdf d i))
                             (vector i+1-ε (cdf d i+1-ε)))
                       #:width 2)
                (points (list (vector i (cdf d i)))
                        #:sym 'fullcircle5 #:color 1)
                (points (list (vector i+1-ε (cdf d i+1-ε)))
                        #:sym 'fullcircle5 #:color 1 #:fill-color 0)))
        #:x-min -0.5 #:x-max 6.5 #:y-min -0.05 #:y-max 1
        #:x-label "x" #:y-label "P[X ≤ x]")

For convenience, cdfs are defined over the extended reals regardless of their distribution’s support, but their inverses return values only within the support:

> (cdf d +inf.0)
1.0
> (cdf d 1.5)
0.64
> (cdf d -inf.0)
0.0
> (inv-cdf d (cdf d +inf.0))
+inf.0
> (inv-cdf d (cdf d 1.5))
1.0
> (inv-cdf d (cdf d -inf.0))
0.0

A distribution’s inverse cdf is defined on the interval [0,1] and is always left-continuous, except possibly at 0 when its support is bounded on the left (as with geometric-dist).
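
The round trips above can be traced by hand. The sketch below assumes the standard definition of a generalized inverse and that geometric-dist counts failures before the first success (support 0, 1, 2, ...), which is consistent with the outputs shown:

```latex
\[
F^{-1}(p) = \inf \{\, x : F(x) \ge p \,\},
\qquad
F(x) = 1 - (1 - 0.4)^{\lfloor x \rfloor + 1} \quad \text{for } x \ge 0
\]
\[
F(1.5) = 1 - 0.6^{2} = 0.64,
\qquad
F(0) = 0.4 < 0.64 \le F(1) = 0.64
\;\Longrightarrow\;
F^{-1}(0.64) = 1
\]
```

The inverse maps 0.64 back to 1 rather than 1.5 because 1 is the smallest point at which the cdf reaches 0.64.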

Every pdf and cdf can return log densities and log probabilities, in case densities or probabilities are too small to represent as flonums (i.e. are less than +min.0):

> (define d (normal-dist))
> (pdf d 40.0)
0.0
> (cdf d -40.0)
0.0
> (pdf d 40.0 #t)
-800.9189385332047
> (cdf d -40.0 #t)
-804.6084420137538
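
The log density above can be verified directly from the standard normal density, which also shows why the ordinary density underflows:

```latex
\[
\log \varphi(40)
  = -\frac{40^{2}}{2} - \frac{1}{2}\log(2\pi)
  \approx -800 - 0.9189385
  = -800.9189385
\]
\[
\varphi(40) = e^{-800.9189385} \approx 10^{-348}
  \;<\; \texttt{+min.0} \approx 4.94 \times 10^{-324}
\]
```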

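Additionally, every cdf can return upper-tail probabilities, which are always more accurate when lower-tail probabilities are greater than 0.5 (a lower-tail probability very close to 1 rounds to 1.0, while its small upper-tail complement remains representable):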

> (cdf d 20.0)
1.0
> (cdf d 20.0 #f #t)
2.7536241186062337e-89

Upper-tail probabilities can also be returned as log probabilities in case probabilities are too small:

> (cdf d 40.0)
1.0
> (cdf d 40.0 #f #t)
0.0
> (cdf d 40.0 #t #t)
-804.6084420137538

Inverse cdfs accept log probabilities and upper-tail probabilities.
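
A minimal sketch of such a round trip follows. It assumes that inv-cdf takes its optional arguments in the same order as cdf (log flag first, then upper-tail flag); the names p-upper, x-back, log-p-upper, and x-back-log are illustrative only:

```racket
#lang racket
(require math/distributions)

;; Standard normal distribution, as in the examples above.
(define d (normal-dist))

;; Upper-tail probability P[X > 20]. The lower-tail value rounds to 1.0,
;; so 20.0 cannot be recovered from (cdf d 20.0).
(define p-upper (cdf d 20.0 #f #t))

;; Invert it, flagging p-upper as an upper-tail probability.
(define x-back (inv-cdf d p-upper #f #t))

;; The same round trip using a log upper-tail probability.
(define log-p-upper (cdf d 20.0 #t #t))
(define x-back-log (inv-cdf d log-p-upper #t #t))
```

Both x-back and x-back-log should come out close to 20.0, up to floating-point error.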

The functions lg+ and lgsum, as well as others in math/flonum, perform arithmetic on log probabilities.
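
As an illustration, the following sketch adds two tail probabilities that both underflow to 0.0 as ordinary flonums. The names log-lower, log-upper, log-both, and log-both* are illustrative only:

```racket
#lang racket
(require math/distributions math/flonum)

(define d (normal-dist))

;; Log probabilities of the two tails beyond -40 and 40.
(define log-lower (cdf d -40.0 #t))     ; log P[X <= -40]
(define log-upper (cdf d 40.0 #t #t))   ; log P[X > 40]

;; log(P[X <= -40] + P[X > 40]), computed without leaving log space.
(define log-both (lg+ log-lower log-upper))

;; lgsum does the same for a list of log probabilities.
(define log-both* (lgsum (list log-lower log-upper)))
```

Because the two tails are equal by symmetry, log-both should be close to log-lower plus (log 2.0).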

When distribution object constructors receive parameters outside their domains, they return undefined distributions, that is, distributions whose functions all return +nan.0:

> (pdf (gamma-dist -1 2) 2)
+nan.0
> (sample (poisson-dist -2))
+nan.0
> (cdf (beta-dist 0 0) 1/2)
+nan.0
> (inv-cdf (geometric-dist 1.1) 0.2)
+nan.0
