FiniteGaussianMixture

Documentation for finite Gaussian mixture models, with a fixed number of mixture components.

This model is available through the BayesDensityFiniteGaussianMixture package.

FiniteGaussianMixture models the data-generating density as a mixture of normal distributions with a known number of mixture components. The prior-model specification can be written as follows:

\[\begin{align*} x_i\,|\, \boldsymbol{w}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 &\sim \sum_{k=1}^K \frac{w_k}{\sigma_k} \phi\Big(\frac{x_i - \mu_k}{\sigma_k}\Big), &i = 1,\ldots, n,\\ w_k &\sim \mathrm{Dirichlet}_K(\alpha, \ldots, \alpha),\\ \mu_k &\sim \mathrm{Normal}(\mu_0, \sigma_0^2), &k = 1,\ldots, K,\\ \sigma_k^2 \,|\, \beta &\sim \mathrm{InverseGamma}(a_\sigma, \beta), &k = 1,\ldots, K,\\ \beta &\sim \mathrm{Gamma}(a_\beta, b_\beta),\\ \end{align*}\]

where $\phi(\cdot)$ denotes the density of the standard normal distribution, $\mu_0 \in \mathbb{R}, \alpha, \sigma_0^2, a_\sigma, a_\beta, b_\beta > 0$ are fixed hyperparameters and $K$ is a fixed positive integer.^[1]

Note

When using FiniteGaussianMixture, the number of mixture components $K$ must be specified by the user. A more flexible version of this model, where $K$ is further treated as a random variable with its own prior distribution, is available as RandomFiniteGaussianMixture.

For Markov chain Monte Carlo based inference, this module implements an augmented Gibbs sampling approach. The algorithm used is essentially the Gibbs sampler sweep (excluding the reversible jump-move) of Richardson and Green (1997). For variational inference, we implement a variant of algorithm 5 in Ormerod and Wand (2010). Note that our version also includes an additional hyperprior on the rate parameters of the mixture scales and that the algorithm has been adjusted to account for this fact.

For a discussion on the selection of the hyperparameters in such models, we refer the intereted reader to Frühwirth-Schnatter (2006) and Rousseau and Mengersen (2011). The default hyperparameter values are based on Richardson and Green (1997).

Module API

BayesDensityFiniteGaussianMixture.FiniteGaussianMixture — Type

FiniteGaussianMixture{T<:Real} <: AbstractBayesDensityModel{T}

Struct representing a finite Gaussian mixture model with a fixed number of components.

Constructors

FiniteGaussianMixture(x::AbstractVector{<:Real}, K::Int; kwargs...)
FiniteGaussianMixture{T}(x::AbstractVector{<:Real}, K::Int; kwargs...)

Arguments

x: The data vector.
K: The number of mixture components.

Keyword arguments

prior_strength: Strength parameter of the symmetric Dirichlet prior on the mixture weights. E.g. the prior is Dirichlet(strength, ..., strength). Defaults to 1.0.
prior_location: Prior mean of the location parameters μ[k]. Defaults to the midpoint of the minimum and maximum values in the sample.
prior_variance: The prior variance of the location parameter μ[k]. Defaults to the sample range.
prior_shape: Prior shape parameter of the squared scale parameters σ2[k]: Defaults to 2.0.
hyperprior_shape: Prior shape parameter of the hyperprior on the rate parameter of σ2[k]. Defaults to 0.2.
hyperprior_rate: Prior rate parameter of the hyperprior on the rate parameter of σ2[k]. Defaults to 0.2*R^2, where R is the sample range.

Examples

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> fgm = FiniteGaussianMixture(x, 5)
FiniteGaussianMixture{Float64} with 5 components.
Using 5000 observations.
Hyperparameters:
 prior_location = 0.5, prior_variance = 1.0
 prior_shape = 2.0, hyperprior_shape = 0.2, hyperprior_rate = 10.0
 prior_strength = 1.0

julia> fgm = FiniteGaussianMixture(x; prior_strength=10);

Extended help

The number of mixture components

The value of the number of mixture components K controls the flexibility of the model. Setting this to a larger value makes the model able to capture more complex shapes, but may come at the cost of increased variance.

The prior strength parameter

The prior_strength plays a key role in determining the number of "active" components. For smaller values of this parameter, some of the mixture components will tend to be assigned low weight, and only a few components will contribute to the estimate.

source

Evaluating the pdf and cdf

Distributions.pdf — Method

pdf(
    fgm::FiniteGaussianMixture,
    params::NamedTuple,
    t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}

pdf(
    fgm::FiniteGaussianMixture,
    params::AbstractVector{NamedTuple},
    t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}

Evaluate $f(t\, |\, \boldsymbol{\eta})$ for a given FiniteGaussianMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.

The named tuple should contain fields named :μ, :σ2, :w and optionally :β.

source

Distributions.cdf — Method

cdf(
    fgm::FiniteGaussianMixture,
    params::NamedTuple,
    t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}

cdf(
    fgm::FiniteGaussianMixture,
    params::AbstractVector{NamedTuple},
    t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}

Evaluate $F(t\, |\, \boldsymbol{\eta})$ for a given FiniteGaussianMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.

The named tuple should contain fields named :μ, :σ2, :w and optionally :β.

source

Statistics.quantile — Method

quantile(
    fgm::RandomFiniteGaussianMixture,
    params::NamedTuple,
    p::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}

cdf(
    fgm::RandomFiniteGaussianMixture,
    params::AbstractVector{NamedTuple},
    p::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}

Evaluate $Q(p\, |\, \boldsymbol{\eta})$ for a given FiniteGaussianMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.

The named tuple should contain fields named :μ, :σ2, :w and optionally :β.

source

quantile(
    bdm::AbstractBayesDensityModel,
    parameters::NamedTuple,
    p::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}

quantile(
    bdm::AbstractBayesDensityModel,
    parameters::AbstractVector{<:NamedTuple},
    p::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}

Evaluate the quantile function $Q(p\,|\, \boldsymbol{\eta}) = \inf \{x\colon F(x) \geq p\}$ of the Bayesian density model bdm for every $\boldsymbol{\eta}$ in parameters and every element in the collection p.

If a single NamedTuple is passed to the parameters argument, this function outputs either a scalar or a vector depending on the input type of the third argument p.

If a vector of NamedTuples is passed to the second positional argument, then this function returns a Matrix of size (length(p), length(parameters)).

source

Utility functions

BayesDensityCore.hyperparams — Method

hyperparams(
    gm::FiniteGaussianMixture{T}
) where {T} -> @NamedTuple{prior_strength::T, prior_location::T, prior_variance::T, prior_shape::T, prior_rate::T}

Returns the hyperparameters of the finite Gaussian mixture model gm as a NamedTuple.

source

Markov chain Monte Carlo

StatsBase.sample — Method

sample(
    [rng::Random.AbstractRNG],
    fgm::FiniteGaussianMixture{T},
    n_samples::Int;
    n_burnin::Int              = min(1000, div(n_samples, 5)),
    initial_params::NamedTuple = _get_default_initparams_mcmc(rfgm)
) where {T} -> PosteriorSamples{T}

Generate n_samples posterior samples from a FiniteGaussianMixture using an augmented Gibbs sampler.

Arguments

rng: Optional random seed used for random variate generation.
fgm: The FiniteGaussianMixture object for which posterior samples are generated.
n_samples: The total number of samples (including burn-in).

Keyword arguments

n_burnin: Number of burn-in samples.
initial_params: Initial values used in the MCMC algorithm. Should be supplied as a NamedTuple with fields :μ, :σ2 and :w, where all are K-dimensional vectors.

The following constraints must also be satisfied: σ2[k]>0 for all k and w[k]≥0 for all k and sum(w) ≈ 1

Returns

ps: A PosteriorSamples object holding the posterior samples and the original model object.

Examples

julia> using Random

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> fgm = FiniteGaussianMixture(x, 4);

julia> ps1 = sample(fgm, 5_000);

julia> ps2 = sample(fgm, 5_000; n_burnin=2_000, initial_params = (μ = [0.2, 0.8], σ2 = [1.0, 2.0], w = [0.7, 0.3]));

source

Variational inference

BayesDensityFiniteGaussianMixture.FiniteGaussianMixtureVIPosterior — Type

FiniteGaussianMixtureVIPosterior{T<:Real} <: AbstractVIPosterior{T}

Struct representing the variational posterior distribution of a FiniteGaussianMixture.

Fields

q_w: Distribution representing the optimal variational densities of the component weights q*(w|K).
q_μ: Product distribution representing the optimal variational densitiy of the component means q*(μ|K).
q_σ2: Product distribution representing the optimal variational densitiy of the component variances q*(σ2|K).
q_β: The optimal variational density q*(β|k) of the rate hyperparameter of the component variance σ2[k].
fgm: The FiniteGaussianMixture to which the variational posterior was fit.

source

BayesDensityCore.varinf — Method

varinf(
    fgm::FiniteGaussianMixture{T};
    initial_params::NamedTuple = _get_default_initparams(x),
    max_iter::Int              = 2000
    rtol::Real                 = 1e-6
) where {T} -> PitmanYorMixtureVIPosterior{T}

Find a variational approximation to the posterior distribution of a FiniteGaussianMixture using mean-field variational inference.

Arguments

fgm: The FiniteGaussianMixture whose posterior we want to approximate.

Keyword arguments

initial_params: Initial values of the VI parameters dirichlet_params location_params, variance_params, shape_params and rate_params, supplied as a NamedTuple.
max_iter: Maximal number of VI iterations. Defaults to 1000.
rtol: Relative tolerance used to determine convergence. Defaults to 1e-6.

Returns

vip: A FiniteGaussianMixtureVIPosterior object representing the variational posterior.
info: A VariationalOptimizationResult describing the result of the optimization.

Note

To perform the optimization for a fixed number of iterations irrespective of the convergence criterion, one can set rtol = 0.0, and max_iter equal to the desired total iteration count. Note that setting rtol to a strictly negative value will issue a warning.

Examples

julia> using Random

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> fgm = FiniteGaussianMixture(x, 10);

julia> vip, info = varinf(fgm);

julia> vip, info = varinf(fgm; rtol=1e-7, max_iter=3000);

Extended help

Convergence

The criterion used to determine convergence is that the relative change in the ELBO falls below the given rtol.

source

1We use the rate parameterization of the Gamma distribution here. This differs from the scale-parameterization used by Distributions.