FiniteGaussianMixture
Documentation for finite Gaussian mixture models with a fixed number of mixture components.
This model is available through the BayesDensityFiniteGaussianMixture package.
FiniteGaussianMixture models the data-generating density as a mixture of normal distributions with a known number of mixture components. The prior-model specification can be written as follows:
\[\begin{align*} x_i\,|\, \boldsymbol{w}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 &\sim \sum_{k=1}^K \frac{w_k}{\sigma_k} \phi\Big(\frac{x_i - \mu_k}{\sigma_k}\Big), &i = 1,\ldots, n,\\ \boldsymbol{w} &\sim \mathrm{Dirichlet}_K(\alpha, \ldots, \alpha),\\ \mu_k &\sim \mathrm{Normal}(\mu_0, \sigma_0^2), &k = 1,\ldots, K,\\ \sigma_k^2 \,|\, \beta &\sim \mathrm{InverseGamma}(a_\sigma, \beta), &k = 1,\ldots, K,\\ \beta &\sim \mathrm{Gamma}(a_\beta, b_\beta), \end{align*}\]
where $\phi(\cdot)$ denotes the density of the standard normal distribution, $\mu_0 \in \mathbb{R}$ and $\alpha, \sigma_0^2, a_\sigma, a_\beta, b_\beta > 0$ are fixed hyperparameters, and $K$ is a fixed positive integer.[1]
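For intuition, the likelihood in the first line above is just a weighted sum of normal densities. The following sketch (Python/NumPy for illustration only, not the package's Julia API; the helper name `mixture_pdf` is made up here) evaluates that density for fixed parameters and checks that it integrates to one:

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, w, mu, sigma2):
    """Density of a K-component Gaussian mixture at the points x:
    f(x) = sum_k (w_k / sigma_k) * phi((x - mu_k) / sigma_k)."""
    x = np.atleast_1d(x)
    sd = np.sqrt(sigma2)
    # Shape (n, K): per-component densities weighted by w, summed over k.
    comp = w * norm.pdf(x[:, None], loc=mu, scale=sd)
    return comp.sum(axis=1)

# Example parameters for a two-component mixture.
w = np.array([0.3, 0.7])
mu = np.array([-1.0, 2.0])
sigma2 = np.array([0.5, 1.0])

# The density integrates to (approximately) one over a wide grid.
grid = np.linspace(-10.0, 10.0, 20001)
area = mixture_pdf(grid, w, mu, sigma2).sum() * (grid[1] - grid[0])
```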
When using FiniteGaussianMixture, the number of mixture components $K$ must be specified by the user. A more flexible version of this model, where $K$ is further treated as a random variable with its own prior distribution, is available as RandomFiniteGaussianMixture.
For Markov chain Monte Carlo based inference, this module implements an augmented Gibbs sampling approach. The algorithm is essentially the Gibbs sampler sweep (excluding the reversible-jump move) of Richardson and Green (1997). For variational inference, we implement a variant of Algorithm 5 in Ormerod and Wand (2010). Note that our version also includes an additional hyperprior on the rate parameters of the mixture scales, and that the algorithm has been adjusted accordingly.
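One sweep of such an augmented Gibbs sampler can be sketched as follows. This is an illustrative Python outline of the conditional updates implied by the conjugate priors above (allocations, weights, means, variances, then the rate hyperparameter β), not the package's implementation; the function name and hyperparameter defaults are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def gibbs_sweep(x, w, mu, sigma2, beta, alpha=1.0, mu0=0.0, s0sq=1.0,
                a_sigma=2.0, a_beta=0.2, b_beta=0.2):
    """One cycle of conditional updates for the finite Gaussian mixture."""
    K = len(w)
    # 1) Latent allocations: P(z_i = k) ∝ w_k φ((x_i − μ_k)/σ_k)/σ_k.
    p = w * norm.pdf(x[:, None], loc=mu, scale=np.sqrt(sigma2))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=pi) for pi in p])
    counts = np.bincount(z, minlength=K)
    # 2) Weights | z ~ Dirichlet(α + n_1, …, α + n_K).
    w = rng.dirichlet(alpha + counts)
    # 3) Means | z, σ²: conjugate normal update per component.
    for k in range(K):
        xk = x[z == k]
        prec = 1.0 / s0sq + counts[k] / sigma2[k]
        m = (mu0 / s0sq + xk.sum() / sigma2[k]) / prec
        mu[k] = rng.normal(m, np.sqrt(1.0 / prec))
    # 4) Variances | z, μ, β ~ InverseGamma(a_σ + n_k/2, β + ½Σ(x_i − μ_k)²).
    for k in range(K):
        xk = x[z == k]
        shape = a_sigma + counts[k] / 2.0
        rate = beta + 0.5 * np.sum((xk - mu[k]) ** 2)
        sigma2[k] = 1.0 / rng.gamma(shape, 1.0 / rate)
    # 5) β | σ² ~ Gamma(a_β + K·a_σ, b_β + Σ 1/σ²_k)  (rate parameterization).
    beta = rng.gamma(a_beta + K * a_sigma, 1.0 / (b_beta + np.sum(1.0 / sigma2)))
    return w, mu, sigma2, beta
```

Iterating this sweep, after burn-in, yields draws from the joint posterior.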
Module API
BayesDensityFiniteGaussianMixture.FiniteGaussianMixture — Type
FiniteGaussianMixture{T<:Real} <: AbstractBayesDensityModel{T}
Struct representing a finite Gaussian mixture model with a fixed number of components.
Constructors
FiniteGaussianMixture(x::AbstractVector{<:Real}, K::Int; kwargs...)
FiniteGaussianMixture{T}(x::AbstractVector{<:Real}, K::Int; kwargs...)
Arguments
x: The data vector.
K: The number of mixture components.
Keyword arguments
prior_strength: Strength parameter of the symmetric Dirichlet prior on the mixture weights, i.e. the prior is Dirichlet(prior_strength, ..., prior_strength). Defaults to 1.0.
prior_location: Prior mean of the location parameters μ[k]. Defaults to the midpoint of the minimum and maximum values in the sample.
prior_variance: Prior variance of the location parameters μ[k]. Defaults to the sample range.
prior_shape: Prior shape parameter of the squared scale parameters σ2[k]. Defaults to 2.0.
hyperprior_shape: Prior shape parameter of the hyperprior on the rate parameter of σ2[k]. Defaults to 0.2.
hyperprior_rate: Prior rate parameter of the hyperprior on the rate parameter of σ2[k]. Defaults to 0.2*R^2, where R is the sample range.
Examples
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> fgm = FiniteGaussianMixture(x, 5)
FiniteGaussianMixture{Float64} with 5 components.
Using 5000 observations.
Hyperparameters:
prior_location = 0.5, prior_variance = 1.0
prior_shape = 2.0, hyperprior_shape = 0.2, hyperprior_rate = 10.0
prior_strength = 1.0
julia> fgm = FiniteGaussianMixture(x, 5; prior_strength=10);
Evaluating the pdf and cdf
Distributions.pdf — Method
pdf(
bsm::FiniteGaussianMixture,
params::NamedTuple,
t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
pdf(
bsm::FiniteGaussianMixture,
params::AbstractVector{NamedTuple},
t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}
Evaluate $f(t\, |\, \boldsymbol{\eta})$ for a given FiniteGaussianMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain fields named :μ, :σ2, :w and optionally :β.
Distributions.cdf — Method
cdf(
bsm::FiniteGaussianMixture,
params::NamedTuple,
t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
cdf(
bsm::FiniteGaussianMixture,
params::AbstractVector{NamedTuple},
t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}
Evaluate $F(t\, |\, \boldsymbol{\eta})$ for a given FiniteGaussianMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain fields named :μ, :σ2, :w and optionally :β.
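The cdf of the mixture is simply the correspondingly weighted sum of the component normal cdfs, $F(t\,|\,\boldsymbol{\eta}) = \sum_{k=1}^K w_k\, \Phi\big((t - \mu_k)/\sigma_k\big)$. A minimal sketch (Python/NumPy for illustration, not the package API; `mixture_cdf` is a made-up name):

```python
import numpy as np
from scipy.stats import norm

def mixture_cdf(t, w, mu, sigma2):
    """CDF of a Gaussian mixture: F(t) = sum_k w_k * Phi((t - mu_k)/sigma_k)."""
    t = np.atleast_1d(t)
    return (w * norm.cdf(t[:, None], loc=mu, scale=np.sqrt(sigma2))).sum(axis=1)

# An equal-weight mixture of two standard normals has F(0) = 0.5.
F0 = mixture_cdf(0.0, np.array([0.5, 0.5]), np.zeros(2), np.ones(2))[0]
```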
Utility functions
BayesDensityCore.hyperparams — Method
hyperparams(
gm::FiniteGaussianMixture{T}
) where {T} -> @NamedTuple{prior_strength::T, prior_location::T, prior_variance::T, prior_shape::T, prior_rate::T}
Returns the hyperparameters of the finite Gaussian mixture model gm as a NamedTuple.
Markov chain Monte Carlo
StatsBase.sample — Method
sample(
[rng::Random.AbstractRNG],
fgm::FiniteGaussianMixture{T},
n_samples::Int;
n_burnin::Int = min(1000, div(n_samples, 5)),
    initial_params::NamedTuple = _get_default_initparams_mcmc(fgm)
) where {T} -> PosteriorSamples{T}
Generate n_samples posterior samples from a FiniteGaussianMixture using an augmented Gibbs sampler.
Arguments
rng: Optional random number generator used for random variate generation.
fgm: The FiniteGaussianMixture object for which posterior samples are generated.
n_samples: The total number of samples (including burn-in).
Keyword arguments
n_burnin: Number of burn-in samples.
initial_params: Initial values used in the MCMC algorithm. Should be supplied as a NamedTuple with fields :μ, :σ2 and :w, where all are K-dimensional vectors.
The following constraints must also be satisfied: σ2[k] > 0 for all k, w[k] ≥ 0 for all k, and sum(w) ≈ 1.
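These constraints can be encoded in a small validation helper. The sketch below (Python, purely illustrative; the function is hypothetical and not part of the package) shows the checks in executable form:

```python
import math

def check_initial_params(mu, sigma2, w, K):
    """Validate MCMC initial values against the documented constraints."""
    assert len(mu) == len(sigma2) == len(w) == K, "all fields must be K-dimensional"
    assert all(s > 0 for s in sigma2), "sigma2[k] > 0 required for all k"
    assert all(wk >= 0 for wk in w), "w[k] >= 0 required for all k"
    assert math.isclose(sum(w), 1.0, rel_tol=1e-8), "weights must sum to one"
    return True
```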
Returns
ps: A PosteriorSamples object holding the posterior samples and the original model object.
Examples
julia> using Random
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> fgm = FiniteGaussianMixture(x, 4);
julia> ps1 = sample(fgm, 5_000);
julia> ps2 = sample(fgm, 5_000; n_burnin=2_000, initial_params = (μ = [0.2, 0.4, 0.6, 0.8], σ2 = [1.0, 1.0, 2.0, 2.0], w = [0.3, 0.3, 0.2, 0.2]));
Variational inference
BayesDensityFiniteGaussianMixture.FiniteGaussianMixtureVIPosterior — Type
FiniteGaussianMixtureVIPosterior{T<:Real} <: AbstractVIPosterior{T}
Struct representing the variational posterior distribution of a FiniteGaussianMixture.
Fields
q_w: Distribution representing the optimal variational density of the component weights q*(w|K).
q_μ: Product distribution representing the optimal variational density of the component means q*(μ|K).
q_σ2: Product distribution representing the optimal variational density of the component variances q*(σ2|K).
q_β: The optimal variational density q*(β|K) of the rate hyperparameter of the component variances σ2[k].
fgm: The FiniteGaussianMixture to which the variational posterior was fit.
BayesDensityCore.varinf — Method
varinf(
fgm::FiniteGaussianMixture{T};
    initial_params::NamedTuple = _get_default_initparams(fgm),
    max_iter::Int = 2000,
    rtol::Real = 1e-6
) where {T} -> FiniteGaussianMixtureVIPosterior{T}
Find a variational approximation to the posterior distribution of a FiniteGaussianMixture using mean-field variational inference.
Arguments
fgm: TheFiniteGaussianMixturewhose posterior we want to approximate.
Keyword arguments
initial_params: Initial values of the VI parameters dirichlet_params, location_params, variance_params, shape_params and rate_params, supplied as a NamedTuple.
max_iter: Maximal number of VI iterations. Defaults to 2000.
rtol: Relative tolerance used to determine convergence. Defaults to 1e-6.
Returns
vip: A FiniteGaussianMixtureVIPosterior object representing the variational posterior.
info: A VariationalOptimizationResult describing the result of the optimization.
To perform the optimization for a fixed number of iterations irrespective of the convergence criterion, set rtol = 0.0 and max_iter to the desired total iteration count. Note that setting rtol to a strictly negative value will issue a warning.
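The stopping rule described above can be sketched as a simple loop. The following Python outline (illustrative only; `run_vi` and its `step` callback are hypothetical, not the package's varinf) shows how a relative-ELBO-change criterion interacts with max_iter and rtol = 0.0:

```python
def run_vi(step, elbo0, max_iter=2000, rtol=1e-6):
    """Iterate a VI update `step` (which returns the new ELBO) until the
    relative change in the ELBO falls below rtol, or max_iter is reached.
    Returns (final ELBO, iterations used, converged?)."""
    elbo = elbo0
    for it in range(1, max_iter + 1):
        new_elbo = step()
        # Converged once the relative ELBO change drops below rtol.
        if abs(new_elbo - elbo) <= rtol * abs(elbo):
            return new_elbo, it, True
        elbo = new_elbo
    return elbo, max_iter, False

# With rtol = 0.0 the criterion is (essentially) never met, so the loop
# runs for the full max_iter iterations.
```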
Examples
julia> using Random
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> fgm = FiniteGaussianMixture(x, 10);
julia> vip, info = varinf(fgm);
julia> vip, info = varinf(fgm; rtol=1e-7, max_iter=3000);
Extended help
Convergence
The criterion used to determine convergence is that the relative change in the ELBO falls below the given rtol.
[1] We use the rate parameterization of the Gamma distribution here. This differs from the scale parameterization used by Distributions.
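The two parameterizations are related by scale = 1/rate, so a rate-parameterized density can be checked against a scale-parameterized library. A small numerical demonstration (Python/SciPy, which, like Distributions, uses the scale parameterization; values chosen arbitrarily):

```python
import math
from scipy.stats import gamma

a, rate, x = 0.2, 5.0, 1.3

# Scale-parameterized library call: Gamma(shape=a, scale=1/rate).
pdf_scale = gamma.pdf(x, a, scale=1.0 / rate)

# Rate-parameterized density written out directly:
# f(x) = b^a x^(a-1) e^(-b x) / Γ(a), with b the rate.
pdf_rate = rate**a * x ** (a - 1) * math.exp(-rate * x) / math.gamma(a)
```

The two values agree, confirming that converting the documented rate parameter via scale = 1/rate reproduces the library's density.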