RandomFiniteGaussianMixture
Documentation for finite Gaussian mixture models with a variable (random) number of mixture components.
This model is available through the BayesDensityFiniteGaussianMixture package.
RandomFiniteGaussianMixture models the data-generating density as a mixture of normal distributions with an unknown number of mixture components. Our prior and model specification follow Richardson and Green (1997), and can be described as follows:
\[\begin{align*} x_i\,|\, K, \boldsymbol{w}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 &\sim \sum_{k=1}^K \frac{w_k}{\sigma_k} \phi\Big(\frac{x_i - \mu_k}{\sigma_k}\Big), &i = 1,\ldots, n,\\ w_k \,|\, K &\sim \mathrm{Dirichlet}_K(\alpha, \ldots, \alpha),\\ \mu_k \,|\, K &\sim \mathrm{Normal}(\mu_0, \sigma_0^2), &k = 1,\ldots, K,\\ \sigma_k^2 \,|\, K, \beta &\sim \mathrm{InverseGamma}(a_\sigma, \beta), &k = 1,\ldots, K,\\ \beta &\sim \mathrm{Gamma}(a_\beta, b_\beta),\\ K &\sim p(K) \end{align*}\]
where $\phi(\cdot)$ denotes the density of the standard normal distribution, $\mu_0 \in \mathbb{R}, \alpha, \sigma_0^2, a_\sigma, a_\beta, b_\beta > 0$ are fixed hyperparameters and $p(K)$ is a probability mass function supported on a finite subset of the positive integers.[1]
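As a concrete check of the likelihood on the first line of the display above, the mixture density can be evaluated by hand in Base Julia. This is a standalone sketch for illustration only; the package's pdf method, documented below, serves this purpose in practice.

```julia
# Standard normal density φ(z).
phi(z) = exp(-z^2 / 2) / sqrt(2π)

# f(x | K, w, μ, σ²) = Σ_k (w_k / σ_k) φ((x - μ_k) / σ_k)
function mixture_density(x, w, μ, σ2)
    σ = sqrt.(σ2)
    return sum(w .* phi.((x .- μ) ./ σ) ./ σ)
end

# A two-component example with made-up parameter values.
w = [0.3, 0.7]; μ = [0.0, 2.0]; σ2 = [1.0, 0.25]
mixture_density(1.0, w, μ, σ2)
```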
For Markov chain Monte Carlo-based inference, we provide an implementation of the telescope sampler (Frühwirth-Schnatter et al., 2021).
The variational inference algorithm used to compute the posterior proceeds by first fitting separate mixture models for different values of $K$, recording the optimized evidence lower bound, $\mathrm{ELBO}(K)$, at the end of each optimization. The posterior over the number of mixture components $p(K\,|\, \boldsymbol{x})$ is then approximated via
\[q(K) \propto p(K)\,\exp\big\{\mathrm{ELBO}(K)\big\}.\]
This approximation is justified because the ELBO is a lower bound on the log-marginal likelihood $\log p(\boldsymbol{x}\,|\, K)$. The approximate posterior for the number of mixture components, together with the optimal variational densities given $K$, defines a distribution over a space of mixtures of variable dimension, which is then used to make inferences about the density of the given sample.
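The reweighting step above can be sketched in Base Julia. The ELBO values below are made-up numbers for illustration, and the maximum shift (log-sum-exp trick) guards against overflow when exponentiating large negative ELBOs:

```julia
# q(K) ∝ p(K) exp{ELBO(K)}, normalized stably on the log scale.
function normalize_qK(log_prior, elbos)
    logq = log_prior .+ elbos       # unnormalized log q(K)
    m = maximum(logq)               # shift so the largest term is exp(0) = 1
    q = exp.(logq .- m)
    return q ./ sum(q)
end

elbos = [-1520.3, -1498.7, -1499.1]   # hypothetical ELBO(K) for K = 1, 2, 3
prior = fill(1/3, 3)                  # uniform p(K) on {1, 2, 3}
qK = normalize_qK(log.(prior), elbos)
```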
The algorithm used to compute the conditional variational posterior $q(\boldsymbol{\mu}|k)\,q(\boldsymbol{\sigma}^2|k)\,q(\boldsymbol{w}|k)$ is a variant of Algorithm 5 in Ormerod and Wand (2010). Our version additionally places a hyperprior on the rate parameter of the mixture scales, and the algorithm has been adjusted to account for this.
There are two main ways of proceeding with Bayesian inference for the variational posterior. One possibility is to proceed with the single value $\widehat{K}$ that maximizes the variational probability $q(K)$, the so-called maximum a posteriori model. Posterior inference then proceeds via the conditional variational posterior $q\big(\boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{w} | \widehat{K}\big)$. This model can be retrieved by utilizing the maximum_a_posteriori method on a fitted variational posterior, which can then be used for posterior inference.
Another possibility is to take a fully Bayesian approach, where we do not condition on a single value of $K$ but treat it as a random variable. To pursue this approach to posterior inference, one can simply use the object returned by varinf directly (e.g. for plotting or computing other posterior summary statistics).
Module API
BayesDensityFiniteGaussianMixture.RandomFiniteGaussianMixture — Type
RandomFiniteGaussianMixture{T<:Real} <: AbstractBayesDensityModel{T}
Struct representing a finite Gaussian mixture model with a variable (random) number of components.
Constructors
RandomFiniteGaussianMixture(x::AbstractVector{<:Real}; kwargs...)
RandomFiniteGaussianMixture{T}(x::AbstractVector{<:Real}; kwargs...)
Arguments
x: The data vector.
Keyword arguments
- prior_components: A Distributions.DiscreteNonParametric distribution instance specifying the prior on the number of components K. Defaults to DiscreteNonParametric(1:50, fill(T(1/50), 50)), corresponding to a uniform prior on the set {1, …, 50}.
- prior_strength: Strength parameter of the symmetric Dirichlet prior on the mixture weights, i.e. the prior is Dirichlet(strength, ..., strength). Defaults to 1.0.
- prior_location: Prior mean of the location parameters μ[k]. Defaults to the midpoint of the minimum and maximum values in the sample.
- prior_variance: Prior variance of the location parameters μ[k]. Defaults to the sample range.
- prior_shape: Prior shape parameter of the squared scale parameters σ2[k]. Defaults to 2.0.
- hyperprior_shape: Shape parameter of the hyperprior on the rate parameter of σ2[k]. Defaults to 0.2.
- hyperprior_rate: Rate parameter of the hyperprior on the rate parameter of σ2[k]. Defaults to 0.2*R^2, where R is the sample range.
Examples
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> rfgm = RandomFiniteGaussianMixture(x)
RandomFiniteGaussianMixture{Float64} with 20 values for the number of mixture components.
Using 5000 observations.
Hyperparameters:
prior_location = 0.5, prior_variance = 1.0
prior_shape = 2.0, hyperprior_shape = 0.2, hyperprior_rate = 10.0
prior_strength = 1.0
julia> rfgm = RandomFiniteGaussianMixture(x; prior_components = DiscreteNonParametric(1:12, fill(1/12, 12)));
Evaluating the pdf and cdf
Distributions.pdf — Method
pdf(
bsm::RandomFiniteGaussianMixture,
params::NamedTuple,
t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
pdf(
bsm::RandomFiniteGaussianMixture,
params::AbstractVector{NamedTuple},
t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}
Evaluate $f(t\, |\, \boldsymbol{\eta})$ for a given RandomFiniteGaussianMixture, where $\boldsymbol{\eta}$ denotes the model parameters supplied via the NamedTuple params.
The named tuple should contain fields named :μ, :σ2, :w and optionally :β.
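The shape of the two call patterns can be illustrated in Base Julia with a hand-rolled mixture density (not the package's pdf): one parameter draw evaluated over a grid yields a vector, and a vector of draws yields a matrix with one row per evaluation point. The parameter values below are made up.

```julia
# Hand-rolled mixture density; `p` is a NamedTuple with fields μ, σ2, w.
phi(z) = exp(-z^2 / 2) / sqrt(2π)
dens(t, p) = sum(p.w .* phi.((t .- p.μ) ./ sqrt.(p.σ2)) ./ sqrt.(p.σ2))

grid = range(0.0, 1.0; length = 5)
draw = (μ = [0.3, 0.7], σ2 = [0.01, 0.04], w = [0.5, 0.5])

# One draw over a grid: a length-5 vector.
v = [dens(t, draw) for t in grid]

# Two draws over the same grid: a 5×2 matrix (rows index grid points).
draws = [draw, (μ = [0.5], σ2 = [0.02], w = [1.0])]
M = [dens(t, p) for t in grid, p in draws]
```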
Distributions.cdf — Method
cdf(
bsm::RandomFiniteGaussianMixture,
params::NamedTuple,
t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
cdf(
bsm::RandomFiniteGaussianMixture,
params::AbstractVector{NamedTuple},
t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}
Evaluate $F(t\, |\, \boldsymbol{\eta})$ for a given RandomFiniteGaussianMixture, where $\boldsymbol{\eta}$ denotes the model parameters supplied via the NamedTuple params.
The named tuple should contain fields named :μ, :σ2, :w and optionally :β.
Utility functions
BayesDensityCore.hyperparams — Method
hyperparams(
gm::RandomFiniteGaussianMixture{T}
) where {T} -> @NamedTuple{prior_components::DiscreteNonParametric{Int, T}, prior_strength::T, prior_location::T, prior_variance::T, prior_shape::T, prior_rate::T}
Returns the hyperparameters of the finite Gaussian mixture model gm as a NamedTuple.
Markov chain Monte Carlo
StatsBase.sample — Method
sample(
[rng::Random.AbstractRNG],
rfgm::RandomFiniteGaussianMixture{T},
n_samples::Int;
n_burnin::Int = min(1000, div(n_samples, 5)),
initial_params::NamedTuple = _get_default_initparams_mcmc(rfgm)
) where {T} -> PosteriorSamples{T}
Generate n_samples posterior samples from a RandomFiniteGaussianMixture using the telescope sampler.
Arguments
- rng: Optional random number generator used for random variate generation.
- rfgm: The RandomFiniteGaussianMixture object for which posterior samples are generated.
- n_samples: The total number of samples (including burn-in).
Keyword arguments
- n_burnin: Number of burn-in samples.
- initial_params: Initial values used in the MCMC algorithm. Should be supplied as a NamedTuple with fields :μ, :σ2 and :w, where all are K-dimensional vectors.
The following constraints must also be satisfied: σ2[k] > 0 for all k, w[k] ≥ 0 for all k, and sum(w) ≈ 1.
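A quick way to verify these constraints before calling sample is a small validator like the following. This is a hand-rolled sketch, not part of the package API; only the documented field names of initial_params are assumed.

```julia
# Check that a candidate `initial_params` NamedTuple satisfies the
# documented constraints: matching lengths, σ2[k] > 0, w[k] ≥ 0, sum(w) ≈ 1.
function valid_initial_params(p::NamedTuple)
    K = length(p.μ)
    length(p.σ2) == K && length(p.w) == K || return false
    all(>(0), p.σ2) || return false
    all(>=(0), p.w) || return false
    return isapprox(sum(p.w), 1.0; atol = 1e-8)
end

valid_initial_params((μ = [0.2, 0.8], σ2 = [1.0, 2.0], w = [0.7, 0.3]))
```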
Returns
ps: APosteriorSamplesobject holding the posterior samples and the original model object.
Examples
julia> using Random
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> rfgm = RandomFiniteGaussianMixture(x);
julia> ps1 = sample(rfgm, 5_000);
julia> ps2 = sample(rfgm, 5_000; n_burnin=2_000, initial_params = (μ = [0.2, 0.8], σ2 = [1.0, 2.0], w = [0.7, 0.3]));
Variational inference
BayesDensityFiniteGaussianMixture.RandomFiniteGaussianMixtureVIPosterior — Type
RandomFiniteGaussianMixtureVIPosterior{T<:Real} <: AbstractVIPosterior{T}
Struct representing the variational posterior distribution of a RandomFiniteGaussianMixture.
Fields
- posterior_components: The posterior probabilities on the number of components. Note that probs(posterior_components)[K] corresponds to the posterior probability of model K.
- mixture_fits: Vector of FiniteGaussianMixtureVIPosterior objects, containing the fitted variational posterior distributions for different numbers of mixture components.
- rgfm: The RandomFiniteGaussianMixture to which the variational posterior was fit.
BayesDensityCore.varinf — Method
varinf(
    rfgm::RandomFiniteGaussianMixture{T};
    max_iter::Int = 2000,
    rtol::Real = 1e-6
) where {T} -> RandomFiniteGaussianMixtureVIPosterior{T}
Find a variational approximation to the posterior distribution of a RandomFiniteGaussianMixture using mean-field variational inference.
Arguments
rfgm: TheRandomFiniteGaussianMixturewhose posterior we want to approximate.
Keyword arguments
max_iter: Maximal number of VI iterations. Defaults to2000.rtol: Relative tolerance used to determine convergence. Defaults to1e-6.
Returns
- vip: A RandomFiniteGaussianMixtureVIPosterior object representing the variational posterior.
- info: A VariationalOptimizationResult describing the result of the optimization.
To perform the optimization for a fixed number of iterations irrespective of the convergence criterion, one can set rtol = 0.0, and max_iter equal to the desired total iteration count. Note that setting rtol to a strictly negative value will issue a warning.
Examples
julia> using Random
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> rfgm = RandomFiniteGaussianMixture(x);
julia> vip = varinf(rfgm);
julia> vip = varinf(rfgm; rtol=1e-7, max_iter=3000);
BayesDensityFiniteGaussianMixture.posterior_components — Function
posterior_components(vip::RandomFiniteGaussianMixtureVIPosterior{T}) where {T} -> DiscreteNonParametric{Int, T}
Get the variational posterior probability mass function of the number of mixture components as a DiscreteNonParametric instance.
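For illustration, the maximum a posteriori number of components can be read off from the values and probabilities of such a probability mass function. The sketch below uses plain vectors with made-up probabilities rather than a DiscreteNonParametric instance:

```julia
# Pick the value of K with the largest posterior probability,
# mirroring what `maximum_a_posteriori` does internally.
support = 1:5
probs = [0.01, 0.08, 0.55, 0.30, 0.06]   # hypothetical q(K)
K̂ = support[argmax(probs)]
```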
BayesDensityFiniteGaussianMixture.maximum_a_posteriori — Function
maximum_a_posteriori(
vip::RandomFiniteGaussianMixtureVIPosterior{T}
) where {T} -> FiniteGaussianMixtureVIPosterior{T}
Get the variational posterior distribution that maximizes the approximate posterior probability on the number of components q(K).
[1] We use the rate parameterization of the Gamma distribution here. This differs from the scale parameterization used by Distributions.