PitmanYorMixture
Documentation for Pitman–Yor mixture models (Ishwaran and James, 2001) with a normal kernel and a normal-inverse gamma base measure.
PitmanYorMixture is an infinite mixture model in which the regularity of the density is governed primarily by the mixture weights. Mathematically, the Pitman–Yor mixture model can be described as follows:
\[\begin{align*} x_i \,|\, \mu_i, \sigma_i^2 &\sim \mathrm{Normal}(\mu_i, \sigma_i^2), &i = 1,\ldots, n,\\ \mu_i, \sigma_i^2 \,|\, G &\sim G, &i = 1,\ldots, n,\\ G &\sim \mathrm{PitmanYor}(\alpha, \theta, G_0), \end{align*}\]
where $0 \leq \theta < 1$, $\alpha > -\theta$ and $G_0$ is the $\mathrm{NormalInverseGamma}(\mu_0, \lambda, a, b)$-distribution for some fixed hyperparameters $\mu_0\in \mathbb{R}, \lambda, a, b > 0$.
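Assuming the usual parameterization of the normal-inverse gamma distribution (consistent with the `prior_inv_scale_fac` keyword described below), drawing $(\mu, \sigma^2) \sim \mathrm{NormalInverseGamma}(\mu_0, \lambda, a, b)$ is shorthand for the hierarchical specification

\[\begin{align*} \sigma^2 &\sim \mathrm{InverseGamma}(a, b),\\ \mu \,|\, \sigma^2 &\sim \mathrm{Normal}(\mu_0, \sigma^2/\lambda), \end{align*}\]

so that $\lambda$ controls how strongly the prior on $\mu$ concentrates around $\mu_0$ relative to the component variance.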
Alternatively, the Pitman–Yor mixture model can be written in its stick-breaking form,
\[\begin{align*} x_i \,|\, \boldsymbol{v}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 &\sim \sum_{k=1}^\infty \frac{w_k}{\sigma_k} \phi\Big(\frac{x_i - \mu_k}{\sigma_k}\Big), &i = 1,\ldots, n,\\ v_k &\sim \text{Beta}(1-\theta, \alpha + k\theta), &k \in \mathbb{N},\\ \mu_k, \sigma_k^2 &\sim \text{NormalInverseGamma}(\mu_0, \lambda, a, b), &k \in \mathbb{N} \end{align*}\]
where $\phi(\cdot)$ is the density of the standard normal distribution and $w_k = v_k\prod_{j=1}^{k-1} (1 - v_j)$ for $k\in \mathbb{N}$.
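The stick-breaking map $\boldsymbol{v} \mapsto \boldsymbol{w}$ has a simple mechanical reading: $w_k$ is the fraction $v_k$ of the stick that remains after the first $k-1$ breaks. A standalone sketch (not part of the module API):

```julia
# Standalone sketch of the stick-breaking map, where
# w_k = v_k * prod_{j<k} (1 - v_j).  Not part of the module API.
function stick_breaking_weights(v::AbstractVector{<:Real})
    w = similar(v, float(eltype(v)))
    remaining = one(float(eltype(v)))  # length of the stick not yet broken off
    for (k, vk) in enumerate(v)
        w[k] = vk * remaining
        remaining *= 1 - vk
    end
    return w
end

stick_breaking_weights([0.5, 0.5, 0.5])  # returns [0.5, 0.25, 0.125]
```

For any finite truncation the weights sum to less than one; for the infinite sequence they sum to one almost surely.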
For Markov chain Monte Carlo based inference, this module implements Algorithm 2 of Neal (2000). For variational inference, we implement the truncated stick-breaking approach of Blei and Jordan (2006).
Since a Dirichlet process mixture model is equivalent to a Pitman–Yor mixture model with discount parameter equal to 0, this module can also be used to fit Dirichlet process mixtures.
Module API
BayesDensityPitmanYorMixture.PitmanYorMixture — Type
```julia
PitmanYorMixture{T<:Real} <: AbstractBayesDensityModel{T}
```

Struct representing a Pitman–Yor mixture model with a normal kernel and a conjugate Normal-InverseGamma base measure.
Constructors
```julia
PitmanYorMixture(x::AbstractVector{<:Real}; kwargs...)
PitmanYorMixture{T}(x::AbstractVector{<:Real}; kwargs...)
```

Arguments

- `x`: The data vector.
Keyword arguments
- `discount`: Discount parameter of the Pitman–Yor process. Defaults to `0.0`, corresponding to a Dirichlet process.
- `strength`: Strength parameter of the Pitman–Yor process. Defaults to `1.0`.
- `prior_location`: Prior mean of the location parameter `μ`. Defaults to `mean(x)`.
- `prior_inv_scale_fac`: Factor by which the conditional prior variance `σ2` of `μ` is scaled. Defaults to `1`.
- `prior_shape`: Prior shape parameter of the squared scale parameter `σ2`. Defaults to `2.0`.
- `prior_rate`: Prior rate parameter of the squared scale parameter `σ2`. Defaults to `var(x)`.
Examples
```julia
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> pym = PitmanYorMixture(x)
PitmanYorMixture{Float64}:
Using 5000 observations.
Hyperparameters:
discount = 0.0, strength = 1.0
prior_location = 0.578555, prior_inv_scale_fac = 1.0
prior_shape = 2.0, prior_rate = 0.0334916

julia> pym = PitmanYorMixture(x; strength = 2, discount = 0.5);
```

Extended help
Evaluating the pdf and cdf
Distributions.pdf — Method
```julia
pdf(
    bsm::PitmanYorMixture,
    params::NamedTuple,
    t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
pdf(
    bsm::PitmanYorMixture,
    params::AbstractVector{NamedTuple},
    t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}
```

Evaluate $f(t\, |\, \boldsymbol{\eta})$ for a given PitmanYorMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain fields named `:μ`, `:σ2` and a third field named `:cluster_counts` or `:w`, depending on whether the marginal or stick-breaking parameterization is used.
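For orientation, in the stick-breaking parameterization the density being evaluated is $f(t\,|\,\boldsymbol{\eta}) = \sum_k (w_k/\sigma_k)\,\phi((t - \mu_k)/\sigma_k)$. A hand-rolled sketch of that computation (illustrative only, not the package implementation):

```julia
# Hand-rolled evaluation of the stick-breaking mixture density
# f(t | η) = Σ_k w_k / σ_k * φ((t - μ_k) / σ_k); illustrative only.
normal_pdf(z) = exp(-z^2 / 2) / sqrt(2π)

function mixture_pdf(params::NamedTuple, t::Real)
    (; μ, σ2, w) = params
    sum(w[k] / sqrt(σ2[k]) * normal_pdf((t - μ[k]) / sqrt(σ2[k])) for k in eachindex(w))
end

params = (μ = [0.0, 1.0], σ2 = [1.0, 0.25], w = [0.7, 0.3])
mixture_pdf(params, 0.5)
```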
Distributions.cdf — Method
```julia
cdf(
    bsm::PitmanYorMixture,
    params::NamedTuple,
    t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
cdf(
    bsm::PitmanYorMixture,
    params::AbstractVector{NamedTuple},
    t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}
```

Evaluate $F(t\, |\, \boldsymbol{\eta})$ for a given PitmanYorMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain fields named `:μ`, `:σ2` and a third field named `:cluster_counts` or `:w`, depending on whether the marginal or stick-breaking parameterization is used.
Utility functions
BayesDensityCore.hyperparams — Method
```julia
hyperparams(
    pym::PitmanYorMixture{T}
) where {T} -> @NamedTuple{discount::T, strength::T, prior_location::T, prior_inv_scale_fac::T, prior_shape::T, prior_rate::T}
```

Returns the hyperparameters of the Pitman–Yor mixture model `pym` as a NamedTuple.
Markov chain Monte Carlo
StatsBase.sample — Method
```julia
sample(
    [rng::Random.AbstractRNG],
    pym::PitmanYorMixture{T},
    n_samples::Int;
    n_burnin::Int = min(1000, div(n_samples, 5)),
    initial_params::NamedTuple = _get_default_initparams_mcmc(pym)
) where {T} -> PosteriorSamples{T}
```

Generate `n_samples` posterior samples from a PitmanYorMixture using an augmented marginal Gibbs sampler.
Arguments
- `rng`: Optional random number generator used for random variate generation.
- `pym`: The PitmanYorMixture object for which posterior samples are generated.
- `n_samples`: The total number of samples (including burn-in).
Keyword arguments
- `n_burnin`: Number of burn-in samples.
- `initial_params`: Initial values used in the MCMC algorithm. Should be supplied as a NamedTuple with fields `:μ`, `:σ2` and `:cluster_alloc`, where `μ` and `σ2` are vectors of the same length, and `cluster_alloc` is a vector of length `length(x)` indicating the cluster membership of each observation.
Returns
- `ps`: A PosteriorSamples object holding the posterior samples and the original model object.
Examples
```julia
julia> using Random

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> pym = PitmanYorMixture(x);

julia> ps = sample(Xoshiro(1), pym, 5000);

julia> ps = sample(Xoshiro(1), pym, 5000; initial_params = (μ = [0.2, 0.8], σ2 = [1.0, 1.0], cluster_alloc = vcat(fill(1, 2500), fill(2, 2500))));
```

Variational inference
BayesDensityPitmanYorMixture.PitmanYorMixtureVIPosterior — Type
```julia
PitmanYorMixtureVIPosterior{T<:Real} <: AbstractVIPosterior{T}
```

Struct representing the variational posterior distribution of a PitmanYorMixture.
Fields
- `q_v`: Vector of distributions representing the optimal variational densities q*(vₖ), i.e. the density of the stick-breaking weights.
- `q_θ`: Vector of distributions representing the optimal variational densities q*(θₖ), i.e. the joint density of the mixture component means and variances.
- `pym`: The PitmanYorMixture to which the variational posterior was fit.
BayesDensityCore.varinf — Method
```julia
varinf(
    pym::PitmanYorMixture{T};
    truncation_level::Int = 25,
    initial_params::NamedTuple = _get_default_initparams(pym, truncation_level),
    max_iter::Int = 3000,
    rtol::Real = 1e-6
) where {T} -> PitmanYorMixtureVIPosterior{T}
```

Find a variational approximation to the posterior distribution of a PitmanYorMixture using mean-field variational inference based on a truncated stick-breaking approach.
Arguments
- `pym`: The PitmanYorMixture whose posterior we want to approximate.
Keyword arguments
- `truncation_level`: Positive integer specifying the truncation level of the variational approximation. Defaults to `25`.
- `initial_params`: Initial values of the VI parameters `a_v`, `b_v`, `locations`, `inv_scale_facs`, `shapes` and `rates`, supplied as a NamedTuple. Must have dimensions matching the supplied truncation level.
- `max_iter`: Maximal number of VI iterations. Defaults to `3000`.
- `rtol`: Relative tolerance used to determine convergence. Defaults to `1e-6`.
Returns
- `vip`: A PitmanYorMixtureVIPosterior object representing the variational posterior.
- `info`: A VariationalOptimizationResult describing the result of the optimization.
To perform the optimization for a fixed number of iterations irrespective of the convergence criterion, one can set rtol = 0.0, and max_iter equal to the desired total iteration count. Note that setting rtol to a strictly negative value will issue a warning.
Extended help
Convergence
The criterion used to determine convergence is that the relative change in the ELBO falls below the given rtol.
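The stopping rule amounts to the following check (illustrative names, not the package internals):

```julia
# Illustrative convergence check: stop once the relative change of the ELBO
# between successive iterations falls below rtol.  Not the package internals.
converged(elbo_new, elbo_old, rtol) = abs(elbo_new - elbo_old) <= rtol * abs(elbo_old)

converged(-1234.56, -1234.561, 1e-6)  # small relative change, returns true
```

Setting `rtol = 0.0` makes this check always fail, which is why the optimization then runs for the full `max_iter` iterations.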
Truncation
The truncation level determines the maximal number of components used in the variational approximation. Generally, setting the truncation level to a higher value leads to an approximating class with a greater representational capacity, at the cost of increased computation.