PitmanYorMixture
Documentation for Pitman–Yor mixture models (Ishwaran and James, 2001) with a normal kernel and a normal-inverse gamma base measure.
PitmanYorMixture is an infinite mixture model in which the regularity of the density is governed primarily by the mixture weights. Mathematically, the Pitman–Yor mixture model can be described as follows:
\[\begin{align*} x_i \,|\, \mu_i, \sigma_i^2 &\sim \mathrm{Normal}(\mu_i, \sigma_i^2), &i = 1,\ldots, n,\\ \mu_i, \sigma_i^2 \,|\, G &\sim G, &i = 1,\ldots, n,\\ G &\sim \mathrm{PitmanYor}(\alpha, \theta, G_0), \end{align*}\]
where $0 \leq \theta < 1$, $\alpha > -\theta$ and $G_0$ is the $\mathrm{NormalInverseGamma}(\mu_0, \lambda, a, b)$-distribution for some fixed hyperparameters $\mu_0\in \mathbb{R}, \lambda, a, b > 0$.
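Assuming the usual parameterization of the normal-inverse gamma distribution (consistent with the `prior_inv_scale_fac` keyword described below), drawing $(\mu, \sigma^2) \sim \mathrm{NormalInverseGamma}(\mu_0, \lambda, a, b)$ is shorthand for the hierarchical specification

\[\begin{align*} \sigma^2 &\sim \mathrm{InverseGamma}(a, b),\\ \mu \,|\, \sigma^2 &\sim \mathrm{Normal}(\mu_0, \sigma^2/\lambda), \end{align*}\]

so that $\lambda$ controls how strongly the prior on $\mu$ concentrates around $\mu_0$ relative to the component variance.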
Alternatively, the Pitman–Yor mixture model can be written in its stick-breaking form,
\[\begin{align*} x_i \,|\, \boldsymbol{v}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 &\sim \sum_{k=1}^\infty \frac{w_k}{\sigma_k} \phi\Big(\frac{x_i - \mu_k}{\sigma_k}\Big), &i = 1,\ldots, n,\\ v_k &\sim \text{Beta}(1-\theta, \alpha + k\theta), &k \in \mathbb{N},\\ \mu_k, \sigma_k^2 &\sim \text{NormalInverseGamma}(\mu_0, \lambda, a, b), &k \in \mathbb{N} \end{align*}\]
where $\phi(\cdot)$ is the density of the standard normal distribution and $w_k = v_k\prod_{j=1}^{k-1} (1 - v_j)$ for $k\in \mathbb{N}$.
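The stick-breaking map $\boldsymbol{v} \mapsto \boldsymbol{w}$ has a simple mechanical reading: $w_k$ is the fraction $v_k$ of the stick that remains after the first $k-1$ breaks. A standalone sketch (not part of the module API):

```julia
# Standalone sketch of the stick-breaking map, where
# w_k = v_k * prod_{j<k} (1 - v_j).  Not part of the module API.
function stick_breaking_weights(v::AbstractVector{<:Real})
    w = similar(v, float(eltype(v)))
    remaining = one(float(eltype(v)))  # length of the stick not yet broken off
    for (k, vk) in enumerate(v)
        w[k] = vk * remaining
        remaining *= 1 - vk
    end
    return w
end

stick_breaking_weights([0.5, 0.5, 0.5])  # returns [0.5, 0.25, 0.125]
```

For any finite truncation the weights sum to less than one; for the infinite sequence they sum to one almost surely.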
For Markov chain Monte Carlo based inference, this module implements Algorithm 2 of Neal (2000). For variational inference, we implement the truncated stick-breaking approach of Blei and Jordan (2006).
Since a Dirichlet process mixture model is equivalent to a Pitman–Yor mixture model with discount parameter equal to 0, this module can also be used to fit Dirichlet process mixtures.
Module API
BayesDensityPitmanYorMixture.PitmanYorMixture — Type
```julia
PitmanYorMixture{T<:Real} <: AbstractBayesDensityModel{T}
```

Struct representing a Pitman–Yor mixture model with a normal kernel and a conjugate Normal-InverseGamma base measure.
Constructors
```julia
PitmanYorMixture(x::AbstractVector{<:Real}; kwargs...)
PitmanYorMixture{T}(x::AbstractVector{<:Real}; kwargs...)
```

Arguments

- `x`: The data vector.
Keyword arguments
- `discount`: Discount parameter of the Pitman–Yor process. Defaults to `0.0`, corresponding to a Dirichlet process.
- `strength`: Strength parameter of the Pitman–Yor process. Defaults to `1.0`.
- `prior_location`: Prior mean of the location parameter `μ`. Defaults to `mean(x)`.
- `prior_inv_scale_fac`: Factor by which the conditional prior variance `σ2` of `μ` is scaled. Defaults to `1`.
- `prior_shape`: Prior shape parameter of the squared scale parameter `σ2`. Defaults to `2.0`.
- `prior_rate`: Prior rate parameter of the squared scale parameter `σ2`. Defaults to `var(x)`.
Examples
```julia
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> pym = PitmanYorMixture(x)
PitmanYorMixture{Float64}:
Using 5000 observations.
Hyperparameters:
discount = 0.0, strength = 1.0
prior_location = 0.578555, prior_inv_scale_fac = 1.0
prior_shape = 2.0, prior_rate = 0.0334916

julia> pym = PitmanYorMixture(x; strength = 2, discount = 0.5);
```

Extended help
Evaluating the pdf and cdf
Distributions.pdf — Method
```julia
pdf(
    bsm::PitmanYorMixture,
    params::NamedTuple,
    t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
pdf(
    bsm::PitmanYorMixture,
    params::AbstractVector{NamedTuple},
    t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}
```

Evaluate $f(t\, |\, \boldsymbol{\eta})$ for a given PitmanYorMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain fields named `:μ`, `:σ2` and a third field named `:cluster_counts` or `:w`, depending on whether the marginal or stick-breaking parameterization is used.
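For orientation, in the stick-breaking parameterization the density being evaluated is $f(t\,|\,\boldsymbol{\eta}) = \sum_k (w_k/\sigma_k)\,\phi((t - \mu_k)/\sigma_k)$. A hand-rolled sketch of that computation (illustrative only, not the package implementation):

```julia
# Hand-rolled evaluation of the stick-breaking mixture density
# f(t | η) = Σ_k w_k / σ_k * φ((t - μ_k) / σ_k); illustrative only.
normal_pdf(z) = exp(-z^2 / 2) / sqrt(2π)

function mixture_pdf(params::NamedTuple, t::Real)
    (; μ, σ2, w) = params
    sum(w[k] / sqrt(σ2[k]) * normal_pdf((t - μ[k]) / sqrt(σ2[k])) for k in eachindex(w))
end

params = (μ = [0.0, 1.0], σ2 = [1.0, 0.25], w = [0.7, 0.3])
mixture_pdf(params, 0.5)
```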
Distributions.cdf — Method
```julia
cdf(
    bsm::PitmanYorMixture,
    params::NamedTuple,
    t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
cdf(
    bsm::PitmanYorMixture,
    params::AbstractVector{NamedTuple},
    t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}
```

Evaluate $F(t\, |\, \boldsymbol{\eta})$ for a given PitmanYorMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain fields named `:μ`, `:σ2` and a third field named `:cluster_counts` or `:w`, depending on whether the marginal or stick-breaking parameterization is used.
Utility functions
BayesDensityCore.hyperparams — Method
```julia
hyperparams(
    pym::PitmanYorMixture{T}
) where {T} -> @NamedTuple{discount::T, strength::T, prior_location::T, prior_inv_scale_fac::T, prior_shape::T, prior_rate::T}
```

Returns the hyperparameters of the Pitman–Yor mixture model `pym` as a NamedTuple.
Markov chain Monte Carlo
StatsBase.sample — Method
```julia
sample(
    [rng::Random.AbstractRNG],
    pym::PitmanYorMixture{T},
    n_samples::Int;
    n_burnin::Int = min(1000, div(n_samples, 5)),
    initial_params::NamedTuple = _get_default_initparams_mcmc(pym)
) where {T} -> PosteriorSamples{T}
```

Generate `n_samples` posterior samples from a PitmanYorMixture using an augmented marginal Gibbs sampler.
Arguments
- `rng`: Optional random number generator used for random variate generation.
- `pym`: The PitmanYorMixture object for which posterior samples are generated.
- `n_samples`: The total number of samples (including burn-in).
Keyword arguments
- `n_burnin`: Number of burn-in samples.
- `initial_params`: Initial values used in the MCMC algorithm. Should be supplied as a NamedTuple with fields `:μ`, `:σ2` and `:cluster_alloc`, where `μ` and `σ2` are vectors of the same length, and `cluster_alloc` is a vector of length `length(x)` indicating the cluster membership of each observation.
Returns
- `ps`: A PosteriorSamples object holding the posterior samples and the original model object.
Examples
```julia
julia> using Random

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> pym = PitmanYorMixture(x);

julia> ps = sample(Xoshiro(1), pym, 5000);

julia> ps = sample(Xoshiro(1), pym, 5000; initial_params = (μ = [0.2, 0.8], σ2 = [1.0, 1.0], cluster_alloc = vcat(fill(1, 2500), fill(2, 2500))));
```

Variational inference
BayesDensityPitmanYorMixture.PitmanYorMixtureVIPosterior — Type
```julia
PitmanYorMixtureVIPosterior{T<:Real} <: AbstractVIPosterior{T}
```

Struct representing the variational posterior distribution of a PitmanYorMixture.
Fields
- `q_v`: Vector of distributions representing the optimal variational densities q*(vₖ), i.e. the density of the stick-breaking weights.
- `q_θ`: Vector of distributions representing the optimal variational densities q*(θₖ), i.e. the joint density of the mixture component means and variances.
- `pym`: The PitmanYorMixture to which the variational posterior was fit.
BayesDensityCore.varinf — Method
```julia
varinf(
    pym::PitmanYorMixture{T};
    truncation_level::Int = 25,
    initial_params::NamedTuple = _get_default_initparams(pym, truncation_level),
    max_iter::Int = 3000,
    rtol::Real = 1e-6
) where {T} -> PitmanYorMixtureVIPosterior{T}
```

Find a variational approximation to the posterior distribution of a PitmanYorMixture using mean-field variational inference based on a truncated stick-breaking approach.
Arguments
- `pym`: The PitmanYorMixture whose posterior we want to approximate.
Keyword arguments
- `truncation_level`: Positive integer specifying the truncation level of the variational approximation. Defaults to `25`.
- `initial_params`: Initial values of the VI parameters `a_v`, `b_v`, `locations`, `inv_scale_facs`, `shapes` and `rates`, supplied as a NamedTuple. Must have dimensions matching the supplied truncation level.
- `max_iter`: Maximal number of VI iterations. Defaults to `3000`.
- `rtol`: Relative tolerance used to determine convergence. Defaults to `1e-6`.
Returns
- `vip`: A PitmanYorMixtureVIPosterior object representing the variational posterior.
- `info`: A VariationalOptimizationResult describing the result of the optimization.
To perform the optimization for a fixed number of iterations irrespective of the convergence criterion, one can set rtol = 0.0, and max_iter equal to the desired total iteration count. Note that setting rtol to a strictly negative value will issue a warning.
Extended help
Convergence
The criterion used to determine convergence is that the relative change in the ELBO falls below the given rtol.
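The stopping rule amounts to the following check (illustrative names, not the package internals):

```julia
# Illustrative convergence check: stop once the relative change of the ELBO
# between successive iterations falls below rtol.  Not the package internals.
converged(elbo_new, elbo_old, rtol) = abs(elbo_new - elbo_old) <= rtol * abs(elbo_old)

converged(-1234.56, -1234.561, 1e-6)  # small relative change, returns true
```

Setting `rtol = 0.0` makes this check always fail, which is why the optimization then runs for the full `max_iter` iterations.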
Truncation
The truncation level determines the maximal number of components used in the variational approximation. Generally, setting the truncation level to a higher value leads to an approximating class with a greater representational capacity, at the cost of increased computation.