BSplineMixture
Documentation for B-Spline Mixture Models.
This model is available through the BayesDensityBSplineMixture package.
BSplineMixture models the data-generating density as a mixture of cubic B-spline basis functions of a fixed, high dimension. The regularity of the estimate is controlled via a smoothness-inducing prior on the spline coefficients. Mathematically, the model can be written as
\[\begin{align*} x_i\,|\, \boldsymbol{\beta} &\sim \sum_{k=1}^K \theta_k b_k(x_i), &i = 1,\ldots, n,\\ \beta_k &\sim \text{Normal}(\mu_k, \sigma_\beta^2), &k = 1, 2,\\ \Delta^2 \big\{\beta_k - \mu_k\big\} \,|\, \tau^2, \boldsymbol{\delta}^2 &\sim \text{Normal}(0, \tau^2\delta_k^2), &k \geq 3\\ \tau^2 &\sim \text{InverseGamma}(a_\tau, b_\tau),\\ \delta^2_k &\sim \text{InverseGamma}(a_\delta, b_\delta), &k = 1, \ldots, K-3,\\ \end{align*}\]
where $b_k(\cdot)$ is a B-spline basis function, normalized to have unit integral, $\boldsymbol{\mu}\in \mathbb{R}^{K-1}, \sigma_\beta, a_\tau, b_\tau, a_\delta, b_\delta > 0$ are fixed hyperparameters, $\Delta^2 \alpha_k= \alpha_k - 2\alpha_{k-1} + \alpha_{k-2}$ is the discrete second-order difference operator and $\boldsymbol{\theta} = \boldsymbol{\theta}(\boldsymbol{\beta})$ is defined by the logistic stickbreaking-map,
\[\begin{align*} \theta_k &= \frac{e^{\beta_k}}{1 + e^{\beta_k}} \prod_{j = 1}^{k-1} \frac{1}{1 + e^{\beta_j}}, &k = 1,\ldots, K-1\\ \theta_K &= 1 - \sum_{k=1}^{K-1} \theta_k. \end{align*}\]
This model is supported on a compact interval, which by default is estimates based on the observed sample. It is also possible to manually specify the support of the model through the bounds keyword argument.
Module API
BayesDensityBSplineMixture.BSplineMixture — Type
BSplineMixture{T<:Real} <: AbstractBayesDensityModel{T}Struct representing a B-spline mixture model.
Constructors
BSplineMixture(x::AbstractVector{<:Real}; kwargs...)
BSplineMixture{T}(x::AbstractVector{<:Real}; kwargs...)Arguments
x: The data vector.
Keyword arguments
K: B-spline basis dimension of a regular augmented spline basis. Defaults to max(100, min(200, ⌈n/5⌉))bounds: A tuple giving the support of the B-spline mixture model.n_bins: Lower bound on the number of bins used when fitting theBSplineMixtureto data. Binned fitting can be disabled by setting this equal tonothing. The default setting uses unbinned fitting iflength(x) ≤ 1200and1000bins otherwise.prior_global_shape: Shape hyperparameter for the global smoothing parameter τ². Defaults to1.0.prior_global_rate: Rate hyperparameter for the global smoothing parameter τ². Defaults to1e-3.prior_local_shape: Shape hyperparameter for the local smoothing parameters δₖ². Defaults to0.5.prior_local_rate: Rate hyperparameter for the local smoothing parameters δₖ². Defaults to0.5.prior_stdev: Prior standard deviation of the first two unconstrained spline parameters β₁ and β₂. Defaults to1e5.
Returns
bsm: A B-Spline mixture model object.
Examples
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> model = BSplineMixture(x)
200-dimensional BSplineMixture{Float64}:
Using 5000 binned observations on a regular grid consisting of 1187 bins.
support: (-0.05, 1.05)
Hyperparameters:
prior_global_shape = 1.0, prior_global_rate = 0.001
prior_local_shape = 0.5, prior_local_rate = 0.5
julia> model = BSplineMixture(x; K = 150, bounds=(0, 1), n_bins=nothing, prior_global_rate = 5e-3);Extended help
Binned fitting
To disable binned fitting, one can set n_bins=nothing. Note that the binning is only used as part of the model fitting procedure, and the structure of the resulting fitted model object is the same regardless of whether the binning step is performed or not. Empirically, the results obtained from running the binned and unbinned model fitting procedures tend to be very similar. We therefore recommend using the binned fitting procedure, due to the large improvements in model fitting speed, particularly for larger samples.
For computational reasons, the supplied number of bins is rounded up to the nearest integer such that the bin boundaries overlap with the knots of the spline basis. This is done to ensure that at most 4 cubic splines have positive integrals over each bin.
Hyperparameter selection
The global variance parameter τ2 and the local variance parameters δ2[k] govern the smoothness of the B-spline mixture prior through the centered random walk prior on β | τ2, δ2:
β[k+2] = μ[k+2] + 2 {β[k+1] - μ[k+1]} - {β[k] - μ[k]} + τ * δ[k] * ϵ[k],where ϵ[k] is standard normal. The first two parameters β[1] and β[2] are assigned diffuse N(0, σ²) priors.
The prior distributions of the local and global smoothing parameters are given by
τ² ∼ InverseGamma(a_τ, b_τ)
δₖ² ∼ InverseGamma(a_δ, b_δ), 1 ≤ k ≤ K-3.As noninformative defaults, we suggest using prior_global_shape = 1, prior_global_rate = 1e-3, prior_local_shape = 0.5, prior_local_rate = 0.5 and prior_stdev = 1e5. To control the smoothness in the resulting density estimates, we recommend adjusting the value of prior_global_rate while keeping the other hyperparameters fixed. Setting prior_global_rate to a smaller value generally yields smoother curves. Similar priors for regression models suggest that values in the range [5e-5, 5e-3] are reasonable.
Evaluating the pdf and cdf
Distributions.pdf — Method
pdf(
bsm::BSplineMixture,
params::NamedTuple,
t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
pdf(
bsm::BSplineMixture,
params::AbstractVector{NamedTuple},
t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}Evaluate $f(t\, |\, \boldsymbol{\eta})$ for a given BSplineMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain a field named :spline_coefs or :β.
Distributions.cdf — Method
cdf(
bsm::BSplineMixture,
params::NamedTuple,
t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
cdf(
bsm::BSplineMixture,
params::AbstractVector{NamedTuple},
t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}Evaluate $F(t\, |\, \boldsymbol{\eta})$ for a given BSplineMixture when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain a field named :spline_coefs or :β.
Utility functions
BayesDensityCore.hyperparams — Method
hyperparams(
bsm::BSplineMixture{T}
) where {T} -> @NamedTuple{prior_global_shape::T, prior_global_rate::T, prior_local_shape::T, prior_local_rate::T, prior_stdev::T}Returns the hyperparameters of the B-Spline mixture model bsm as a NamedTuple.
Markov chain Monte Carlo
StatsBase.sample — Method
sample(
[rng::Random.AbstractRNG],
bsm::BSplineMixture{T},
n_samples::Int;
n_burnin::Int = min(1000, div(n_samples, 5)),
initial_params::NamedTuple = get_default_initparams_mcmc(bsm)
) where {T} -> PosteriorSamples{T}Generate n_samples posterior samples from a BSplineMixture using an augmented Gibbs sampler.
Arguments
rng: Optional random seed used for random variate generation.bsm: TheBSplineMixtureobject for which posterior samples are generated.n_samples: The total number of samples (including burn-in).
Keyword arguments
n_burnin: Number of burn-in samples.initial_params: Initial values used in the MCMC algorithm. Should be supplied as aNamedTuplewith fields:βand:τ2, where:βis aK-1-dimensional vector andτ2is a positive scalar.
Returns
ps: APosteriorSamplesobject holding the posterior samples and the original model object.
Examples
julia> using Random
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> bsm = BSplineMixture(x);
julia> ps = sample(Xoshiro(1), bsm, 5000);Variational inference
BayesDensityBSplineMixture.BSplineMixtureVIPosterior — Type
BSplineMixtureVIPosterior{T<:Real, A<:MvNormalCanon{T}, B<:InverseGamma{T}, M<:BSplineMixture} <: AbstractVIPosterior{T}Struct representing the variational posterior distribution of a BSplineMixture.
Fields
q_β: Distribution representing the optimal variational density q*(β).q_τ: Distribution representing the optimal variational density q*(τ²).q_δ: Product distribution corresponding to the optimal variational density q*(δ²).bsm: TheBSplineMixtureto which the variational posterior was fit.
BayesDensityCore.varinf — Method
varinf(
bsm::BSplineMixture{T};
initial_params::NamedTuple = _get_default_initparams(bsm),
max_iter::Int = 2000
rtol::Real = 1e-6
) where {T} -> BSplineMixtureVIPosterior{T}Find a variational approximation to the posterior distribution of a BSplineMixture using mean-field variational inference.
Arguments
bsm: TheBSplineMixturewhose posterior we want to approximate.
Keyword arguments
initial_params: Initial values of the VI parametersμ_optinv_Σ_opt,b_τ_optandb_δ_opt, supplied as a NamedTuple.max_iter: Maximal number of VI iterations. Defaults to2000.rtol: Relative tolerance used to determine convergence. Defaults to1e-6.
Returns
vip: ABSplineMixtureVIPosteriorobject representing the variational posterior.info: AVariationalOptimizationResultdescribing the result of the optimization.
To sample for a fixed number of iterations irrespective of the convergence criterion, one can set rtol = 0.0, and max_iter equal to the desired total iteration count. Note that setting rtol to a strictly negative value will issue a warning.
Examples
julia> using Random
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> bsm = BSplineMixture(x);
julia> vip, info = varinf(bsm);
julia> vip, info = varinf(bsm; rtol=1e-7, max_iter=3000);Extended help
Convergence
The criterion used to determine convergence is that the relative change in the ELBO falls below the given rtol.