RandomBernsteinPoly
Documentation for random Bernstein polynomials (Petrone, 1999).
RandomBernsteinPoly models the data-generating density as a mixture of fixed $\text{Beta}$ distributions with an unknown number of mixture components. The smoothness of the density estimates is primarily controlled through the number of basis densities $K$, which is selected in a data-driven manner. Mathematically, the Bernstein density model and prior distribution takes the following form,
\[\begin{align*} x_i \,|\, K, \boldsymbol{w} &\sim \sum_{k=1}^K w_{k}b_{K,k}(x_i), &i = 1,\ldots, n,\\ \boldsymbol{w} \,|\, K &\sim \text{Dirichlet}_K(a/K, \ldots, a/K),\\ K &\sim p(K), \end{align*}\]
where $b_{K,k}(\cdot)$ denotes the probability density function of the $\text{Beta}(k, K-k+1)$-distribution, $a> 0$ a fixed hyperparameter and $p(K)$ is a probability mass function supported on a finite subset of the positive integers.
Although this model is formulated in terms of data on the unit interval, it can be also be applied more broadly to data on the entire real line. By default, we estimate the support of the Bernstein density based on the extrema of the data set, but it is also possible to use a prespecified support via the bounds keyword argument.
This module implements the Gibbs sampler of Petrone (1999) for Markov chain Monte Carlo-based posterior inference. We refer the interested reader to the same paper for a discussion of the role of the prior_strength hyperparameter. The default value of prior_strength = 1.0 is based on the discussion in Petrone and Wasserman (2002).
Keep in mind that this sampler is quite slow even for moderate sample sizes, due to the expensive sweeps involved in the Gibbs sampler.
Module API
BayesDensityRandomBernsteinPoly.RandomBernsteinPoly — Type
RandomBernsteinPoly{T<:Real} <: AbstractBayesDensityModel{T}Struct representing a random Bernstein polyomial.
Constructors
RandomBernsteinPoly(x::AbstractVector{<:Real}; kwargs...)
RandomBernsteinPoly{T}(x::AbstractVector{<:Real}; kwargs...)Arguments
x: The data vector.
Keyword arguments
prior_components: ADistributions.DiscreteNonParametricdistribution instance specifying the prior on the number of componentsK. Defaults toDiscreteNonParametric(1:50, fill(T(1/50), 50)), corresponding to a uniform prior on the set {1, …, 50}.prior_strength: Strength parameter of the symmetric Dirichlet prior on the mixture weights. E.g. the prior on the mixture weights is Dirichlet(strength/K, ..., strength/K) conditional on the number of mixture components being equal toK. Defaults to1.0.bounds: A tuple giving the support of the B-spline mixture model.
Returns
rbp: A random Bernstein polynomial model object.
Examples
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> rbp = RandomBernsteinPoly(x)
RandomBernsteinPoly{Float64} with 100 values for the number mixture components.
Using 5000 observations.
Hyperparameters:
prior_strength = 1.0
julia> fgm = RandomBernsteinPoly(x; prior_strength = 0.5, prior_components = DiscreteNonParametric(1:25, fill(1/25, 25)));Evaluating the pdf and cdf
Distributions.pdf — Method
pdf(
rbp::RandomBernsteinPoly,
params::NamedTuple,
t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
pdf(
rbp::RandomBernsteinPoly,
params::AbstractVector{NamedTuple},
t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}Evaluate $f(t\, |\, \boldsymbol{\eta})$ for a given RandomBernsteinPoly when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain fields named :w.
Distributions.cdf — Method
cdf(
rbp::RandomBernsteinPoly,
params::NamedTuple,
t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
cdf(
rbp::RandomBernsteinPoly,
params::AbstractVector{NamedTuple},
t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}Evaluate $F(t\, |\, \boldsymbol{\eta})$ for a given RandomBernsteinPoly when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain fields named :w.
Statistics.quantile — Method
quantile(
rbp::RandomBernsteinPoly,
params::NamedTuple,
p::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
quantile(
rbp::RandomBernsteinPoly,
params::AbstractVector{NamedTuple},
p::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}Evaluate $Q(p\, |\, \boldsymbol{\eta})$ for a given RandomBernsteinPoly when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.
The named tuple should contain a field named :w.
quantile(
bdm::AbstractBayesDensityModel,
parameters::NamedTuple,
p::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}
quantile(
bdm::AbstractBayesDensityModel,
parameters::AbstractVector{<:NamedTuple},
p::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}Evaluate the quantile function $Q(p\,|\, \boldsymbol{\eta}) = \inf \{x\colon F(x) \geq p\}$ of the Bayesian density model bdm for every $\boldsymbol{\eta}$ in parameters and every element in the collection p.
If a single NamedTuple is passed to the parameters argument, this function outputs either a scalar or a vector depending on the input type of the third argument p.
If a vector of NamedTuples is passed to the second positional argument, then this function returns a Matrix of size (length(p), length(parameters)).
Utility functions
BayesDensityCore.hyperparams — Method
hyperparams(
rbp::RandomBernsteinPoly{T}
) where {T} -> @NamedTuple{prior_components::DiscreteNonParametric{Int, T}, prior_strength::T}Returns the hyperparameters of the random Bernstein polynomial rbp as a NamedTuple.
Markov chain Monte Carlo
StatsBase.sample — Method
sample(
[rng::Random.AbstractRNG],
rbp::RandomBernsteinPoly{T},
n_samples::Int;
n_burnin::Int = min(1000, div(n_samples, 5)),
initial_params::NamedTuple = _get_default_initparams_mcmc(rbp)
) where {T} -> PosteriorSamples{T}Generate n_samples posterior samples from a RandomBernsteinPoly using the telescope sampler.
Arguments
rng: Optional random seed used for random variate generation.rbp: TheRandomBernsteinPolyobject for which posterior samples are generated.n_samples: The total number of samples (including burn-in).
Keyword arguments
n_burnin: Number of burn-in samples.initial_params: Initial values used in the MCMC algorithm. Should be supplied as aNamedTuplewith a single fieldK, whereKis a positive integer. Defaults to the integer nearest to0.5*sqrt(n)which has positive prior probability, wherenis the sample size.
Returns
ps: APosteriorSamplesobject holding the posterior samples and the original model object.
Examples
julia> using Random
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> rbp = RandomBernsteinPoly(x);
julia> ps1 = sample(rbp, 5_000);
julia> ps2 = sample(rbp, 5_000; n_burnin=2_000, initial_params = (K = 5,));