RandomBernsteinPoly

Documentation for random Bernstein polynomials (Petrone, 1999).

RandomBernsteinPoly models the data-generating density as a mixture of fixed $\text{Beta}$ distributions with an unknown number of mixture components. The smoothness of the density estimates is primarily controlled through the number of basis densities $K$, which is selected in a data-driven manner. Mathematically, the Bernstein density model and prior distribution takes the following form,

\[\begin{align*} x_i \,|\, K, \boldsymbol{w} &\sim \sum_{k=1}^K w_{k}b_{K,k}(x_i), &i = 1,\ldots, n,\\ \boldsymbol{w} \,|\, K &\sim \text{Dirichlet}_K(a/K, \ldots, a/K),\\ K &\sim p(K), \end{align*}\]

where $b_{K,k}(\cdot)$ denotes the probability density function of the $\text{Beta}(k, K-k+1)$-distribution, $a> 0$ a fixed hyperparameter and $p(K)$ is a probability mass function supported on a finite subset of the positive integers.

Although this model is formulated in terms of data on the unit interval, it can be also be applied more broadly to data on the entire real line. By default, we estimate the support of the Bernstein density based on the extrema of the data set, but it is also possible to use a prespecified support via the bounds keyword argument.

This module implements the Gibbs sampler of Petrone (1999) for Markov chain Monte Carlo-based posterior inference. We refer the interested reader to the same paper for a discussion of the role of the prior_strength hyperparameter. The default value of prior_strength = 1.0 is based on the discussion in Petrone and Wasserman (2002).

Note

Keep in mind that this sampler is quite slow even for moderate sample sizes, due to the expensive sweeps involved in the Gibbs sampler.

Module API

BayesDensityRandomBernsteinPoly.RandomBernsteinPoly — Type

RandomBernsteinPoly{T<:Real} <: AbstractBayesDensityModel{T}

Struct representing a random Bernstein polyomial.

Constructors

RandomBernsteinPoly(x::AbstractVector{<:Real}; kwargs...)
RandomBernsteinPoly{T}(x::AbstractVector{<:Real}; kwargs...)

Arguments

x: The data vector.

Keyword arguments

prior_components: A Distributions.DiscreteNonParametric distribution instance specifying the prior on the number of components K. Defaults to DiscreteNonParametric(1:50, fill(T(1/50), 50)), corresponding to a uniform prior on the set {1, …, 50}.
prior_strength: Strength parameter of the symmetric Dirichlet prior on the mixture weights. E.g. the prior on the mixture weights is Dirichlet(strength/K, ..., strength/K) conditional on the number of mixture components being equal to K. Defaults to 1.0.
bounds: A tuple giving the support of the B-spline mixture model.

Returns

rbp: A random Bernstein polynomial model object.

Examples

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> rbp = RandomBernsteinPoly(x)
RandomBernsteinPoly{Float64} with 100 values for the number mixture components.
Using 5000 observations.
Hyperparameters:
 prior_strength = 1.0

julia> fgm = RandomBernsteinPoly(x; prior_strength = 0.5, prior_components = DiscreteNonParametric(1:25, fill(1/25, 25)));

source

Evaluating the pdf and cdf

Distributions.pdf — Method

pdf(
    rbp::RandomBernsteinPoly,
    params::NamedTuple,
    t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}

pdf(
    rbp::RandomBernsteinPoly,
    params::AbstractVector{NamedTuple},
    t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}

Evaluate $f(t\, |\, \boldsymbol{\eta})$ for a given RandomBernsteinPoly when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.

The named tuple should contain fields named :w.

source

Distributions.cdf — Method

cdf(
    rbp::RandomBernsteinPoly,
    params::NamedTuple,
    t::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}

cdf(
    rbp::RandomBernsteinPoly,
    params::AbstractVector{NamedTuple},
    t::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}

Evaluate $F(t\, |\, \boldsymbol{\eta})$ for a given RandomBernsteinPoly when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.

The named tuple should contain fields named :w.

source

Statistics.quantile — Method

quantile(
    rbp::RandomBernsteinPoly,
    params::NamedTuple,
    p::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}

quantile(
    rbp::RandomBernsteinPoly,
    params::AbstractVector{NamedTuple},
    p::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}

Evaluate $Q(p\, |\, \boldsymbol{\eta})$ for a given RandomBernsteinPoly when the model parameters of the NamedTuple params are given by $\boldsymbol{\eta}$.

The named tuple should contain a field named :w.

source

quantile(
    bdm::AbstractBayesDensityModel,
    parameters::NamedTuple,
    p::Union{Real, AbstractVector{<:Real}}
) -> Union{Real, Vector{<:Real}}

quantile(
    bdm::AbstractBayesDensityModel,
    parameters::AbstractVector{<:NamedTuple},
    p::Union{Real, AbstractVector{<:Real}}
) -> Matrix{<:Real}

Evaluate the quantile function $Q(p\,|\, \boldsymbol{\eta}) = \inf \{x\colon F(x) \geq p\}$ of the Bayesian density model bdm for every $\boldsymbol{\eta}$ in parameters and every element in the collection p.

If a single NamedTuple is passed to the parameters argument, this function outputs either a scalar or a vector depending on the input type of the third argument p.

If a vector of NamedTuples is passed to the second positional argument, then this function returns a Matrix of size (length(p), length(parameters)).

source

Utility functions

BayesDensityCore.hyperparams — Method

hyperparams(
    rbp::RandomBernsteinPoly{T}
) where {T} -> @NamedTuple{prior_components::DiscreteNonParametric{Int, T}, prior_strength::T}

Returns the hyperparameters of the random Bernstein polynomial rbp as a NamedTuple.

source

Markov chain Monte Carlo

StatsBase.sample — Method

sample(
    [rng::Random.AbstractRNG],
    rbp::RandomBernsteinPoly{T},
    n_samples::Int;
    n_burnin::Int              = min(1000, div(n_samples, 5)),
    initial_params::NamedTuple = _get_default_initparams_mcmc(rbp)
) where {T} -> PosteriorSamples{T}

Generate n_samples posterior samples from a RandomBernsteinPoly using the telescope sampler.

Arguments

rng: Optional random seed used for random variate generation.
rbp: The RandomBernsteinPoly object for which posterior samples are generated.
n_samples: The total number of samples (including burn-in).

Keyword arguments

n_burnin: Number of burn-in samples.
initial_params: Initial values used in the MCMC algorithm. Should be supplied as a NamedTuple with a single field K, where K is a positive integer. Defaults to the integer nearest to 0.5*sqrt(n) which has positive prior probability, where n is the sample size.

Returns

ps: A PosteriorSamples object holding the posterior samples and the original model object.

Examples

julia> using Random

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> rbp = RandomBernsteinPoly(x);

julia> ps1 = sample(rbp, 5_000);

julia> ps2 = sample(rbp, 5_000; n_burnin=2_000, initial_params = (K = 5,));

source