API

The AutomaticHistogram type

AutoHist.AutomaticHistogram (Type)
AutomaticHistogram

A type for representing a histogram whose partition has been chosen automatically based on the sample. An AutomaticHistogram can be fitted to data using the fit or autohist methods.

Fields

  • breaks: AbstractVector consisting of the cut points in the chosen partition.
  • density: Estimated density in each bin.
  • counts: The bin counts for the partition corresponding to breaks.
  • type: Symbol indicating whether the histogram was fit using an irregular procedure (type==:irregular) or a regular one (type==:regular).
  • closed: Symbol indicating whether the bin intervals are right-closed, (a, b], or left-closed, [a, b). Possible values are :right (default) and :left.
  • a: Value of the Dirichlet concentration parameter corresponding to the chosen partition. Only relevant if a Bayesian method was used to fit the histogram; otherwise it is set to NaN.
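To make the relationship between the fields concrete: for an ordinary (non-Bayesian) histogram, the densities are conventionally obtained from the counts and breaks as $d_j = N_j / (n\,|\mathcal{I}_j|)$, where $n$ is the sample size and $|\mathcal{I}_j|$ the width of bin $j$, so that the estimate integrates to one. The plain-Julia sketch below illustrates this normalization with a hypothetical helper density_from_counts; it is not part of AutoHist, and histograms fitted with a Bayesian rule (a not equal to NaN) may use a different density estimate.

```julia
# Hypothetical helper (not part of AutoHist): convert bin counts into
# densities for a histogram with cut points `breaks`, assuming the
# standard normalization d_j = N_j / (n * |I_j|).
function density_from_counts(breaks::AbstractVector{<:Real}, counts::AbstractVector{<:Integer})
    length(breaks) == length(counts) + 1 ||
        throw(ArgumentError("need one more break than there are counts"))
    n = sum(counts)
    widths = diff(breaks)                   # |I_j| for each bin
    return counts ./ (n .* widths)
end

breaks = [0.0, 0.5, 1.0, 2.0]
counts = [2, 6, 2]
d = density_from_counts(breaks, counts)     # [0.4, 1.2, 0.2]

# The resulting piecewise-constant density integrates to one (up to rounding).
total_mass = sum(d .* diff(breaks))
```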

Examples

julia> x = LinRange(eps(), 1.0-eps(), 5000) .^(1.0/4.0);

julia> h = fit(AutomaticHistogram, x)
AutomaticHistogram{Vector{Float64}, Vector{Float64}, Vector{Int64}}
breaks: [0.0001220703125, 0.17763663029325183, 0.29718725232110504, 0.4022468898607337, 0.4928155429121377, 0.5797614498414855, 0.6667073567708333, 0.7572760098222373, 0.8405991706295289, 0.9202995853147645, 1.0]
density: [0.006626835974128547, 0.057821970706400425, 0.17596277991076312, 0.36279353706969375, 0.6214544825215076, 0.9730458529384184, 1.4481767793920146, 2.0440057561776532, 2.733509595364622, 3.545742066060377]
counts: [5, 34, 92, 164, 270, 423, 656, 852, 1090, 1414]
type: irregular
closed: right
a: 5.0

Fitting an automatic histogram to data

An automatic histogram based on regular or irregular partitions can be fitted to the data by calling the fit method.

StatsAPI.fit (Method)
fit(
    AutomaticHistogram,
    x::AbstractVector{<:Real},
    rule::AbstractRule        = RIH();
    support::Tuple{Real,Real} = (-Inf,Inf),
    closed::Symbol            = :right
)

Fit a histogram to a one-dimensional vector x with an automatic and data-based selection of the histogram partition.

Arguments

  • x: 1D vector of data for which a histogram is to be constructed.
  • rule: The rule used to select the histogram partition, i.e. the number of bins and, for irregular rules, the locations of the cut points. Default value is rule=RIH(), the random irregular histogram.
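As an illustration of what a regular rule does, the textbook Sturges rule uses $k = \lceil \log_2 n \rceil + 1$ equal-width bins spanning the range of the data. The sketch below implements only that formula, through a hypothetical function sturges_breaks written in plain Julia; the package's Sturges() rule may differ in details, and irregular rules such as RIH() also choose where the cut points go.

```julia
# Hypothetical sketch (not AutoHist's implementation) of the textbook
# Sturges rule: k = ceil(log2(n)) + 1 equal-width bins over [min(x), max(x)].
function sturges_breaks(x::AbstractVector{<:Real})
    n = length(x)
    n >= 1 || throw(ArgumentError("x must be nonempty"))
    k = ceil(Int, log2(n)) + 1                  # number of bins
    lo, hi = extrema(x)
    return range(lo, hi; length = k + 1)        # k bins need k + 1 cut points
end

x = collect(range(0.0, 1.0; length = 100))
b = sturges_breaks(x)
length(b) - 1    # 8 bins, since ceil(log2(100)) + 1 = 7 + 1
```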

Keyword arguments

  • closed: Symbol indicating whether the bin intervals are right-closed, (a, b], or left-closed, [a, b). Possible values are :right (default) and :left.
  • support: Tuple specifying the support of the histogram estimate. If the first element is -Inf, then minimum(x) is taken as the leftmost cutpoint. Likewise, if the second element is Inf, then maximum(x) is taken as the rightmost cutpoint. The default value is (-Inf, Inf), which estimates the support from the data.
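The sketch below illustrates the two keyword arguments in plain Julia with hypothetical helpers bincounts and support_endpoints, neither of which is part of AutoHist. It assumes that the outermost bins also contain the data extremes (the first bin is closed on the left when closed = :right, and the last bin is closed on the right when closed = :left); that edge convention is an assumption rather than documented behavior.

```julia
# Hypothetical helper (not part of AutoHist): count the observations in x
# falling in each bin defined by the sorted cut points in `breaks`.
#   closed = :right -> bins (b[j], b[j+1]], first bin also includes b[1]
#   closed = :left  -> bins [b[j], b[j+1]), last bin also includes b[end]
function bincounts(x::AbstractVector{<:Real}, breaks::AbstractVector{<:Real}; closed::Symbol = :right)
    closed in (:right, :left) || throw(ArgumentError("closed must be :right or :left"))
    k = length(breaks) - 1
    counts = zeros(Int, k)
    for xi in x
        (breaks[1] <= xi <= breaks[end]) || continue    # skip points outside the support
        j = if closed == :right
            searchsortedfirst(breaks, xi) - 1           # last j with breaks[j] < xi
        else
            searchsortedlast(breaks, xi)                # last j with breaks[j] <= xi
        end
        counts[clamp(j, 1, k)] += 1                     # fold boundary points into the outer bins
    end
    return counts
end

# Outer cut points implied by the `support` keyword, following the rule
# described above: an infinite endpoint is replaced by the data extreme.
function support_endpoints(x::AbstractVector{<:Real}, support::Tuple{Real,Real} = (-Inf, Inf))
    lo = support[1] == -Inf ? minimum(x) : support[1]
    hi = support[2] == Inf ? maximum(x) : support[2]
    return (lo, hi)
end

x = [0.0, 0.5, 0.5, 1.0]
breaks = [0.0, 0.5, 1.0]
bincounts(x, breaks; closed = :right)   # [3, 1]: the points at 0.5 go to (0.0, 0.5]
bincounts(x, breaks; closed = :left)    # [1, 3]: the points at 0.5 go to [0.5, 1.0]
support_endpoints(x, (-1.0, Inf))       # (-1.0, 1.0)
```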

Returns

  • h: An object of type AutomaticHistogram corresponding to the fitted histogram.

Examples

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> fit(AutomaticHistogram, x) == fit(AutomaticHistogram, x, RIH())
true

julia> h = fit(AutomaticHistogram, x, Wand(scalest=:stdev, level=4))
AutomaticHistogram{LinRange{Float64, Int64}, Vector{Float64}, Vector{Int64}}
breaks: LinRange{Float64}(0.0, 1.0, 27)
density: [0.0052, 0.0312, 0.0884, 0.1612, 0.2652, 0.4004, 0.5408, 0.7176, 0.8944, 1.0868  …  2.0072, 1.9656, 1.8616, 1.69, 1.4508, 1.1596, 0.8372, 0.5044, 0.2184, 0.0364]
counts: [1, 6, 17, 31, 51, 77, 104, 138, 172, 209  …  386, 378, 358, 325, 279, 223, 161, 97, 42, 7]
type: regular
closed: right
a: NaN
AutoHist.autohist (Method)
autohist(
    x::AbstractVector{<:Real},
    rule::AbstractRule        = RIH();
    kwargs...
)

Fit an automatic histogram to data based on the supplied rule. This is an alias for fit(AutomaticHistogram, x, rule; kwargs...). See fit for further details.

Examples

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> autohist(x, Sturges()) == fit(AutomaticHistogram, x, Sturges())
true

Additional methods for AutomaticHistogram

AutoHist.peaks (Method)
peaks(h::AutomaticHistogram)

Return the locations of the modes/peaks of h as a Vector, sorted in increasing order.

Formally, a mode/peak of the histogram h is defined as the midpoint of an interval $\mathcal{J}$ such that the density of h is constant on $\mathcal{J}$, and the density of h is strictly smaller than this value in the histogram bins adjacent to $\mathcal{J}$. Note that according to this definition, $\mathcal{J}$ is in general a union of one or more adjacent bins of the histogram partition.
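The definition can be turned into a short scan over runs of equal density. The plain-Julia sketch below (hypothetical function histogram_peaks, not the package's implementation) takes the cut points and bin densities directly, and treats a run touching either end of the support as having no neighbor on that side, which is one reading of the definition.

```julia
# Hypothetical sketch (not AutoHist's implementation) of the peak definition
# above, for a histogram with cut points `breaks` and bin densities `density`.
function histogram_peaks(breaks::AbstractVector{<:Real}, density::AbstractVector{<:Real})
    k = length(density)
    length(breaks) == k + 1 || throw(ArgumentError("need length(breaks) == length(density) + 1"))
    result = Float64[]
    i = 1
    while i <= k
        # Extend J = bins i:j over a maximal run of equal density.
        j = i
        while j < k && density[j + 1] == density[i]
            j += 1
        end
        # J is a peak if every adjacent bin (if any) has strictly smaller density.
        left_ok  = i == 1 || density[i - 1] < density[i]
        right_ok = j == k || density[j + 1] < density[i]
        if left_ok && right_ok
            push!(result, (breaks[i] + breaks[j + 1]) / 2)   # midpoint of J
        end
        i = j + 1
    end
    return result   # runs are visited left to right, so already sorted
end

histogram_peaks(0:5, [1.0, 2.0, 2.0, 1.0, 3.0])   # [2.0, 4.5]
```

Here the run of two bins with density 2.0 spans [1, 3] and yields the peak 2.0, while the last bin [4, 5] yields 4.5.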

Base.minimum (Method)
minimum(h::AutomaticHistogram)

Return the minimum of the support of h.

Base.maximum (Method)
maximum(h::AutomaticHistogram)

Return the maximum of the support of h.

Base.extrema (Method)
extrema(h::AutomaticHistogram)

Return the minimum and the maximum of the support of h as a 2-tuple.

Distributions.pdf (Method)
pdf(h::AutomaticHistogram, x::Real)

Evaluate the probability density function of h at x.
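Since h is piecewise constant, evaluating its density amounts to locating the bin that contains x. The sketch below does this in plain Julia with a hypothetical function histogram_pdf operating on raw breaks and densities; how AutoHist resolves points lying exactly on a cut point is assumed here to follow the closed convention.

```julia
# Hypothetical sketch (not AutoHist's implementation): evaluate a
# piecewise-constant density with cut points `breaks` at the point x.
function histogram_pdf(breaks::AbstractVector{<:Real}, density::AbstractVector{<:Real}, x::Real;
                       closed::Symbol = :right)
    closed in (:right, :left) || throw(ArgumentError("closed must be :right or :left"))
    (breaks[1] <= x <= breaks[end]) || return 0.0      # zero outside the support
    k = length(density)
    j = closed == :right ? searchsortedfirst(breaks, x) - 1 : searchsortedlast(breaks, x)
    return float(density[clamp(j, 1, k)])              # outer edges belong to the outer bins
end

breaks  = [0.0, 0.5, 1.0]
density = [0.4, 1.6]
histogram_pdf(breaks, density, 0.25)                  # 0.4
histogram_pdf(breaks, density, 0.5)                   # 0.4, since 0.5 lies in (0.0, 0.5]
histogram_pdf(breaks, density, 0.5; closed = :left)   # 1.6, since 0.5 lies in [0.5, 1.0)
histogram_pdf(breaks, density, 2.0)                   # 0.0
```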

Distributions.cdf (Method)
cdf(h::AutomaticHistogram, x::Real)

Evaluate the cumulative distribution function of h at x.
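Because the density is constant on each bin, the CDF is piecewise linear: it accumulates the mass $d_j |\mathcal{I}_j|$ of every bin to the left of x, plus a fraction of the bin containing x. The hypothetical plain-Julia function histogram_cdf below sketches this, assuming the densities integrate to one; since the CDF is continuous, the closed convention does not affect it.

```julia
# Hypothetical sketch (not AutoHist's implementation): CDF of a
# piecewise-constant density with cut points `breaks`, assumed to integrate to one.
function histogram_cdf(breaks::AbstractVector{<:Real}, density::AbstractVector{<:Real}, x::Real)
    x <= breaks[1] && return 0.0
    x >= breaks[end] && return 1.0
    j = searchsortedlast(breaks, x)                       # x lies in [b[j], b[j+1])
    F = 0.0
    for m in 1:j-1
        F += density[m] * (breaks[m + 1] - breaks[m])     # full mass of bin m
    end
    return F + density[j] * (x - breaks[j])               # partial mass of bin j
end

breaks  = [0.0, 0.5, 1.0]
density = [0.4, 1.6]
histogram_cdf(breaks, density, 0.25)   # 0.1
histogram_cdf(breaks, density, 0.75)   # ≈ 0.6 (0.2 from the first bin + 0.4)
```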

Statistics.quantile (Method)
quantile(h::AutomaticHistogram, q::Real)

Evaluate the quantile function of h at $q \in [0, 1]$.
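Inverting the piecewise-linear CDF gives the quantile function: find the first bin where the cumulative mass reaches q and interpolate linearly inside it. The plain-Julia sketch below (hypothetical function histogram_quantile, not the package's implementation) skips bins of zero mass; AutoHist may handle ties and zero-density bins differently.

```julia
# Hypothetical sketch: quantile function of a piecewise-constant density,
# found by locating the bin where the cumulative mass first reaches q and
# interpolating linearly inside it (the CDF is linear on each bin).
function histogram_quantile(breaks::AbstractVector{<:Real}, density::AbstractVector{<:Real}, q::Real)
    0 <= q <= 1 || throw(DomainError(q, "q must lie in [0, 1]"))
    F = 0.0                                           # mass of all bins before bin j
    for j in eachindex(density)
        mass = density[j] * (breaks[j + 1] - breaks[j])
        if mass > 0 && F + mass >= q
            return breaks[j] + (q - F) / density[j]
        end
        F += mass
    end
    return float(breaks[end])                         # guards against rounding when q ≈ 1
end

breaks  = [0.0, 0.5, 1.0]
density = [0.4, 1.6]
histogram_quantile(breaks, density, 0.1)   # 0.25
histogram_quantile(breaks, density, 0.6)   # ≈ 0.75
```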

Base.length (Method)
length(h::AutomaticHistogram)

Return the number of bins of h.

StatsAPI.loglikelihood (Method)
loglikelihood(h::AutomaticHistogram)

Compute the log-likelihood (up to proportionality) of h.

The value of the log-likelihood is $\sum_j N_j \log (d_j)$, where $N_j$ and $d_j$ are the bin count and estimated density of bin $j$.
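The formula can be evaluated directly from the counts and densities. The plain-Julia sketch below uses a hypothetical function histogram_loglik (not AutoHist's implementation) and assumes the convention $0 \log 0 = 0$ for empty bins, which avoids a NaN from 0 * log(0.0) in floating-point arithmetic.

```julia
# Hypothetical sketch of the formula above: sum_j N_j * log(d_j).
# Bins with N_j = 0 are skipped, i.e. the convention 0 * log(0) = 0 is assumed.
function histogram_loglik(counts::AbstractVector{<:Integer}, density::AbstractVector{<:Real})
    length(counts) == length(density) || throw(ArgumentError("length mismatch"))
    ll = 0.0
    for (N, d) in zip(counts, density)
        N == 0 && continue
        ll += N * log(d)
    end
    return ll
end

histogram_loglik([2, 6, 2], [0.4, 1.2, 0.2])   # 2log(0.4) + 6log(1.2) + 2log(0.2)
histogram_loglik([0, 3], [0.0, 2.0])           # 3log(2), the empty bin contributes nothing
```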

AutoHist.logmarginallikelihood (Function)
logmarginallikelihood(h::AutomaticHistogram, a::Real)
logmarginallikelihood(h::AutomaticHistogram)

Compute the log-marginal likelihood (up to proportionality) of h from a Bayesian model where the bin probabilities have been endowed with a symmetric Dirichlet prior with concentration parameter a. If the histogram was fitted with the rule argument set to RIH or RRH, the value of a is inferred automatically and does not have to be passed explicitly.

Assumes that the Dirichlet prior is centered on the uniform distribution, so that $a_j = a/k$ for a scalar $a>0$ and all $j$, where $k$ is the number of bins. The value of the log-marginal likelihood is $\sum_j \{ \log \Gamma (a_j + N_j) - \log \Gamma (a_j) - N_j\log |\mathcal{I}_j| \} - \log \Gamma (a+n) + \log \Gamma (a)$, where $N_j$ is the bin count and $|\mathcal{I}_j|$ the length of bin $j$, and $n = \sum_j N_j$ is the total number of observations.
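Because each $N_j$ is a nonnegative integer, the recurrence $\Gamma(z+1) = z\Gamma(z)$ gives $\log \Gamma(z + N) - \log \Gamma(z) = \sum_{m=0}^{N-1} \log(z + m)$, so the formula above can be evaluated in plain Julia without a special-function library. The sketch below does this with hypothetical functions logratio_gamma and histogram_logmarginal; it illustrates the formula, not the package's implementation.

```julia
# log Γ(z + N) - log Γ(z) for integer N >= 0, via Γ(z + 1) = z Γ(z).
function logratio_gamma(z::Real, N::Integer)
    s = 0.0
    for m in 0:N-1
        s += log(z + m)
    end
    return s
end

# Hypothetical sketch of the log-marginal likelihood formula above, with a_j = a/k.
function histogram_logmarginal(breaks::AbstractVector{<:Real}, counts::AbstractVector{<:Integer}, a::Real)
    a > 0 || throw(DomainError(a, "a must be positive"))
    k = length(counts)
    length(breaks) == k + 1 || throw(ArgumentError("need length(breaks) == length(counts) + 1"))
    n = sum(counts)
    aj = a / k
    lml = 0.0
    for j in 1:k
        width = breaks[j + 1] - breaks[j]                         # |I_j|
        lml += logratio_gamma(aj, counts[j]) - counts[j] * log(width)
    end
    return lml - logratio_gamma(a, n)                             # - log Γ(a+n) + log Γ(a)
end

histogram_logmarginal([0.0, 0.5, 1.0], [1, 1], 2.0)   # ≈ log(2/3)
```

For this small case, two bins of width 1/2 with one observation each and a = 2 (so $a_j = 1$), the formula reduces to $2\log 2 - \log 6 = \log(2/3)$, which the sketch reproduces.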

Base.convert (Method)
convert(Histogram, h::AutomaticHistogram)

Convert h to a StatsBase.Histogram, normalized to be a probability density.

AutoHist.distance (Function)
distance(h1::AutomaticHistogram, h2::AutomaticHistogram, dist::Symbol=:iae; p::Real=1.0)

Compute a statistical distance between two histogram probability densities.

Arguments

  • h1, h2: The two histograms for which the distance should be computed.
  • dist: The name of the distance to compute. Valid options are :iae (default), :ise, :hellinger, :sup, :kl and :lp. For the $L_p$-metric, the power p can be specified as a keyword argument.

Keyword arguments

  • p: Power of the $L_p$-metric, which should be a number in the interval $[1, \infty]$. Ignored if dist != :lp. Defaults to p=1.0.
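For the $L_p$ family, both histograms are constant on every cell of the partition formed by merging their cut points, so the integral reduces to a finite sum over those cells. The plain-Julia sketch below (hypothetical functions piecewise_value and lp_distance, not AutoHist's API) computes $\left(\int |f_1 - f_2|^p\right)^{1/p}$ and its $p = \infty$ limit, assuming :lp is the usual $L_p$ norm of the difference. With p = 1 this is the integrated absolute error, and squaring the p = 2 value gives the integrated squared error. The Hellinger and Kullback-Leibler options are not covered.

```julia
# Value of a piecewise-constant density at an interior point x (zero outside).
function piecewise_value(breaks::AbstractVector{<:Real}, density::AbstractVector{<:Real}, x::Real)
    (breaks[1] < x < breaks[end]) || return 0.0
    return float(density[searchsortedlast(breaks, x)])
end

# Hypothetical sketch of the L_p distance between two histogram densities.
function lp_distance(b1, d1, b2, d2; p::Real = 1.0)
    p >= 1 || throw(DomainError(p, "p must lie in [1, Inf]"))
    grid = sort(unique(vcat(b1, b2)))             # merged partition
    diffs  = Float64[]
    widths = Float64[]
    for i in 1:length(grid)-1
        mid = (grid[i] + grid[i + 1]) / 2         # interior point of the cell
        push!(diffs, abs(piecewise_value(b1, d1, mid) - piecewise_value(b2, d2, mid)))
        push!(widths, grid[i + 1] - grid[i])
    end
    isinf(p) && return maximum(diffs)             # sup distance
    return sum(widths .* diffs .^ p)^(1 / p)
end

# Uniform density on [0, 1] versus density 2 on [0, 0.5] and 0 on (0.5, 1].
lp_distance([0.0, 1.0], [1.0], [0.0, 0.5, 1.0], [2.0, 0.0])           # 1.0
lp_distance([0.0, 1.0], [1.0], [0.0, 0.5, 1.0], [2.0, 0.0]; p = Inf)  # 1.0
```

Evaluating each density at cell midpoints sidesteps the question of which bin a cut point belongs to, since the midpoint never coincides with a cut point of either histogram.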
