API

The AutomaticHistogram type

AutoHist.AutomaticHistogram — Type

AutomaticHistogram

A type for representing a histogram where the histogram partition has been chosen automatically based on the sample. An instance can be fitted to data using the fit method.

Fields

breaks: AbstractVector consisting of the cut points in the chosen partition.
density: Estimated density in each bin.
counts: The bin counts for the partition corresponding to breaks.
type: Symbol indicating whether the histogram was fit using an irregular procedure (type==:irregular) or a regular one (type==:regular).
closed: Symbol indicating whether the drawn intervals should be right-inclusive or not. Possible values are :right (default) and :left.
a: Value of the Dirichlet concentration parameter corresponding to the chosen partition. Only relevant if a Bayesian method was used to fit the histogram; otherwise it is set to NaN.

Examples

julia> x = LinRange(eps(), 1.0-eps(), 5000) .^(1.0/4.0);

julia> h = fit(AutomaticHistogram, x)
AutomaticHistogram
breaks: [0.0001220703125, 0.17763663029325183, 0.29718725232110504, 0.4022468898607337, 0.4928155429121377, 0.5797614498414855, 0.6667073567708333, 0.7572760098222373, 0.8405991706295289, 0.9202995853147645, 1.0]
density: [0.006626835974128547, 0.057821970706400425, 0.17596277991076312, 0.36279353706969375, 0.6214544825215076, 0.9730458529384184, 1.4481767793920146, 2.0440057561776532, 2.733509595364622, 3.545742066060377]
counts: [5, 34, 92, 164, 270, 423, 656, 852, 1090, 1414]
type: irregular
closed: right
a: 5.0

Fitting an automatic histogram to data

An automatic histogram based on regular or irregular partitions can be fitted to the data by calling the fit method.

StatsAPI.fit — Method

fit(AutomaticHistogram, x::AbstractVector{<:Real}, rule::AbstractRule=RIH(); support::Tuple{Real,Real}=(-Inf,Inf), closed::Symbol=:right)

Fit a histogram to a one-dimensional vector x with an automatic and data-based selection of the histogram partition.

Arguments

x: 1D vector of data for which a histogram is to be constructed.
rule: The criterion used to select the histogram partition. Passed as an optional positional argument after x. Default value is RIH(), the random irregular histogram.

Keyword arguments

closed: Symbol indicating whether the drawn intervals should be right-inclusive or not. Possible values are :right (default) and :left.
support: Tuple specifying the support of the histogram estimate. If the first element is -Inf, then minimum(x) is taken as the leftmost cutpoint. Likewise, if the second element is Inf, then maximum(x) is taken as the rightmost cutpoint. Default value is (-Inf, Inf), which estimates the support from the data.

Returns

h: An object of type AutomaticHistogram, corresponding to the fitted histogram.

Examples

julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);

julia> fit(AutomaticHistogram, x) == fit(AutomaticHistogram, x, RIH())
true

julia> h = fit(AutomaticHistogram, x, Wand(scalest=:stdev, level=4))
AutomaticHistogram
breaks: LinRange{Float64}(0.0, 1.0, 27)
density: [0.0052, 0.0312, 0.0884, 0.1612, 0.2652, 0.4004, 0.5408, 0.7176, 0.8944, 1.0868  …  2.0072, 1.9656, 1.8616, 1.69, 1.4508, 1.1596, 0.8372, 0.5044, 0.2184, 0.0364]
counts: [1, 6, 17, 31, 51, 77, 104, 138, 172, 209  …  386, 378, 358, 325, 279, 223, 161, 97, 42, 7]
type: regular
closed: right
a: NaN
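
As an illustration of the keyword arguments, the following sketch fixes the support to the unit interval and switches to left-closed bins. It reuses the sample from the example above, which lies in [0, 1]. No output is shown, since the call was not taken from a recorded session.

```julia
using AutoHist

# Sample from the example above; all values lie in [0, 1].
x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^ (1/3)) .^ (1/3)

# Use a fixed support instead of estimating it from the data,
# and request left-closed bins.
h = fit(AutomaticHistogram, x, RIH(); support=(0.0, 1.0), closed=:left)

h.closed   # the requested bin convention is stored in the `closed` field
h.breaks   # cut points of the selected partition
```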

Additional methods for AutomaticHistogram

AutoHist.peaks — Method

peaks(h::AutomaticHistogram)

Return the location of the modes/peaks of h as a Vector, sorted in increasing order.

Formally, a mode/peak of the histogram h is defined as the midpoint of an interval $\mathcal{J}$ such that the density of h is constant on $\mathcal{J}$ and strictly smaller than this value in the histogram bins adjacent to $\mathcal{J}$. Note that according to this definition, $\mathcal{J}$ is in general a nonempty union of intervals in the histogram partition.
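
The definition can be made concrete with a small, self-contained sketch. The helper `sketch_peaks` is hypothetical and is not the package's implementation; it works directly on a vector of cut points and a vector of bin heights. It assumes that a run of bins touching the boundary of the support only needs to exceed its single neighbour.

```julia
# Hypothetical helper, not part of AutoHist: locate the peaks of a
# piecewise-constant density given its cut points and bin heights.
function sketch_peaks(breaks::AbstractVector{<:Real}, density::AbstractVector{<:Real})
    k = length(density)
    length(breaks) == k + 1 || throw(ArgumentError("need length(breaks) == length(density) + 1"))
    modes = Float64[]
    i = 1
    while i <= k
        # Extend j to the end of the run of bins sharing the height density[i].
        j = i
        while j < k && density[j+1] == density[i]
            j += 1
        end
        # Assumption: a run at the boundary has only one neighbour to exceed.
        higher_than_left  = i == 1 || density[i-1] < density[i]
        higher_than_right = j == k || density[j+1] < density[i]
        if higher_than_left && higher_than_right
            push!(modes, (breaks[i] + breaks[j+1]) / 2)   # midpoint of the run
        end
        i = j + 1
    end
    return modes
end

# Bins 2 and 3 form a plateau, so their union [1, 3] counts as one interval.
modes = sketch_peaks([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0.1, 0.3, 0.3, 0.1, 0.2])
# modes == [2.0, 4.5]
```

Scanning runs of equal height from left to right yields the peaks already sorted in increasing order.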

Base.minimum — Method

minimum(h::AutomaticHistogram)

Return the minimum of the support of h.

Base.maximum — Method

maximum(h::AutomaticHistogram)

Return the maximum of the support of h.

Base.extrema — Method

extrema(h::AutomaticHistogram)

Return the minimum and the maximum of the support of h as a 2-tuple.
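
A brief usage sketch for the three support queries, reusing the sample from the fit example above. Since minimum, maximum and extrema are all described in terms of the support of h, they are expected to agree with one another.

```julia
using AutoHist

# Sample from the fit example above.
x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^ (1/3)) .^ (1/3)
h = fit(AutomaticHistogram, x)

lo, hi = extrema(h)
# All three methods describe the same support.
same_support = (lo, hi) == (minimum(h), maximum(h))
```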

Distributions.insupport — Method

insupport(h::AutomaticHistogram, x::Real)

Return true if x is in the support of h, and false otherwise.

Distributions.pdf — Method

pdf(h::AutomaticHistogram, x::Real)

Evaluate the probability density function of h at x.
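
To illustrate what evaluating a histogram density involves, here is a self-contained sketch operating on plain vectors. The helper `sketch_pdf` is hypothetical and not the package's code. It assumes the density is zero outside the support and that the outermost bins include both endpoints of the support. The `closed` keyword mirrors the field of the same name.

```julia
# Hypothetical helper, not part of AutoHist: evaluate a histogram density
# stored as cut points `breaks` and bin heights `density`.
function sketch_pdf(breaks, density, x::Real; closed::Symbol=:right)
    (x < first(breaks) || x > last(breaks)) && return 0.0   # outside the support
    k = length(density)
    if closed == :right
        # Bins are (b[j], b[j+1]]; a cut point belongs to the bin on its left.
        j = searchsortedfirst(breaks, x) - 1
    else
        # Bins are [b[j], b[j+1]); a cut point belongs to the bin on its right.
        j = searchsortedlast(breaks, x)
    end
    # Assumption: the endpoints of the support fall in the outermost bins.
    return float(density[clamp(j, 1, k)])
end

breaks  = [0.0, 1.0, 2.0]
density = [0.25, 0.75]
sketch_pdf(breaks, density, 1.0)                 # 0.25, the cut point goes left
sketch_pdf(breaks, density, 1.0; closed=:left)   # 0.75, the cut point goes right
```

The two conventions differ only at the cut points themselves, which is why the choice of `closed` does not affect integrals of the density.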

AutoHist.cdf — Method

cdf(h::AutomaticHistogram, x::Real)

Evaluate the cumulative distribution function of h at x.
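
Because the density is piecewise constant, the cumulative distribution function is piecewise linear. The following self-contained sketch shows one way to compute it from plain vectors. The helper `sketch_cdf` is hypothetical and assumes that the density integrates to one over the support.

```julia
# Hypothetical helper, not part of AutoHist: CDF of a piecewise-constant density.
# Adds the mass of every bin below x plus the covered part of the bin containing x.
function sketch_cdf(breaks, density, x::Real)
    x <= first(breaks) && return 0.0
    x >= last(breaks) && return 1.0
    total = 0.0
    for j in eachindex(density)
        lo, hi = breaks[j], breaks[j+1]
        x <= lo && break
        total += density[j] * (min(x, hi) - lo)   # full bin, or the part up to x
    end
    return total
end

breaks  = [0.0, 1.0, 2.0]
density = [0.25, 0.75]
sketch_cdf(breaks, density, 1.5)   # 0.25 * 1.0 + 0.75 * 0.5 = 0.625
```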

Base.length — Method

length(h::AutomaticHistogram)

Return the number of bins of h.

StatsAPI.loglikelihood — Method

loglikelihood(h::AutomaticHistogram)

Compute the log-likelihood (up to proportionality) of h.

The value of the log-likelihood is $\sum_j N_j \log (d_j)$, where $N_j$ and $d_j$ are the bin count and the estimated density for bin $j$.
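
The sum can be evaluated by hand from the counts and density fields. The sketch below is self-contained and uses the values printed in the AutomaticHistogram example at the top of this page. It only transcribes the formula and is not the package's implementation.

```julia
# Bin counts and estimated densities copied from the AutomaticHistogram example.
counts  = [5, 34, 92, 164, 270, 423, 656, 852, 1090, 1414]
density = [0.006626835974128547, 0.057821970706400425, 0.17596277991076312,
           0.36279353706969375, 0.6214544825215076, 0.9730458529384184,
           1.4481767793920146, 2.0440057561776532, 2.733509595364622,
           3.545742066060377]

# Log-likelihood: sum over bins of N_j * log(d_j).
ll = sum(counts .* log.(density))
```

Note that a direct transcription like this one returns NaN if some bin has both a zero count and a zero density, since 0 * log(0) is NaN in floating-point arithmetic.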

AutoHist.logmarginallikelihood — Function

logmarginallikelihood(h::AutomaticHistogram, a::Real)
logmarginallikelihood(h::AutomaticHistogram)

Compute the log-marginal likelihood (up to proportionality) of h when the value of the Dirichlet concentration parameter equals a. The value of a is inferred automatically if the histogram was fitted using a Bayesian method, and does not have to be passed explicitly in this case.

Assumes that the Dirichlet prior is centered on the uniform distribution, so that $a_j = a/k$ for a scalar $a>0$ and all $j$, where $k$ is the number of bins. The value of the log-marginal likelihood is $\sum_j \{ \log \Gamma (a_j + N_j) - \log \Gamma (a_j) - N_j\log |\mathcal{I}_j| \} - \log \Gamma (a+n) + \log \Gamma (a)$, where $N_j$ is the bin count for bin $j$, $|\mathcal{I}_j|$ is the width of bin $j$ and $n = \sum_j N_j$ is the sample size.
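
A direct transcription of the displayed formula is sketched below. The helper `sketch_logmarginallikelihood` is hypothetical and is not the package's implementation. It assumes the SpecialFunctions package is installed for `loggamma`, and it takes $a_j = a/k$ literally, which matches a uniform prior mean when the bins have equal width.

```julia
using SpecialFunctions: loggamma   # assumption: SpecialFunctions is installed

# Hypothetical helper, not part of AutoHist: transcribes the displayed formula.
function sketch_logmarginallikelihood(breaks, counts, a::Real)
    k = length(counts)       # number of bins
    n = sum(counts)          # sample size
    widths = diff(breaks)    # bin widths |I_j|
    aj = a / k               # a_j = a/k, as stated above
    s = sum(loggamma(aj + N) - loggamma(aj) - N * log(w)
            for (N, w) in zip(counts, widths))
    return s - loggamma(a + n) + loggamma(a)
end

# Two bins of width 0.5 with one observation each and a = 2 give log(2/3).
lml = sketch_logmarginallikelihood([0.0, 0.5, 1.0], [1, 1], 2.0)
```

The small case can be checked by hand: with $a_1 = a_2 = 1$ the bin probability $p$ is uniform on $[0, 1]$, so the marginal likelihood is $E[p(1-p)] / (0.5 \cdot 0.5) = (1/6)/(1/4) = 2/3$.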

Base.convert — Method

convert(Histogram, h::AutomaticHistogram)

Convert h to a StatsBase.Histogram, normalized to be a probability density.
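
A short usage sketch, assuming StatsBase is installed alongside AutoHist. It reuses the sample from the fit example above. The fields `edges` and `weights` belong to StatsBase.Histogram.

```julia
using AutoHist, StatsBase

# Sample from the fit example above.
x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^ (1/3)) .^ (1/3)
h = fit(AutomaticHistogram, x)

hs = convert(Histogram, h)   # a StatsBase.Histogram
hs.edges[1]                  # bin edges along the single dimension
hs.weights                   # bin heights
```

The conversion can be useful for passing the fitted histogram to code that already works with StatsBase.Histogram objects.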

AutoHist.distance — Function

distance(h1::AutomaticHistogram, h2::AutomaticHistogram, dist::Symbol=:iae; p::Real=1.0)

Compute a statistical distance between two histogram probability densities.

Arguments

h1, h2: The two histograms for which the distance should be computed.
dist: The name of the distance to compute. Valid options are :iae (default), :ise, :hellinger, :sup, :kl, :lp. For the $L_p$-metric, a given power p can be specified as a keyword argument.

Keyword arguments

p: Power of the $L_p$-metric, which should be a number in the interval $[1, \infty]$. Ignored if dist != :lp. Defaults to p=1.0.
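
As a usage sketch, the calls below compare two histograms fitted to the same sample with the two rules used in the fit example above. The function name is qualified with the module name because this page does not state whether distance is exported. No output is shown, since the calls were not taken from a recorded session.

```julia
using AutoHist

# Sample from the fit example above.
x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^ (1/3)) .^ (1/3)

h1 = fit(AutomaticHistogram, x)                                  # default rule, RIH()
h2 = fit(AutomaticHistogram, x, Wand(scalest=:stdev, level=4))   # regular histogram

d_iae  = AutoHist.distance(h1, h2)                # default distance, :iae
d_hell = AutoHist.distance(h1, h2, :hellinger)
d_l2   = AutoHist.distance(h1, h2, :lp; p=2.0)    # p is only used with :lp
```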