API
The AutomaticHistogram type
AutoHist.AutomaticHistogram (Type)
AutomaticHistogram
A type for representing a histogram whose partition has been chosen automatically based on the sample. An AutomaticHistogram can be fitted to data using the fit method.
Fields
breaks: AbstractVector consisting of the cut points in the chosen partition.
density: Estimated density in each bin.
counts: The bin counts for the partition corresponding to breaks.
type: Symbol indicating whether the histogram was fit using an irregular procedure (type == :irregular) or a regular one (type == :regular).
closed: Symbol indicating whether the drawn intervals should be right-inclusive or not. Possible values are :right (default) and :left.
a: Value of the Dirichlet concentration parameter corresponding to the chosen partition. Only relevant if a Bayesian method was used to fit the histogram; otherwise set to NaN.
Examples
julia> x = LinRange(eps(), 1.0-eps(), 5000) .^(1.0/4.0);
julia> h = fit(AutomaticHistogram, x)
AutomaticHistogram
breaks: [0.0001220703125, 0.17763663029325183, 0.29718725232110504, 0.4022468898607337, 0.4928155429121377, 0.5797614498414855, 0.6667073567708333, 0.7572760098222373, 0.8405991706295289, 0.9202995853147645, 1.0]
density: [0.006626835974128547, 0.057821970706400425, 0.17596277991076312, 0.36279353706969375, 0.6214544825215076, 0.9730458529384184, 1.4481767793920146, 2.0440057561776532, 2.733509595364622, 3.545742066060377]
counts: [5, 34, 92, 164, 270, 423, 656, 852, 1090, 1414]
type: irregular
closed: right
a: 5.0
Fitting an automatic histogram to data
An automatic histogram based on a regular or an irregular partition can be fitted to data by calling the fit method.
StatsAPI.fit (Method)
fit(AutomaticHistogram, x::AbstractVector{<:Real}, rule::AbstractRule=RIH(); support::Tuple{Real,Real}=(-Inf,Inf), closed::Symbol=:right)
Fit a histogram to a one-dimensional vector x with an automatic, data-based selection of the histogram partition.
Arguments
x: 1D vector of data for which a histogram is to be constructed.
rule: The criterion used to select the histogram partition, including the number of bins. Defaults to RIH(), the random irregular histogram.
Keyword arguments
support: Tuple specifying the support of the histogram estimate. If the first element is -Inf, then minimum(x) is taken as the leftmost cut point. Likewise, if the second element is Inf, then the rightmost cut point is maximum(x). Default value is (-Inf, Inf), which estimates the support from the data.
closed: Symbol indicating whether the drawn intervals should be right-inclusive or not. Possible values are :right (default) and :left.
Returns
h: An object of type AutomaticHistogram, corresponding to the fitted histogram.
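To make the support and closed keywords concrete, the following minimal sketch bins data into k equal-width bins. The helper regular_counts is hypothetical and is not how AutoHist selects its partition; it only mirrors the documented handling of infinite support endpoints. It also assumes that the outermost bin includes the extreme cut point, so that minimum(x) and maximum(x) are always counted, and it rejects observations outside a user-supplied support.

```julia
# Hypothetical helper, not part of AutoHist: count observations in k equal-width bins,
# resolving infinite support endpoints as the `support` keyword is documented to do.
function regular_counts(x::AbstractVector{<:Real}, k::Integer;
                        support::Tuple{Real,Real}=(-Inf, Inf), closed::Symbol=:right)
    lo = isinf(support[1]) ? minimum(x) : support[1]   # -Inf: leftmost cut point is minimum(x)
    hi = isinf(support[2]) ? maximum(x) : support[2]   # Inf: rightmost cut point is maximum(x)
    breaks = LinRange(lo, hi, k + 1)
    counts = zeros(Int, k)
    for xi in x
        lo <= xi <= hi || throw(ArgumentError("observation $xi lies outside the support"))
        if closed == :right
            # Bin j is (breaks[j], breaks[j+1]]; the left endpoint joins bin 1 (assumption).
            j = searchsortedfirst(breaks, xi) - 1
        else
            # Bin j is [breaks[j], breaks[j+1]); the right endpoint joins bin k (assumption).
            j = searchsortedlast(breaks, xi)
        end
        counts[clamp(j, 1, k)] += 1
    end
    return breaks, counts
end

x = [0.0, 0.5, 1.0, 1.5, 2.0]
breaks, counts = regular_counts(x, 2)                  # counts == [3, 2]
_, counts_left = regular_counts(x, 2; closed=:left)    # counts == [2, 3]
```

The only observation whose bin changes is x = 1.0, which sits exactly on a cut point: right-inclusive intervals put it in the first bin, left-inclusive ones in the second.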
Examples
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> fit(AutomaticHistogram, x) == fit(AutomaticHistogram, x, RIH())
true
julia> h = fit(AutomaticHistogram, x, Wand(scalest=:stdev, level=4))
AutomaticHistogram
breaks: LinRange{Float64}(0.0, 1.0, 27)
density: [0.0052, 0.0312, 0.0884, 0.1612, 0.2652, 0.4004, 0.5408, 0.7176, 0.8944, 1.0868 … 2.0072, 1.9656, 1.8616, 1.69, 1.4508, 1.1596, 0.8372, 0.5044, 0.2184, 0.0364]
counts: [1, 6, 17, 31, 51, 77, 104, 138, 172, 209 … 386, 378, 358, 325, 279, 223, 161, 97, 42, 7]
type: regular
closed: right
a: NaN
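In the regular example above, the reported densities are consistent with the usual histogram estimator, in which the density of bin j is its count divided by n times its width: the first bin has width 1/26 and count 1, and 1 / (5000 * (1/26)) = 0.0052. The sketch below checks this relationship, assuming a regular, non-Bayesian fit; bin_density is a hypothetical helper, and the Bayesian fit in the first example does not follow this formula exactly.

```julia
# Sketch (assumption: regular, non-Bayesian fit): density_j = N_j / (n * |I_j|).
bin_density(counts, widths, n) = counts ./ (n .* widths)

# First three bins of the Wand example: n = 5000 observations, bin width 1/26.
n = 5000
widths = fill(1 / 26, 3)
d = bin_density([1, 6, 17], widths, n)   # ≈ [0.0052, 0.0312, 0.0884]

# On a complete partition, the densities integrate to one (toy data).
toy_breaks = [0.0, 0.25, 0.5, 1.0]
toy_counts = [2, 3, 5]
toy_d = bin_density(toy_counts, diff(toy_breaks), sum(toy_counts))
sum(toy_d .* diff(toy_breaks))           # ≈ 1.0
```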
Additional methods for AutomaticHistogram
AutoHist.peaks (Method)
peaks(h::AutomaticHistogram)
Return the location of the modes/peaks of h as a Vector, sorted in increasing order.
Formally, each mode/peak of the histogram h is defined as the midpoint of an interval $\mathcal{J}$ such that the density of h is constant on $\mathcal{J}$ and strictly smaller than this value in the histogram bins adjacent to $\mathcal{J}$. Note that, according to this definition, $\mathcal{J}$ is in general a union of one or more adjacent intervals in the histogram partition.
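The definition above can be turned into a single left-to-right scan over runs of equal density. The following is a minimal sketch on plain vectors, not AutoHist's implementation; find_peaks is a hypothetical name, and treating the density outside the support as zero (so that an edge run can be a peak) is an assumption.

```julia
# Hypothetical helper, not AutoHist's implementation: modes of a histogram given its
# cut points `breaks` and per-bin densities `density`, following the definition above.
function find_peaks(breaks::AbstractVector{<:Real}, density::AbstractVector{<:Real})
    k = length(density)
    length(breaks) == k + 1 || throw(ArgumentError("need length(breaks) == length(density) + 1"))
    peaks = Float64[]
    i = 1
    while i <= k
        # Grow J = bins i..j: the maximal run of adjacent bins with the same density.
        j = i
        while j < k && density[j + 1] == density[i]
            j += 1
        end
        # Outside the support the density is taken to be zero (assumption for edge bins).
        left = i == 1 ? 0.0 : density[i - 1]
        right = j == k ? 0.0 : density[j + 1]
        if left < density[i] && right < density[i]
            push!(peaks, (breaks[i] + breaks[j + 1]) / 2)   # midpoint of J
        end
        i = j + 1
    end
    return peaks   # runs are scanned left to right, so this is already sorted
end

find_peaks([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0.1, 0.3, 0.3, 0.1, 0.2])   # [2.0, 4.5]
```

Here bins 2 and 3 share the density 0.3, so together they form one interval $\mathcal{J} = [1, 3]$ with a single mode at its midpoint 2.0.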
Base.minimum (Method)
minimum(h::AutomaticHistogram)
Return the minimum of the support of h.
Base.maximum (Method)
maximum(h::AutomaticHistogram)
Return the maximum of the support of h.
Base.extrema (Method)
extrema(h::AutomaticHistogram)
Return the minimum and the maximum of the support of h as a 2-tuple.
Distributions.insupport (Method)
insupport(h::AutomaticHistogram, x::Real)
Return true if x is in the support of h, and false otherwise.
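As a rough mental model for the support-related methods, the sketch below assumes that the support of a histogram is the closed interval from its first to its last cut point. The helpers operate on a plain vector of cut points and are hypothetical stand-ins, not the AutoHist methods themselves; in particular, whether AutoHist includes both endpoints in insupport is an assumption here.

```julia
# Sketch (assumption): the support is the closed interval [first(breaks), last(breaks)].
# These hypothetical helpers mirror minimum, maximum, extrema and insupport.
support_min(breaks::AbstractVector{<:Real}) = first(breaks)
support_max(breaks::AbstractVector{<:Real}) = last(breaks)
support_extrema(breaks::AbstractVector{<:Real}) = (support_min(breaks), support_max(breaks))
in_support(breaks::AbstractVector{<:Real}, x::Real) = support_min(breaks) <= x <= support_max(breaks)

breaks = [0.0, 0.2, 0.7, 1.0]
support_extrema(breaks)    # (0.0, 1.0)
in_support(breaks, 0.5)    # true
in_support(breaks, 1.5)    # false
```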
Distributions.pdf (Method)
pdf(h::AutomaticHistogram, x::Real)
Evaluate the probability density function of h at x.
AutoHist.cdf (Method)
cdf(h::AutomaticHistogram, x::Real)
Evaluate the cumulative distribution function of h at x.
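Because a histogram density is piecewise constant, its CDF is piecewise linear: F(x) adds the full mass of every bin to the left of x plus a proportional share of the bin containing x. The sketch below illustrates both evaluations on plain vectors. hist_pdf and hist_cdf are hypothetical helpers rather than AutoHist's implementation, and the closed argument only matters when x falls exactly on a cut point.

```julia
# Hypothetical helpers, not AutoHist's implementation: evaluate a histogram density
# and its CDF from cut points `breaks` and per-bin densities `density`.
function hist_pdf(breaks, density, x::Real; closed::Symbol=:right)
    (x < first(breaks) || x > last(breaks)) && return 0.0   # zero outside the support
    j = closed == :right ? searchsortedfirst(breaks, x) - 1 : searchsortedlast(breaks, x)
    return density[clamp(j, 1, length(density))]           # endpoints join the edge bins
end

function hist_cdf(breaks, density, x::Real)
    F = 0.0
    for j in eachindex(density)
        lo, hi = breaks[j], breaks[j + 1]
        x <= lo && break
        F += density[j] * (min(x, hi) - lo)   # mass of bin j lying to the left of x
    end
    return F
end

breaks, density = [0.0, 1.0, 3.0], [0.5, 0.25]
hist_pdf(breaks, density, 1.0)                  # 0.5  (right-closed: 1.0 lies in (0, 1])
hist_pdf(breaks, density, 1.0; closed=:left)    # 0.25 (left-closed: 1.0 lies in [1, 3))
hist_cdf(breaks, density, 2.0)                  # 0.75
```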
Base.length (Method)
length(h::AutomaticHistogram)
Return the number of bins of h.
StatsAPI.loglikelihood (Method)
loglikelihood(h::AutomaticHistogram)
Compute the log-likelihood (up to an additive constant) of h.
The value of the log-likelihood is $\sum_j N_j \log (d_j)$, where $N_j$ and $d_j$ are the count and the estimated density of bin $j$.
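The formula can be evaluated directly from the bin counts and densities, since every observation in bin j contributes log(d_j). The following is a minimal sketch on plain vectors; loglik is a hypothetical name, and skipping empty bins (the convention 0 log 0 = 0) is an assumption that avoids Julia evaluating 0 * -Inf as NaN.

```julia
# Sketch of the formula above: sum_j N_j * log(d_j).
# Empty bins are skipped, i.e. the convention 0 * log(0) = 0 is used (assumption).
loglik(counts, density) = sum((N * log(d) for (N, d) in zip(counts, density) if N > 0); init=0.0)

loglik([1, 3], [0.5, 1.0])   # log(0.5) ≈ -0.693
loglik([0, 2], [0.0, 1.0])   # 0.0
```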
AutoHist.logmarginallikelihood (Function)
logmarginallikelihood(h::AutomaticHistogram, a::Real)
logmarginallikelihood(h::AutomaticHistogram)
Compute the log-marginal likelihood (up to an additive constant) of h when the value of the Dirichlet concentration parameter equals a. If the histogram was fitted with a Bayesian method, a can be inferred automatically and does not have to be passed explicitly.
Assumes that the Dirichlet prior is centered on the uniform distribution, so that $a_j = a/k$ for a scalar $a>0$ and all $j$, where $k$ is the number of bins. The value of the log-marginal likelihood is $\sum_j \{ \log \Gamma (a_j + N_j) - \log \Gamma (a_j) - N_j\log |\mathcal{I}_j| \} - \log \Gamma (a+n) + \log \Gamma (a)$, where $N_j$ is the count and $|\mathcal{I}_j|$ the length of bin $j$, and $n = \sum_j N_j$ is the sample size.
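Since the counts are integers, each log-gamma difference in the formula can be computed exactly as a finite sum, because $\Gamma(z+1) = z\,\Gamma(z)$ gives $\log \Gamma(c + N) - \log \Gamma(c) = \sum_{i=0}^{N-1} \log(c + i)$. The sketch below uses this identity so that it needs only Base Julia (loggamma from SpecialFunctions.jl would work equally well). loggamma_ratio and log_marginal are hypothetical names operating on plain vectors, not AutoHist functions.

```julia
# Exact for integer N: log Γ(c + N) - log Γ(c) = Σ_{i=0}^{N-1} log(c + i),
# which follows from Γ(z + 1) = z Γ(z); avoids the need for a log-gamma function.
loggamma_ratio(c::Real, N::Integer) = sum((log(c + i) for i in 0:N-1); init=0.0)

function log_marginal(breaks::AbstractVector{<:Real}, counts::AbstractVector{<:Integer}, a::Real)
    k = length(counts)
    n = sum(counts)
    aj = a / k                      # prior centered on the uniform distribution: a_j = a/k
    widths = diff(breaks)           # bin lengths |I_j|
    val = sum((loggamma_ratio(aj, N) - N * log(w) for (N, w) in zip(counts, widths)); init=0.0)
    return val - loggamma_ratio(a, n)   # subtract log Γ(a + n) - log Γ(a)
end

log_marginal([0.0, 0.5, 1.0], [1, 1], 2.0)   # log(2/3)
```

In the usage line, $a_j = 1$ and each bin has length 1/2, so the marginal likelihood is $4 \cdot \Gamma(2)/\Gamma(4) = 2/3$.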
Base.convert (Method)
convert(Histogram, h::AutomaticHistogram)
Convert h to a StatsBase.Histogram, normalized to be a probability density.
AutoHist.distance (Function)
distance(h1::AutomaticHistogram, h2::AutomaticHistogram, dist::Symbol=:iae; p::Real=1.0)
Compute a statistical distance between two histogram probability densities.
Arguments
h1, h2: The two histograms for which the distance should be computed.
dist: The name of the distance to compute. Valid options are :iae (default), :ise, :hellinger, :sup, :kl and :lp. For the $L_p$-metric, a given power p can be specified as a keyword argument.
Keyword arguments
p: Power of the $L_p$-metric, which should be a number in the interval $[1, \infty]$. Ignored if dist != :lp. Defaults to p=1.0.
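Because both densities are piecewise constant, an $L_p$ distance between two histograms can be computed exactly on the common refinement of their partitions: on each interval between consecutive merged cut points both densities are constant. The sketch below is not AutoHist's implementation; density_at and lp_distance are hypothetical helpers on plain vectors. Under the standard definitions, the integrated absolute error corresponds to p = 1 and the sup distance to p = Inf.

```julia
# Hypothetical helpers, not AutoHist's implementation.
# Density of a histogram at a point inside one of its bins; zero outside the support.
function density_at(breaks, density, x)
    (x < first(breaks) || x > last(breaks)) && return 0.0
    return density[clamp(searchsortedlast(breaks, x), 1, length(density))]
end

# L_p distance between two histogram densities, exact on the merged partition.
function lp_distance(b1, d1, b2, d2; p::Real=1.0)
    p >= 1 || throw(ArgumentError("p must lie in [1, Inf]"))
    grid = sort(unique(vcat(b1, b2)))        # union of all cut points
    diffs = Float64[]
    widths = Float64[]
    for i in 1:length(grid) - 1
        lo, hi = grid[i], grid[i + 1]
        mid = (lo + hi) / 2                  # both densities are constant on (lo, hi)
        push!(diffs, abs(density_at(b1, d1, mid) - density_at(b2, d2, mid)))
        push!(widths, hi - lo)
    end
    isinf(p) && return maximum(diffs)        # sup distance
    return sum(diffs .^ p .* widths)^(1 / p)
end

b1, d1 = [0.0, 1.0], [1.0]                   # uniform density on [0, 1]
b2, d2 = [0.0, 0.5, 1.0], [1.5, 0.5]
lp_distance(b1, d1, b2, d2)                  # 0.5 (integrated absolute error)
lp_distance(b1, d1, b2, d2; p=Inf)           # 0.5 (sup distance)
```

Histograms with disjoint supports reach the largest possible integrated absolute error, 2, because the gap between the supports contributes nothing and each density contributes its full unit mass.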