API
The AutomaticHistogram type
AutoHist.AutomaticHistogram — Type
AutomaticHistogram
A type for representing a histogram whose partition has been chosen automatically based on the sample. Can be fitted to data using the fit or autohist methods.
Fields
breaks: AbstractVector consisting of the cut points in the chosen partition.
density: Estimated density in each bin.
counts: The bin counts for the partition corresponding to breaks.
type: Symbol indicating whether the histogram was fit using an irregular procedure (type==:irregular) or a regular one (type==:regular).
closed: Symbol indicating whether the drawn intervals should be right-inclusive or not. Possible values are :right (default) and :left.
a: Value of the Dirichlet concentration parameter corresponding to the chosen partition. Only of relevance if a Bayesian method was used to fit the histogram, and is otherwise set to NaN.
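To make the role of the closed field concrete, the following is a minimal sketch of how a point could be assigned to a bin under each convention. The helper bin_index is hypothetical and not part of AutoHist; in particular, the handling of the two outermost endpoints is an assumption rather than documented behavior.

```julia
# Hypothetical helper, not part of AutoHist: index of the bin containing x,
# or 0 if x lies outside the partition.
function bin_index(breaks::AbstractVector{<:Real}, x::Real; closed::Symbol = :right)
    k = length(breaks) - 1                                  # number of bins
    (x < first(breaks) || x > last(breaks)) && return 0
    if closed === :right
        # bins are (breaks[j], breaks[j+1]];
        # assumption: the leftmost cut point is assigned to bin 1
        j = searchsortedfirst(breaks, x) - 1
    elseif closed === :left
        # bins are [breaks[j], breaks[j+1]);
        # assumption: the rightmost cut point is assigned to bin k
        j = searchsortedlast(breaks, x)
    else
        throw(ArgumentError("closed must be :right or :left"))
    end
    return clamp(j, 1, k)
end

breaks = [0.0, 0.5, 1.0]
bin_index(breaks, 0.5)                  # interior cut point goes to the left bin
bin_index(breaks, 0.5; closed = :left)  # interior cut point goes to the right bin
```

The two conventions only differ for points that coincide with an interior cut point.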
Examples
julia> x = LinRange(eps(), 1.0-eps(), 5000) .^(1.0/4.0);
julia> h = fit(AutomaticHistogram, x)
AutomaticHistogram{Vector{Float64}, Vector{Float64}, Vector{Int64}}
breaks: [0.0001220703125, 0.17763663029325183, 0.29718725232110504, 0.4022468898607337, 0.4928155429121377, 0.5797614498414855, 0.6667073567708333, 0.7572760098222373, 0.8405991706295289, 0.9202995853147645, 1.0]
density: [0.006626835974128547, 0.057821970706400425, 0.17596277991076312, 0.36279353706969375, 0.6214544825215076, 0.9730458529384184, 1.4481767793920146, 2.0440057561776532, 2.733509595364622, 3.545742066060377]
counts: [5, 34, 92, 164, 270, 423, 656, 852, 1090, 1414]
type: irregular
closed: right
a: 5.0
Fitting an automatic histogram to data
An automatic histogram based on regular or irregular partitions can be fitted to the data by calling the fit method.
StatsAPI.fit — Method
fit(
    AutomaticHistogram,
    x::AbstractVector{<:Real},
    rule::AbstractRule = RIH();
    support::Tuple{Real,Real} = (-Inf,Inf),
    closed::Symbol = :right
)
Fit a histogram to a one-dimensional vector x with an automatic and data-based selection of the histogram partition.
Arguments
x: 1D vector of data for which a histogram is to be constructed.
rule: The criterion used to determine the optimal number of bins. Default value is rule=RIH(), the random irregular histogram.
Keyword arguments
closed: Symbol indicating whether the drawn intervals should be right-inclusive or not. Possible values are :right (default) and :left.
support: Tuple specifying the support of the histogram estimate. If the first element is -Inf, then minimum(x) is taken as the leftmost cutpoint. Likewise, if the second element is Inf, then the rightmost cutpoint is maximum(x). Default value is (-Inf, Inf), in which case both endpoints of the support are estimated from the data.
Returns
h: An object of type AutomaticHistogram, corresponding to the fitted histogram.
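The rule for the support keyword can be sketched on its own. The helper resolve_support below is a hypothetical name used for illustration only; it simply restates the documented behavior and is not AutoHist's own code.

```julia
# Hypothetical helper illustrating the documented rule for `support`.
function resolve_support(x::AbstractVector{<:Real}, support::Tuple{Real,Real} = (-Inf, Inf))
    lo = support[1] == -Inf ? minimum(x) : support[1]   # -Inf: use the sample minimum
    hi = support[2] == Inf ? maximum(x) : support[2]    # Inf: use the sample maximum
    return (lo, hi)
end

x = [0.2, 0.4, 0.9]
resolve_support(x)               # both endpoints taken from the data
resolve_support(x, (0.0, Inf))   # left endpoint fixed at 0.0, right endpoint from the data
```

Fixing one or both endpoints is useful when the support of the underlying density is known in advance, for instance data confined to the unit interval.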
Examples
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> fit(AutomaticHistogram, x) == fit(AutomaticHistogram, x, RIH())
true
julia> h = fit(AutomaticHistogram, x, Wand(scalest=:stdev, level=4))
AutomaticHistogram{LinRange{Float64, Int64}, Vector{Float64}, Vector{Int64}}
breaks: LinRange{Float64}(0.0, 1.0, 27)
density: [0.0052, 0.0312, 0.0884, 0.1612, 0.2652, 0.4004, 0.5408, 0.7176, 0.8944, 1.0868 … 2.0072, 1.9656, 1.8616, 1.69, 1.4508, 1.1596, 0.8372, 0.5044, 0.2184, 0.0364]
counts: [1, 6, 17, 31, 51, 77, 104, 138, 172, 209 … 386, 378, 358, 325, 279, 223, 161, 97, 42, 7]
type: regular
closed: right
a: NaN
AutoHist.autohist — Method
autohist(
    x::AbstractVector{<:Real},
    rule::AbstractRule = RIH();
    kwargs...
)
Fit an automatic histogram to data based on the supplied rule. This is an alias for fit(AutomaticHistogram, x, rule; kwargs...). See fit for further details.
Examples
julia> x = (1.0 .- (1.0 .- LinRange(0.0, 1.0, 5000)) .^(1/3)).^(1/3);
julia> autohist(x, Sturges()) == fit(AutomaticHistogram, x, Sturges())
true
Additional methods for AutomaticHistogram
AutoHist.peaks — Method
peaks(h::AutomaticHistogram)
Return the locations of the modes/peaks of h as a Vector, sorted in increasing order.
Formally, a mode/peak of the histogram h is defined as the midpoint of an interval $\mathcal{J}$ such that the density of h is constant on $\mathcal{J}$ and strictly smaller than this value in the histogram bins adjacent to $\mathcal{J}$. Note that according to this definition, $\mathcal{J}$ is in general a nonempty union of intervals in the histogram partition.
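The definition can be illustrated with a short sketch that works on plain vectors of cut points and densities. The function histogram_peaks is a hypothetical re-implementation written for this illustration, not the package's code. It assumes that the density is zero outside the outermost cut points and that bins in a flat run have exactly equal density values.

```julia
# Hypothetical re-implementation of the definition on plain vectors.
function histogram_peaks(breaks::AbstractVector{<:Real}, density::AbstractVector{<:Real})
    k = length(density)
    length(breaks) == k + 1 || throw(ArgumentError("need length(breaks) == length(density) + 1"))
    modes = Float64[]
    i = 1
    while i <= k
        # extend j to the end of the maximal run of bins sharing density[i]
        j = i
        while j < k && density[j + 1] == density[i]
            j += 1
        end
        # assumption: the density is zero outside the support
        left = i == 1 ? 0.0 : density[i - 1]
        right = j == k ? 0.0 : density[j + 1]
        if density[i] > left && density[i] > right
            push!(modes, (breaks[i] + breaks[j + 1]) / 2)   # midpoint of the run
        end
        i = j + 1
    end
    return modes
end

# Bins 2 and 3 share the largest density, so the single peak is the midpoint of [1, 3].
histogram_peaks([0.0, 1.0, 2.0, 3.0, 4.0], [0.1, 0.4, 0.4, 0.1])   # [2.0]
```

Because the runs are scanned from left to right, the returned locations are already sorted in increasing order.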
Base.minimum — Method
minimum(h::AutomaticHistogram)
Return the minimum of the support of h.
Base.maximum — Method
maximum(h::AutomaticHistogram)
Return the maximum of the support of h.
Base.extrema — Method
extrema(h::AutomaticHistogram)
Return the minimum and the maximum of the support of h as a 2-tuple.
Distributions.insupport — Method
insupport(h::AutomaticHistogram, x::Real)
Return true if x is in the support of h, and false otherwise.
Distributions.pdf — Method
pdf(h::AutomaticHistogram, x::Real)
Evaluate the probability density function of h at x.
Distributions.cdf — Method
cdf(h::AutomaticHistogram, x::Real)
Evaluate the cumulative distribution function of h at x.
Statistics.quantile — Method
quantile(h::AutomaticHistogram, q::Real)
Evaluate the quantile function of h at $q \in [0, 1]$.
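Since a histogram density is piecewise constant, its distribution function is piecewise linear and the quantile function inverts it one bin at a time. The sketch below shows this on plain vectors of cut points and densities. The names hist_pdf, hist_cdf and hist_quantile are hypothetical, right-closed bins are assumed, and the package's own methods may treat boundary points differently.

```julia
# Hypothetical helpers on plain vectors; closed = :right is assumed throughout.
function hist_pdf(breaks, density, x::Real)
    (x < first(breaks) || x > last(breaks)) && return 0.0
    j = clamp(searchsortedfirst(breaks, x) - 1, 1, length(density))
    return float(density[j])
end

function hist_cdf(breaks, density, x::Real)
    x <= first(breaks) && return 0.0
    x >= last(breaks) && return 1.0
    j = searchsortedfirst(breaks, x) - 1          # bin (breaks[j], breaks[j+1]] containing x
    acc = 0.0
    for i in 1:j-1
        acc += density[i] * (breaks[i + 1] - breaks[i])   # mass of the bins to the left
    end
    return acc + density[j] * (x - breaks[j])     # linear interpolation inside bin j
end

function hist_quantile(breaks, density, q::Real)
    0 <= q <= 1 || throw(DomainError(q, "q must lie in [0, 1]"))
    acc = 0.0
    for j in eachindex(density)
        mass = density[j] * (breaks[j + 1] - breaks[j])
        if mass > 0 && acc + mass >= q
            return breaks[j] + (q - acc) / density[j]     # invert the linear piece
        end
        acc += mass
    end
    return float(last(breaks))                    # reached only through rounding error
end

breaks, density = [0.0, 1.0, 2.0], [0.25, 0.75]
hist_cdf(breaks, density, 1.5)         # 0.25 + 0.75 * 0.5 = 0.625
hist_quantile(breaks, density, 0.625)  # 1.5
```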
Base.length — Method
length(h::AutomaticHistogram)
Return the number of bins of h.
StatsAPI.loglikelihood — Method
loglikelihood(h::AutomaticHistogram)
Compute the log-likelihood (up to proportionality) of h.
The value of the log-likelihood is $\sum_j N_j \log (d_j)$, where $N_j$ and $d_j$ are the bin count and estimated density for bin $j$.
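As a check on the formula, the sum can be evaluated directly from the counts and density vectors. The helper loglik below is a hypothetical name and not part of the package. Skipping bins with zero count, so that $0 \log 0$ is treated as $0$, is an assumption. The input values are copied from the fitted histogram in the first example on this page.

```julia
# Plain-vector transcription of the formula; `loglik` is a hypothetical name.
function loglik(counts, density)
    total = 0.0
    for (N, d) in zip(counts, density)
        N > 0 && (total += N * log(d))   # assumption: empty bins contribute nothing
    end
    return total
end

# Values copied from the example output for fit(AutomaticHistogram, x) above.
counts = [5, 34, 92, 164, 270, 423, 656, 852, 1090, 1414]
density = [0.006626835974128547, 0.057821970706400425, 0.17596277991076312,
           0.36279353706969375, 0.6214544825215076, 0.9730458529384184,
           1.4481767793920146, 2.0440057561776532, 2.733509595364622,
           3.545742066060377]
loglik(counts, density)
```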
AutoHist.logmarginallikelihood — Function
logmarginallikelihood(h::AutomaticHistogram, a::Real)
logmarginallikelihood(h::AutomaticHistogram)
Compute the log-marginal likelihood (up to proportionality) of h from a Bayesian model where the bin probabilities have been endowed with a symmetric Dirichlet prior with concentration parameter equal to a. The value of a can be automatically inferred if the histogram was fitted with the rule argument set to RIH or RRH, and does not have to be explicitly passed as an argument in this case.
Assumes that the Dirichlet prior is centered on the uniform distribution, so that $a_j = a/k$ for a scalar $a>0$ and all $j$, where $k$ is the number of bins. The value of the log-marginal likelihood is $\sum_j \{ \log \Gamma (a_j + N_j) - \log \Gamma (a_j) - N_j\log |\mathcal{I}_j| \} - \log \Gamma (a+n) + \log \Gamma (a)$, where $N_j$ is the bin count for bin $j$, $|\mathcal{I}_j|$ is the width of bin $j$, and $n = \sum_j N_j$ is the sample size.
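As a worked special case, consider a regular partition with $k$ bins of equal width on a support written here as $[l, u]$ (the symbols $l$ and $u$ are introduced only for this illustration). Then $|\mathcal{I}_j| = (u-l)/k$ for every $j$, and since the bin counts sum to the sample size $n$, the expression above simplifies to

$$
\sum_{j=1}^{k} \log \Gamma\!\left(\tfrac{a}{k} + N_j\right) - k \log \Gamma\!\left(\tfrac{a}{k}\right) + n \log k - n \log (u - l) - \log \Gamma (a+n) + \log \Gamma (a).
$$

Only the first three terms depend on the number of bins $k$, so these are the terms that matter when comparing regular partitions of the same support.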
Base.convert — Method
convert(Histogram, h::AutomaticHistogram)
Convert h to a StatsBase.Histogram, normalized to be a probability density.
AutoHist.distance — Function
distance(h1::AutomaticHistogram, h2::AutomaticHistogram, dist::Symbol=:iae; p::Real=1.0)
Compute a statistical distance between two histogram probability densities.
Arguments
h1, h2: The two histograms for which the distance should be computed.
dist: The name of the distance to compute. Valid options are :iae (default), :ise, :hellinger, :sup, :kl, :lp. For the $L_p$-metric, a given power p can be specified as a keyword argument.
Keyword arguments
p: Power of the $L_p$-metric, which should be a number in the interval $[1, \infty]$. Ignored if dist != :lp. Defaults to p=1.0.
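Distances between two histogram densities can be computed exactly, because both densities are constant between consecutive points of the merged set of cut points. The sketch below computes $\int |f - g|^p$ on plain vectors, which gives the integrated absolute error for $p = 1$ and the integrated squared error for $p = 2$ under their usual definitions. The names step_density and integrated_error are hypothetical. How AutoHist normalizes each distance (for example, whether :lp takes the $p$-th root) is not stated in this section, so the values need not match the package's output.

```julia
# Hypothetical helper: value of a piecewise-constant density at x (zero outside the breaks).
function step_density(breaks, density, x::Real)
    (x < first(breaks) || x > last(breaks)) && return 0.0
    j = clamp(searchsortedfirst(breaks, x) - 1, 1, length(density))
    return float(density[j])
end

# Integral of |f - g|^p over the real line for two step densities.
function integrated_error(b1, d1, b2, d2; p::Real = 1)
    grid = sort!(unique(vcat(b1, b2)))      # both densities are constant between grid points
    total = 0.0
    for i in 1:length(grid)-1
        mid = (grid[i] + grid[i + 1]) / 2   # midpoint avoids any boundary convention
        diff = abs(step_density(b1, d1, mid) - step_density(b2, d2, mid))
        total += diff^p * (grid[i + 1] - grid[i])
    end
    return total
end

b1, d1 = [0.0, 0.5, 1.0], [0.5, 1.5]
b2, d2 = [0.0, 1.0], [1.0]
integrated_error(b1, d1, b2, d2)          # 0.5
integrated_error(b1, d1, b2, d2; p = 2)   # 0.25
```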