AutoHist.jl Documentation
Fast automatic histogram construction. Supports a plethora of regular and irregular histogram procedures.
Introduction
Despite being the oldest nonparametric density estimator, the histogram remains in widespread use to this day. Regrettably, the quality of a histogram density estimate is rather sensitive to the choice of partition used to draw the histogram, which has led to the development of automatic histogram methods that select the partition based on the sample itself. Unfortunately, most default histogram plotting software supports only a few regular automatic histogram procedures, where all the bins are of equal length, and uses very simple plug-in rules by default to compute the number of bins, frequently leading to poor density estimates for non-normal data. Moreover, fast and fully automatic irregular histogram methods are rarely supported by default plotting software, which has prevented their adoption by practitioners.
The AutoHist.jl package makes it easy to construct both regular and irregular histograms automatically based on a given one-dimensional sample. It currently supports 8 different methods for irregular histograms and 12 criteria for regular histograms from the statistical literature. In addition, the package provides a number of convenience functions for automatic histograms, such as methods for evaluating the histogram probability density function or identifying the location of modes.
Quick Start
The main functions exported by this package are fit and autohist, which let the user fit a histogram to one-dimensional data with an automatic, data-based choice of bins. The following short example shows how the fit method is used in practice:
using AutoHist, Random, Distributions
x = rand(Xoshiro(1812), Normal(), 10^6) # simulate some data
h_irr = fit(AutomaticHistogram, x) # compute an automatic irregular histogram (default method)
The third argument to fit controls the rule used to select the histogram partition, with the default being the RIH method. To fit an automatic histogram with a specific rule, e.g. Knuth's rule, all we have to do is change the value of the rule argument.
h_reg = fit(AutomaticHistogram, x, Knuth()) # compute an automatic regular histogram
The above calls to fit return an object of type AutomaticHistogram, with weights normalized so that the resulting histograms are probability densities. This type represents the histogram in a similar fashion to StatsBase.Histogram, but has more fields to enable the use of several convenience functions.
h_irr
AutomaticHistogram{Vector{Float64}, Vector{Float64}, Vector{Int64}}
breaks: [-4.52658035430858, -3.868824574172096, -3.6067046536424856, -3.4017420582028075, -3.26956574393234, -3.066835856503987, -2.8797349251549127, -2.740413945248203, -2.671199996897114, -2.5676023451716117 … 2.601563242649118, 2.711859018408597, 2.8288529182020516, 2.9507587756204225, 3.042746345687032, 3.216004487365888, 3.347734260034091, 3.6228038870293897, 4.017100121829467, 4.831592004360999]
density: [5.222494909128692e-5, 0.0003591469804469288, 0.0008933756620966539, 0.0014606978301090485, 0.002580308745480233, 0.0048535056430419775, 0.007694957920247944, 0.010099604226745735, 0.012838598935761112, 0.01671234281018688 … 0.015909154110459872, 0.011859496711650154, 0.008556493162692037, 0.006062558877091009, 0.004457635936285105, 0.002978734419350727, 0.002103313406332847, 0.0010075470160038863, 0.00030994477661629855, 3.368362558765091e-5]
counts: [34, 94, 183, 193, 523, 908, 1072, 699, 1330, 1694 … 1641, 1308, 1001, 739, 410, 516, 277, 277, 122, 27]
type: irregular
closed: right
a: 5.0
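As a quick sanity check on the normalization described above, the following sketch integrates the fitted density over its bins. It assumes that the entries shown in the printed output (breaks, density) are accessible as fields of the same names; this is inferred from the display rather than taken from the API reference, so treat it as an illustration only. A smaller sample than above is used to keep the example quick.

```julia
using AutoHist, Random, Distributions

x = rand(Xoshiro(1812), Normal(), 10^4)   # simulated data (smaller sample than above)
h = fit(AutomaticHistogram, x)            # default rule

# Assumption: `breaks` and `density` are fields, as the printed output suggests.
widths = diff(h.breaks)                   # one width per bin (k widths for k+1 breaks)
total_mass = sum(h.density .* widths)     # integral of the piecewise-constant density

println("number of bins: ", length(h.density))
println("total mass:     ", total_mass)   # should be close to 1 if the density is normalized
```

If the field names differ in the installed version, the same check can be carried out with whatever accessors the package provides for the bin edges and bin heights.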
Alternatively, an automatic histogram can be fitted to data through the autohist method, which serves as an alias for fit(AutomaticHistogram, x, rule; kwargs...).
h_reg = autohist(x, Knuth()) # equivalent to fit(AutomaticHistogram, x, Knuth())
Plotting
AutomaticHistogram objects are compatible with Plots.jl and Makie.jl, which lets us easily plot the two histograms resulting from the above code snippet via e.g. Plots.plot(h_irr). As an example, the irregular histogram fitted above can be displayed via Plots.jl as follows:
import Plots; Plots.gr()
Plots.plot(h_irr, label="") # Plot the irregular histogram.
Alternatively, Makie.jl can also be used to make graphical displays of the fitted histograms via e.g. Makie.plot(h_irr). To produce a plot similar to the above display, we may for instance do the following:
import CairoMakie, Makie # using the CairoMakie backend
Makie.plot(h_reg) # Plot the regular histogram
AutoHist.jl also makes it possible to draw automatic histograms directly using Makie and Plots functions, for instance via Makie.hist(x, RIH()) or Plots.histogram(x, Knuth()). For a more detailed account of the plotting capabilities offered by AutoHist, please consult the plotting tutorial.
Supported methods
Both the regular and the irregular procedures support a large number of criteria for selecting the histogram partition. The rule argument, passed as the third argument to fit, controls the criterion used to choose the best partition. The supported options are:
- Regular Histograms:
  - Knuth's rule, Random regular histogram, RRH
  - L2 cross-validation, L2CV_R
  - Kullback-Leibler cross-validation, KLCV_R
  - Akaike's information criterion, AIC
  - The Bayesian information criterion, BIC
  - Birgé and Rozenholc's criterion, BR
  - Normalized Maximum Likelihood, NML_R
  - Minimum Description Length, MDL
  - Sturges' rule, Sturges
  - Freedman and Diaconis' rule, FD
  - Scott's rule, Scott
  - Wand's rule, Wand
- Irregular Histograms:
A description of each method, along with references, can be found on the methods page.
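To get a feel for how the choice of rule affects the result, different rules can be passed as the second argument to autohist. The sketch below compares the number of bins selected by a few of the regular rules listed above. It assumes that Sturges, FD and Scott can be constructed without arguments in the same way as Knuth(), and that density is a field holding one value per bin; neither assumption is confirmed here, so the methods page remains the authoritative reference.

```julia
using AutoHist, Random, Distributions

x = rand(Xoshiro(1812), Normal(), 10^4)   # simulated data

# Assumption: these rule types have zero-argument constructors, like Knuth().
rules = [Knuth(), Sturges(), FD(), Scott()]

for rule in rules
    h = autohist(x, rule)                 # same as fit(AutomaticHistogram, x, rule)
    k = length(h.density)                 # assumed field: one density value per bin
    println(nameof(typeof(rule)), " -> ", k, " bins")
end
```

On non-normal data the bin counts chosen by the simple plug-in rules and by the criterion-based rules may differ considerably, which is the motivation given in the introduction for supporting many criteria.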