AutoHist.jl Documentation

Fast automatic histogram construction. Supports a plethora of regular and irregular histogram procedures.

Introduction

Despite being the oldest nonparametric density estimator, the histogram remains in widespread use to this day. Regrettably, the quality of a histogram density estimate is rather sensitive to the choice of partition used to draw the histogram, which has led to the development of automatic histogram methods that select the partition based on the sample itself. Unfortunately, most default histogram plotting software supports only a few regular automatic histogram procedures, where all the bins are of equal length, and uses very simple plug-in rules by default to compute the number of bins, frequently leading to poor density estimates for non-normal data. Moreover, fast and fully automatic irregular histogram methods are rarely supported by default plotting software, which has prevented their adoption by practitioners.

The AutoHist.jl package makes it easy to construct both regular and irregular histograms automatically based on a given one-dimensional sample. It currently supports 7 different methods for irregular histograms and 12 criteria for regular histograms from the statistical literature. In addition, the package provides a number of convenience functions for automatic histograms, such as methods for evaluating the histogram probability density function or identifying the location of modes.

Quick Start

The main function exported by this package is fit, which allows the user to fit a histogram to one-dimensional data with an automatic and data-based choice of bins. The following short example shows how fit is used in practice:

using AutoHist, Random, Distributions
x = rand(Xoshiro(1812), Normal(), 10^6)    # simulate some data
h_irr = fit(AutomaticHistogram, x)         # compute an automatic irregular histogram

The third argument to fit controls the rule used to select the histogram partition, with the default being the RIH method. To fit an automatic histogram with a specific rule, e.g. Knuth's rule, all we have to do is change the value of the rule argument.

h_reg = fit(AutomaticHistogram, x, Knuth()) # compute an automatic regular histogram

The above calls to fit return objects of type AutomaticHistogram, with weights normalized so that the resulting histograms are probability densities. This type represents the histogram in a similar fashion to StatsBase.Histogram, but has more fields to enable the use of several convenience functions.

h_irr

AutomaticHistogram
breaks: [-4.52658035430858, -3.868824574172096, -3.6067046536424856, -3.4017420582028075, -3.26956574393234, -3.066835856503987, -2.8797349251549127, -2.740413945248203, -2.671199996897114, -2.5676023451716117  …  2.601563242649118, 2.711859018408597, 2.8288529182020516, 2.9507587756204225, 3.042746345687032, 3.216004487365888, 3.347734260034091, 3.6228038870293897, 4.017100121829467, 4.831592004360999]
density: [5.222494909128692e-5, 0.0003591469804469288, 0.0008933756620966539, 0.0014606978301090485, 0.002580308745480233, 0.0048535056430419775, 0.007694957920247944, 0.010099604226745735, 0.012838598935761112, 0.01671234281018688  …  0.015909154110459872, 0.011859496711650154, 0.008556493162692037, 0.006062558877091009, 0.004457635936285105, 0.002978734419350727, 0.002103313406332847, 0.0010075470160038863, 0.00030994477661629855, 3.368362558765091e-5]
counts: [34, 94, 183, 193, 523, 908, 1072, 699, 1330, 1694  …  1641, 1308, 1001, 739, 410, 516, 277, 277, 122, 27]
type: irregular
closed: right
a: 5.0
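
The breaks and density values in the printout above are enough to describe the density estimate completely. As a rough illustration of what "normalized to a probability density", "evaluating the density" and "locating modes" mean for such a pair of vectors, the following sketch works on plain Julia arrays. The helpers hist_mass, hist_pdf and hist_modes are hypothetical names introduced here for illustration; they are not the package's own convenience functions, and the treatment of the leftmost endpoint and of plateaus is an assumption.

```julia
# Hypothetical helpers written for illustration; they are not part of AutoHist.jl.

# Total probability mass of a histogram: sum over bins of height times width.
hist_mass(breaks, density) = sum(density .* diff(breaks))

# Evaluate the piecewise-constant density at t, using right-closed bins
# (breaks[j], breaks[j+1]]. Assumption: the first bin also contains its left endpoint.
function hist_pdf(breaks, density, t::Real)
    (t < first(breaks) || t > last(breaks)) && return 0.0
    # searchsortedfirst gives the first index i with breaks[i] >= t,
    # so t falls in bin i - 1 (clamped to 1 when t equals the leftmost break).
    j = max(searchsortedfirst(breaks, t) - 1, 1)
    return float(density[j])
end

# Midpoints of bins whose density is strictly larger than that of both neighbours.
# Boundary bins only have one neighbour to beat; plateaus are not detected.
function hist_modes(breaks, density)
    k = length(density)
    modes = Float64[]
    for j in 1:k
        left = j == 1 ? -Inf : density[j-1]
        right = j == k ? -Inf : density[j+1]
        if density[j] > left && density[j] > right
            push!(modes, (breaks[j] + breaks[j+1]) / 2)
        end
    end
    return modes
end

# Toy histogram with four unit-width bins on [0, 4].
breaks = [0.0, 1.0, 2.0, 3.0, 4.0]
density = [0.1, 0.4, 0.2, 0.3]

hist_mass(breaks, density)      # approximately 1
hist_pdf(breaks, density, 2.0)  # height of the second bin, since bins are right-closed
hist_modes(breaks, density)     # midpoints of the second and fourth bins
```

For a fitted histogram, the same helpers could presumably be called as hist_pdf(h_irr.breaks, h_irr.density, 0.0), assuming the fields listed in the printout are accessible under those names.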

AutomaticHistogram objects are compatible with Plots.jl, which allows us to easily plot the two histograms resulting from the above code snippet via e.g. Plots.plot(h_irr). To show both histograms side by side, we can create a plot as follows:

import Plots; Plots.gr()
p_irr = Plots.plot(h_irr, xlabel="x", ylabel="Density", title="Irregular", alpha=0.4, color="black", label="")
p_reg = Plots.plot(h_reg, xlabel="x", title="Regular", alpha=0.4, color="red", label="")
Plots.plot(p_irr, p_reg, layout=(1, 2), size=(670, 320))

Alternatively, Makie.jl can be used to display the fitted histograms via e.g. Makie.plot(h_irr). To produce a plot similar to the one above, we may for instance do the following:

import CairoMakie, Makie # using the CairoMakie backend
fig = Makie.Figure(size=(670, 320))
ax1 = Makie.Axis(fig[1, 1], title="Irregular", xlabel="x", ylabel="Density")
ax2 = Makie.Axis(fig[1, 2], title="Regular", xlabel="x")
p_irr = Makie.plot!(ax1, h_irr, alpha=0.4, color="black")
p_reg = Makie.plot!(ax2, h_reg, alpha=0.4, color="red")
fig

Supported methods

Both the regular and the irregular procedure support a large number of criteria to select the histogram partition. The rule argument, passed as the third argument to fit, controls the criterion used to choose the best partition, and includes the following options:

Regular Histograms:
- Knuth's rule, Random regular histogram, RRH
- L2 cross-validation, L2CV_R
- Kullback-Leibler cross-validation, KLCV_R
- Akaike's information criterion, AIC
- The Bayesian information criterion, BIC
- Birgé and Rozenholc's criterion, BR
- Normalized Maximum Likelihood, NML_R
- Minimum Description Length, MDL
- Sturges' rule, Sturges
- Freedman and Diaconis' rule, FD
- Scott's rule, Scott
- Wand's rule, Wand

Irregular Histograms:
- Random irregular histogram, RIH
- L2 cross-validation, L2CV_I
- Kullback-Leibler cross-validation, KLCV_I
- Rozenholc et al. penalty A, RMG_penA
- Rozenholc et al. penalty B, RMG_penB
- Rozenholc et al. penalty R, RMG_penR
- Normalized Maximum Likelihood, NML_I

A description of each method, along with references, can be found on the methods page.
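
To compare several of the listed rules on the same sample, one might loop over them as in the sketch below. It assumes that each identifier in the list can be constructed with no arguments, in the same way as Knuth() in the Quick Start, and that the number of bins equals the length of the counts field shown in the printed histogram; neither assumption is confirmed on this page.

```julia
using AutoHist, Random

x = randn(Xoshiro(1812), 10^4)   # simulated standard normal sample

# Assumption: each identifier from the list is a zero-argument constructor, like Knuth().
rules = [Knuth(), AIC(), BIC(), Sturges(), RIH()]

# Assumption: the number of bins equals the length of the `counts` field.
nbins = [length(fit(AutomaticHistogram, x, rule).counts) for rule in rules]

# Print one line per rule with the selected number of bins.
for (rule, k) in zip(rules, nbins)
    println(rpad(string(nameof(typeof(rule))), 10), " bins: ", k)
end
```

Different criteria will generally select different numbers of bins for the same data, so a comparison like this can help when choosing a rule for a particular application.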