How to make a 2D histogram with the D3 library (tutorial)

Before you get started with this article, I suggest you read the article about Hexagonal Binning, where this method of data aggregation is shown and explained in detail. There, two scatterplots generated from two different datasets are compared, highlighting a specific class of datasets. In fact, analyzing “sparse” datasets can be difficult, especially when we are looking for trends or clusters. As a solution, that article presents the binning technique, both simple rectangular binning and the more complex hexagonal binning.

The two datasets are contained in the following CSV files:

scatterplot01 : the dataset producing the “sparse” scatterplot;

scatterplot02 : the dataset producing a scatterplot with a good linear trend.

In this article we will show you how to apply rectangular binning to a generic dataset, using the D3 JavaScript library.

Fig.1: the “sparse” scatterplot

Generally we refer to the method as rectangular binning, whereas for the type of chart produced by this technique we prefer the term 2D histogram. This kind of chart, combined with a particular color scale, produces a heatmap.


In this article we will see how to develop a 2D histogram using the D3 library. Thus we will mainly use the JavaScript programming language.

For those who are not familiar with the development of charts using JavaScript libraries, I suggest reading the book Beginning JavaScript Charts with jqPlot, D3 and Highcharts (Apress 2013). This book contains over 250 examples and explains, step by step, how to build the most common types of charts using various JavaScript libraries.

In the D3 framework, we can find a specific plugin for hexagonal binning (d3.hexbin.js). This plugin provides the d3.hexbin layout, an essential tool for managing the tessellation of the XY plane into bins and for counting the samples within each hexagonal bin. Strangely, there is no layout that handles rectangular binning, although it is theoretically much easier.

In fact, looking around the internet you can find several examples of 2D histograms, but they use data that have already been grouped into rectangular bins, with the counts already computed. So it is up to the user to implement rectangular binning (unlike hexagonal binning). First, then, we will see how to implement a plugin that does the same job as d3.hexbin. We’ll call it d3.bin.

To implement this plugin, I started from the d3.hexbin code and modified it to obtain a plugin that performs rectangular binning instead of hexagonal binning. Download the d3.bin.zip file. This zip file contains d3.bin.js. Once you have extracted the file, place it in the same directory as the HTML page in which you want to perform rectangular binning (otherwise you need to change the plugin path in the web page).

To better understand how to use the plugin and how the rectangular binning works, let’s follow, step by step, this small tutorial. For example, let’s consider an XY plane of size 100×100. This is the area in which we want to visualize the dataset. For the sake of clarity we consider a dataset with only three samples.

Fig.2: three points on XY plane

Now we want to apply the rectangular binning method to this XY plane. For instance, we want to tile it with 20×20 squares (using the side() function). Moreover, we want to apply this method to the whole 100×100 area (using the size() function). Thus we have to define:

Fig.3: the rectangular binning applied to the XY plane

Once we have configured the binning parameters, we need to apply them to the dataset. To do this, we pass the points array as the argument of the binning() function.

Fig.4: the result of the rectangular binning

Fig.4 shows the result of the rectangular binning. We can see that 16 bins of size 20×20 have been created. Each bin is indexed by two integer values, i and j. In addition, we can notice that only two bins are occupied: the (i=0, j=0) bin has count = 2, and the (i=1, j=1) bin has count = 1. If we now inspect the content of the bins variable, we find the following data structure:

[
  [[0,0],[10,10]]    i=0, j=0, x=0, y=0,
  [[30,30]]          i=1, j=1, x=20, y=20
]

Indeed, we find only two bins stored in the bins variable (bins with count = 0 are not stored by the binning method). Each bin is represented by an array containing the points enclosed in the area covered by the bin, together with the i and j indexes and the x and y values, which are the coordinates of the bottom-left vertex of the square.
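The data structure just described can be reproduced with a few lines of plain JavaScript. What follows is only a minimal sketch of the idea, not the d3.bin.js source: the rectBin name is hypothetical, the side is passed as a plain argument instead of through the plugin's side() and size() settings, and coordinates are assumed to be non-negative.

```javascript
// Hypothetical stand-in for the plugin (NOT the d3.bin.js code):
// groups [x, y] points into square bins of the given side.
function rectBin(points, side) {
  const byKey = new Map();
  for (const p of points) {
    const i = Math.floor(p[0] / side); // column index of the bin
    const j = Math.floor(p[1] / side); // row index of the bin
    const key = i + "," + j;
    let bin = byKey.get(key);
    if (!bin) {
      bin = [];         // a bin is an array of the points it contains...
      bin.i = i;        // ...with its indexes attached as properties
      bin.j = j;
      bin.x = i * side; // bottom-left vertex of the square
      bin.y = j * side;
      byKey.set(key, bin);
    }
    bin.push(p);
  }
  // Only occupied bins exist in the map, so empty bins are never returned.
  return Array.from(byKey.values());
}

// The three sample points of the tutorial, binned with 20x20 squares.
const points = [[0, 0], [10, 10], [30, 30]];
const bins = rectBin(points, 20);

for (const b of bins) {
  console.log("i=" + b.i, "j=" + b.j, "x=" + b.x, "y=" + b.y, "count=" + b.length);
}
```

The count of each bin is simply the length of its array, which is the value a 2D histogram maps to a color.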

Thus, using this plugin we can apply rectangular binning to any dataset and produce a data structure suitable for drawing 2D histograms. In this article we will refer in particular to the dataset in the scatterplot01.csv file. This dataset produces the following 2D histogram (see Fig.5).

Fig.5: rectangular Binning

As you can see in Fig.5, unlike in the corresponding scatterplot, the linear trend is evident. This is because a scatterplot does not take into account overlapping points or their density. The scatterplot is indeed a fast way to see how a set of data is distributed in space but, as we have just seen, it is not the proper way to display the density of points in the XY plane. Now let’s look at the web page code producing the 2D histogram in Fig.5. Then we will analyze some of its parts.

An important point to keep in mind is the size of the squares used for the binning. Depending on the distribution of the data and on the dataset we are analyzing, we will need to adjust the size of the bins. Since the dataset covers the range 0-100 on both the x-axis and the y-axis, and does not contain many elements (only 3), I chose squares with side 10 (a value in data units, not pixels!).
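To get a feeling for how the side of the squares changes the result, we can count the occupied bins and the highest count for a few candidate sides. This is a rough sketch in plain JavaScript: the countPerBin helper is a hypothetical name, and the sample data (100 points on the diagonal of a 0-100 plane) are made up purely for illustration; they are not the scatterplot01.csv dataset.

```javascript
// Hypothetical helper: counts how many [x, y] points fall in each square bin.
function countPerBin(points, side) {
  const counts = new Map();
  for (const [x, y] of points) {
    const key = Math.floor(x / side) + "," + Math.floor(y / side);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up data for illustration only: the points (0,0), (1,1), ..., (99,99).
const sample = [];
for (let k = 0; k < 100; k++) sample.push([k, k]);

// Compare three candidate sides on the same data.
const summary = [5, 10, 15].map((side) => {
  const counts = countPerBin(sample, side);
  return {
    side: side,
    occupiedBins: counts.size,
    maxCount: Math.max(...counts.values()),
  };
});

console.log(summary);
```

Smaller squares give more bins with lower counts, larger squares give fewer bins with higher counts, so the side and the color range usually need to be tuned together.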

In Fig.6 we can see how the chart varies with the size of the squares.

Fig.6: three rectangular binnings applied with square bins of side 5, 10 and 15

Another parameter to adjust is the color gradation applied according to the number of dataset points contained in each bin. For this example I used yellow for the lowest values and dark red for the highest values. When defining the color scale, you can adjust the gradient by setting a range within the domain() function. In this example I set yellow for count = 1 (don’t forget that bins containing no samples are not represented) and dark red for count = 3. If the count is greater than 3, the color tends towards an even darker shade (black).
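The mapping from count to color can be sketched without any library as a linear interpolation between two RGB colors. This is an illustrative stand-in, not the article's page code and not the D3 scale implementation: makeColorScale is a hypothetical name, and the RGB triples are the standard CSS values for yellow and darkred. Values outside the domain are extrapolated and each channel is clamped to 0-255, which is why counts above 3 drift towards black.

```javascript
// Hypothetical helper: returns a function mapping a bin count to an RGB string
// by linear interpolation between two colors given as [r, g, b] arrays.
function makeColorScale(domain, range) {
  const d0 = domain[0];
  const d1 = domain[1];
  const c0 = range[0];
  const c1 = range[1];
  return function (count) {
    const t = (count - d0) / (d1 - d0); // 0 at d0, 1 at d1, > 1 beyond d1
    const channel = function (k) {
      const v = Math.round(c0[k] + (c1[k] - c0[k]) * t);
      return Math.min(255, Math.max(0, v)); // keep each channel in 0..255
    };
    return "rgb(" + channel(0) + "," + channel(1) + "," + channel(2) + ")";
  };
}

const YELLOW = [255, 255, 0];  // CSS "yellow"
const DARK_RED = [139, 0, 0];  // CSS "darkred"

// Yellow at count = 1, dark red at count = 3, as in the text above.
const color = makeColorScale([1, 3], [YELLOW, DARK_RED]);

console.log(color(1)); // lowest count: yellow
console.log(color(3)); // dark red
console.log(color(5)); // beyond the domain: darker than dark red
```

Changing the two numbers passed as the domain shifts where yellow and dark red fall, which is the adjustment compared in the next figure.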

In Fig.7 we can see how the appearance of the chart changes as the color range is adjusted.

Fig.7: three different color gradients: [1-10], [0-6], [0-2]

Furthermore, it is possible to add black borders to each bin by modifying the CSS styles.

Fig.8: by modifying the CSS styles you can obtain several different graphic effects
