Visualizing Web Analytics in R Part 5: Interactive Heatmap

This article is the fifth in a series about visualizing Google Analytics and other web analytics data using R. It focuses on using heatmaps to determine which articles are important in which geographical areas. The series aims to show how R and interactive visualizations can help answer the following business questions:

  • Which articles need work to improve search engine ranking
  • Which articles are well ranked but do not get clicked, and need work on titles or metadata
  • Where to focus efforts for new content
  • How to use passive web search data to focus new product development

The other articles in the series are:

Heatmap of Region and Page

For large amounts of data with only three dimensions, heatmaps often convey the general trends faster than scatter plots or geographic maps. They are not as good at identifying outliers as the scatter plots shown in articles 1 through 3, nor as good at displaying geographic relationships as the maps shown in article 4. Heatmaps are best used to get an overall sense of a dataset. Figures 1, 2 and 3 show heatmaps of ISO region vs. URL with scaling by column (ISO Region), by row (URL) and with no scaling, so that colors are assigned across all cells. As these three interactive graphs show, it is important to know how the scaling was done when reading a heatmap.
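As a minimal sketch of how the three scalings might be produced, the following assumes the analytics data has already been reduced to one row per page and region. The data frame `ga`, its column names and all values are invented for illustration; only the `scale` argument of `d3heatmap()` corresponds to the options discussed here. The plotting calls are guarded so the data preparation still runs where the package is not installed.

```r
# Hypothetical long-format export: one row per (page, region) pair.
# Names and counts are made up for illustration only.
ga <- data.frame(
  page     = c("page-a", "page-a", "page-a", "page-b", "page-b", "page-c", "page-c"),
  region   = c("US", "GB", "LT", "US", "GB", "US", "LT"),
  sessions = c(900, 300, 5, 120, 40, 30, 20),
  stringsAsFactors = FALSE
)

# Pivot to a numeric matrix: pages as rows, regions as columns.
traffic <- tapply(ga$sessions, list(ga$page, ga$region), sum)

# Page/region pairs with no recorded traffic come back as NA; treat them as 0.
traffic[is.na(traffic)] <- 0

# Draw the same matrix with each scaling option, if d3heatmap is available.
if (requireNamespace("d3heatmap", quietly = TRUE)) {
  by_region <- d3heatmap::d3heatmap(traffic, scale = "column")  # as in Figure 1
  by_page   <- d3heatmap::d3heatmap(traffic, scale = "row")     # as in Figure 2
  unscaled  <- d3heatmap::d3heatmap(traffic, scale = "none")    # as in Figure 3
}
```

Printing any of the three widget objects in RStudio displays the corresponding interactive heatmap.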

Figure 1 is scaled by column and shows that the social-buttons.com article is the most important article in all geographies, followed by effective-yield-loan-amortization in Saudi Arabia, Nigeria, the Philippines and Egypt, and by sales-and-lead-management-with-suite-crm in Lithuania, Slovakia and other smaller countries.

Figure 2 is scaled by row and is visually very different; it shows that the overwhelming sources of traffic for all pages are the US, the UK and other English-speaking countries. This isn’t a surprise since the site is English-only.

Figure 3 is scaled across all cells, and shows that from the perspective of all traffic, only three articles and three countries are important: social-buttons.com, effective-yield-loan-amortization and stopping-rachel-from-cardholder-services are the main articles and are largely accessed in the US, the UK and by users whose country is not set.

Figure 1. Interactive heatmap of URL vs. ISO Region showing column (ISO Region) oriented color scaling. This scaling clearly shows which pages are important in each region but not the relative importance of a region.
Figure 2. Interactive heatmap of URL vs. ISO Region showing row (URL) oriented color scaling. This scaling clearly shows which regions are important to the site, but does not give information on which URLs are important.
Figure 3. Interactive heatmap of URL vs. ISO Region showing all-cell oriented color scaling. This scaling shows which pages and regions are important, but hides much detail found in Figures 1 and 2.

Heatmaps are easier to read than scatter plots, but they don’t necessarily yield as much information and can be misleading if the reader does not know how the data was scaled for the plot.
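To see why the scaling matters, a small base R sketch can mimic it on an invented matrix. It assumes the scaling is a z-score within each column or row, as base R's `heatmap()` does; the page names, region codes and counts are hypothetical and are not taken from the site's data.

```r
# Invented traffic counts: pages as rows, regions as columns.
traffic <- matrix(
  c(900, 300,  5,
    120,  40,  0,
     30,   0, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("page-a", "page-b", "page-c"), c("US", "GB", "LT"))
)

# Column scaling: z-score within each region (scale() works column-wise).
by_column <- scale(traffic)

# Row scaling: z-score within each page (transpose, scale, transpose back).
by_row <- t(scale(t(traffic)))

# Column scaling highlights the top page inside each region, even a small one.
top_page_in_lt <- names(which.max(by_column[, "LT"]))

# Row scaling highlights the top region for each page.
top_region_per_page <- colnames(traffic)[apply(by_row, 1, which.max)]

# Without scaling, only the single largest cell dominates the color range.
largest_cell <- which(traffic == max(traffic), arr.ind = TRUE)
```

In this toy matrix, column scaling makes page-c the standout in LT despite its small raw count, row scaling points every page at the US, and the unscaled view is dominated by the page-a/US cell. The same data therefore tells three different stories depending on the scaling.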

Notes

This article was written in RStudio and uses the d3heatmap package for all graphics.