Visualizing Web Analytics in R Part 6: Interactive Networks

This article is the sixth in a series about visualizing Google Analytics and other web analytics data using R. It focuses on using interactive network visualizations to show where search-related website traffic originates. The series aims to show how R and interactive visualizations can help answer the following business questions:

  • Which articles need work to improve search engine ranking
  • Which articles are well ranked but do not get clicked, and need work on their titles or metadata
  • Where to focus efforts for new content
  • How to use passive web search data to focus new product development

The other articles in the series are:

Network Diagram Using networkD3 and htmlwidgets

Network diagrams are useful for understanding complex datasets. The networkD3 and htmlwidgets packages provide a way to generate interactive network diagrams that can be manipulated in a web browser. Figure 1 shows the Google Analytics session data by page and country displayed with the simpleNetwork() function, while Figure 2 shows the same data displayed with forceNetwork(). In Figure 2, the large nodes represent countries, and the smaller nodes in a ring around each country represent the pages that received sessions from that country. Each node's size is log(sessions + 1): because session counts range from 1 to about 20,000, the logarithm compresses that range so that both small and large nodes remain visible and distinguishable.
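
The node sizing described above can be sketched in base R. This is a minimal sketch on a small hypothetical table (sessDf, with columns page, country, and Sessions), not the article's Google Analytics data. It builds one hub node per country plus one node per page-and-country row, sizes each node with log(sessions + 1), and checks that inverting the size recovers the session count once rounded.

```r
# Hypothetical sessions-by-page-and-country table (illustrative data only)
sessDf <- data.frame(
  page     = c("/post-a", "/post-b", "/post-a", "/post-c"),
  country  = c("United States", "United States", "India", "India"),
  Sessions = c(20000, 35, 410, 1),
  stringsAsFactors = FALSE
)

# One hub node per country, carrying the country's total sessions
countryDf <- aggregate(Sessions ~ country, data = sessDf, FUN = sum)
hubDf <- data.frame(name = countryDf$country,
                    group = "country",          # hypothetical group label
                    Sessions = countryDf$Sessions,
                    stringsAsFactors = FALSE)

# One node per (page, country) row, so a page seen from two countries appears twice
pageDf <- data.frame(name = sessDf$page,
                     group = "page",            # hypothetical group label
                     Sessions = sessDf$Sessions,
                     stringsAsFactors = FALSE)

nodeDf <- rbind(hubDf, pageDf)

# log(sessions + 1) maps roughly 1..20,000 sessions onto roughly 0.69..9.9
nodeDf$size <- log(nodeDf$Sessions + 1)

# Inverting the size (as a click handler would) recovers the count after rounding;
# rounding avoids floating-point results that land just below the integer
recovered <- round(exp(nodeDf$size) - 1)
stopifnot(all(recovered == nodeDf$Sessions))
print(nodeDf)
```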

For this particular data set, neither of these visualizations is as useful as the heatmaps discussed previously, but they demonstrate what network visualizations can do. For the Behavior Flow data in Google Analytics, these networks would likely be the best visualization; unfortunately, I have not yet extracted that data from Google Analytics.

When preparing the forceNetwork visualization, remember that the node indices in the links data frame are zero-based, as in C and JavaScript, rather than one-based as in R. As you develop a forceNetwork visualization, start with an empty links data frame and then add links to it. The JavaScript used for the on-click alert is shown in Figure 3, and the full forceNetwork() call is shown in Figure 4.
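
The indexing rule and the incremental approach above can be sketched as follows. This is a hedged base-R example on a hypothetical five-row node table; its country and isHub columns are illustrative assumptions, not the columns of the article's nodeDf. Each page node gets a link to its country hub, R's one-based row numbers are converted to the zero-based indices forceNetwork() expects, and a final check confirms every index lies between 0 and nrow(nodeDf) - 1.

```r
# Hypothetical node table: two country hubs followed by three page nodes
nodeDf <- data.frame(
  name    = c("India", "United States", "/post-a", "/post-b", "/post-a"),
  country = c("India", "United States", "United States", "United States", "India"),
  isHub   = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Start with an empty links data frame holding integer columns
arcDf <- data.frame(source = integer(0), target = integer(0), value = integer(0))

# Add one link per page node, pointing at the hub node for its country
for (i in which(!nodeDf$isHub)) {
  hub <- which(nodeDf$isHub & nodeDf$name == nodeDf$country[i])
  # R rows are 1-based; forceNetwork() (JavaScript) indices are 0-based
  arcDf <- rbind(arcDf, data.frame(source = i - 1L, target = hub - 1L, value = 1L))
}

# Every index must refer to an existing node: 0 .. nrow(nodeDf) - 1
stopifnot(all(arcDf$source >= 0), all(arcDf$source < nrow(nodeDf)),
          all(arcDf$target >= 0), all(arcDf$target < nrow(nodeDf)))
print(arcDf)
```

Growing a data frame with rbind() inside a loop is fine at this scale; for thousands of links, collecting the source and target vectors first and calling data.frame() once is considerably faster.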

The htmlwidgets package provides the saveWidget() function used to save the visualizations:

saveWidget(chartWid, saveName, selfcontained = TRUE)

The selfcontained = TRUE argument puts everything into a single HTML file, which is easier to manage on some websites. The files are minified but not compressed.
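
A minimal end-to-end sketch of the save step, assuming the networkD3 and htmlwidgets packages are installed: it draws a simpleNetwork() from a small hypothetical page-to-country edge list and writes it to one self-contained HTML file in a temporary directory. The edge list and the output file name are illustrative only. Older htmlwidgets releases may need pandoc to be available for selfcontained = TRUE.

```r
library(networkD3)   # simpleNetwork()
library(htmlwidgets) # saveWidget()

# Hypothetical two-column edge list: page -> country
edgeDf <- data.frame(
  src = c("/post-a", "/post-b", "/post-a"),
  tgt = c("United States", "United States", "India"),
  stringsAsFactors = FALSE
)

# simpleNetwork() reads source and target from columns 1 and 2 by default
chartWid <- simpleNetwork(edgeDf, zoom = TRUE)

# Inline all JavaScript and CSS dependencies into a single HTML file
saveName <- file.path(tempdir(), "simpleNetwork.html")
saveWidget(chartWid, saveName, selfcontained = TRUE)
```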

Figure 1. An interactive network visualization generated by the simpleNetwork() function in the networkD3 package.
Figure 2. An interactive network visualization generated by the forceNetwork() function in the networkD3 package. Clicking on a node displays the session count for that node.
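Figure 3. JavaScript used to generate the interactive click text for the forceNetwork visualization.
  # clickAction JavaScript: d is the clicked node and d.nodesize holds log(sessions + 1),
  # so exp(d.nodesize) - 1 recovers the session count. Math.round (rather than
  # Math.floor) guards against floating-point results landing just below the integer.
  MyClickScript <- 'alert("You clicked node " + d.name + " which is in row " + (d.index + 1) + " of the nodeDf R data frame. It has " + Math.round(Math.exp(d.nodesize) - 1) + " sessions.");'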
Figure 4. R call for the forceNetwork visualization.
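  library(networkD3)   # forceNetwork()
  library(htmlwidgets) # JS(), saveWidget()
  # Column classes of the node and link data frames passed to forceNetwork()
  sapply(nodeDf, class)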
##        name       group        size    Sessions    destNode      nodeID 
##    "factor"    "factor"   "numeric"   "numeric" "character"   "numeric" 
##  destNodeID 
##   "numeric"
  sapply(arcFinDf,class)
##    source    target     value 
## "integer" "integer" "integer"
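  chartWid <- forceNetwork(Links = arcFinDf,      # zero-based source/target indices into nodeDf
              Nodes = nodeDf,
              Source = "source",
              Target = "target",
              Group = "group",
              NodeID = "name",
              Value = "value",
              Nodesize = "size",                  # log(sessions + 1)
              width = 800,
              height = 600,
              charge = -10,                       # negative values make nodes repel each other
              linkDistance = JS("function(d){return d.value * 10}"),
              radiusCalculation = JS("d.nodesize+6"),
              zoom = TRUE,
              legend = TRUE,
              opacity = 0.8,
              clickAction = MyClickScript,        # JavaScript from Figure 3
              bounded = FALSE                     # nodes may move outside the width/height box
              )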

Notes

This article was written in RStudio and uses the networkD3 package for rendering and the htmlwidgets package for saving HTML files.