Visualizing Web Analytics in R Part 6: Interactive Networks
This article is the sixth in a series about visualizing Google Analytics and other web analytics data using R. It focuses on using interactive network visualizations to show where search-related website traffic originates. The series aims to show how R and interactive visualizations can help answer the following business questions:
- Which articles need work to improve search engine ranking
- Which articles are well ranked but do not get clicked, and need work on titles or metadata
- Where to focus efforts for new content
- How to use passive web search data to focus new product development
The other articles in the series are:
- Visualizing Web Analytics Data in R Part 1: the Problem
- Visualizing Web Analytics Data in R Part 2: Interactive Outliers
- Visualizing Web Analytics Data in R Part 3: Interactive 5D (3D)
- Visualizing Web Analytics Data in R Part 4: Interactive Globe
- Visualizing Web Analytics Data in R Part 5: Interactive Heatmap
- Visualizing Web Analytics Data in R Part 7: Interactive Complex
Network Diagram Using networkD3 and htmlwidgets
Network diagrams are useful for understanding complex datasets. The networkD3 and htmlwidgets packages provide a way to generate interactive network diagrams that can be manipulated in a web browser. Figure 1 shows the Google Analytics session data by page and country displayed using the simpleNetwork call, while Figure 2 shows the same data displayed using the forceNetwork call. In Figure 2, the large nodes represent countries, while the smaller nodes in a ring around each country represent the pages that are referenced from that country. The size of each node is log(sessions + 1), which keeps the nodes distinguishable when the number of sessions varies from 1 to about 20,000.
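As a rough illustration of the node sizing described above, the sketch below applies log(sessions + 1) to a handful of made-up session counts spanning the 1 to 20,000 range. The sessions vector is hypothetical and is not taken from the article's data; log1p() and expm1() are base R functions equivalent to log(x + 1) and exp(x) - 1.

```r
# Hypothetical session counts covering the range mentioned in the text
sessions <- c(1, 10, 100, 1000, 20000)

# log(sessions + 1): compresses a 20,000-fold range into a small span
nodeSize <- log1p(sessions)
print(round(nodeSize, 2))

# The transform is invertible, so the original counts can be recovered
# (for example, when a tooltip should report sessions rather than size)
recovered <- round(expm1(nodeSize))
print(recovered)
```

The sizes run from about 0.69 to about 9.9, so the busiest node is roughly fourteen times the size of the quietest one rather than 20,000 times. Rounding the inverse, rather than truncating it, guards against small floating-point errors in the exp/log round trip.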
For this particular data set, neither of these visualizations is as useful as the heatmaps discussed previously, but they demonstrate what can be done with network visualizations. For the behavioral flow in Google Analytics, these networks would be the best possible visualization; unfortunately, I have not gotten that data out of Google Analytics yet.
In preparing the forceNetwork visualization, it is important to remember that the node indices in the link data frame begin at 0, as in C, rather than at 1, as in R. As you develop a forceNetwork visualization, start with an empty link data frame and then add links. The JavaScript code used for the on-click box is shown in Figure 3, while the full forceNetwork call is shown in Figure 4.
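One way to handle the zero-based indexing is to build the node table first and derive the link indices from it with match(), subtracting 1. The sketch below is a minimal example under stated assumptions: sessionDf and its page, country, and sessions columns are hypothetical, and creating one node per page/country pair is a guess at how the ring layout was produced. Only the nodeDf and arcFinDf column names follow the listing in Figure 4.

```r
# Hypothetical input: one row per page/country pair with a session count
sessionDf <- data.frame(
  page     = c("/home", "/about", "/home", "/blog"),
  country  = c("US", "US", "DE", "DE"),
  sessions = c(1200, 15, 300, 42),
  stringsAsFactors = FALSE
)

# Sessions per country, used to size the country nodes
countryTotals <- tapply(sessionDf$sessions, sessionDf$country, sum)

# Assumption: one node per page/country pair plus one node per country
pageNames <- paste(sessionDf$country, sessionDf$page)
nodeDf <- data.frame(
  name  = c(names(countryTotals), pageNames),
  group = c(rep("country", length(countryTotals)),
            rep("page", nrow(sessionDf))),
  size  = log1p(c(as.numeric(countryTotals), sessionDf$sessions)),
  stringsAsFactors = FALSE
)

# match() returns 1-based R row numbers; subtract 1 to get the
# 0-based indices that forceNetwork expects
arcFinDf <- data.frame(
  source = match(pageNames, nodeDf$name) - 1L,
  target = match(sessionDf$country, nodeDf$name) - 1L,
  value  = 1L
)

# Sanity checks: every index must point at an existing node row
stopifnot(!anyNA(arcFinDf$source), !anyNA(arcFinDf$target))
stopifnot(min(arcFinDf$source, arcFinDf$target) >= 0)
stopifnot(max(arcFinDf$source, arcFinDf$target) <= nrow(nodeDf) - 1)

# Render only if networkD3 is installed
if (requireNamespace("networkD3", quietly = TRUE)) {
  chart <- networkD3::forceNetwork(
    Links = arcFinDf, Nodes = nodeDf,
    Source = "source", Target = "target", Value = "value",
    NodeID = "name", Group = "group", Nodesize = "size"
  )
}
```

Deriving the indices from the node table, instead of typing them by hand, keeps the links valid if nodes are later added or reordered. The on-click script goes the other way, adding 1 to d.index to report the matching R row.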
The htmlwidgets package provides the function used to save the visualizations, through the saveWidget(chartWid, saveName, selfcontained = TRUE) call. The selfcontained = TRUE parameter puts everything into a single HTML file, which is easier to manage on some web sites. The files are minified but not compressed.
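The save step might look like the sketch below. The edge list and the output path are hypothetical, and the pandoc check is an addition made here because saveWidget needs pandoc when selfcontained = TRUE; it is not part of the article's code.

```r
# Hypothetical output location inside the session's temporary directory
saveName <- file.path(tempdir(), "network.html")
saved <- FALSE

if (requireNamespace("networkD3", quietly = TRUE) &&
    requireNamespace("htmlwidgets", quietly = TRUE)) {
  # Tiny made-up edge list, just enough to produce a widget
  edges <- data.frame(
    src = c("US", "US", "DE"),
    dst = c("/home", "/about", "/home"),
    stringsAsFactors = FALSE
  )
  chartWid <- networkD3::simpleNetwork(edges, Source = "src", Target = "dst")

  # selfcontained = TRUE relies on pandoc; fall back to FALSE without it
  hasPandoc <- requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()
  htmlwidgets::saveWidget(chartWid, saveName, selfcontained = hasPandoc)
  saved <- file.exists(saveName)
}
```

With selfcontained = FALSE, the widget's JavaScript and CSS dependencies are written to a companion directory next to the HTML file, so both have to be uploaded together.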
Figure 3: JavaScript code for the on-click box in the forceNetwork visualization.
MyClickScript <- 'alert("You clicked node " + d.name + " which is in row " + (d.index + 1) + " of the nodeDf R data frame. It has " + (Math.floor(Math.exp(d.nodesize) - 1)) + " sessions.");'
Figure 4: R code for the forceNetwork visualization.
library(networkD3)

# Column classes of the node and link data frames passed to forceNetwork
sapply(nodeDf, class)
##        name       group        size    Sessions    destNode      nodeID
##    "factor"    "factor"   "numeric"   "numeric" "character"   "numeric"
##  destNodeID
##   "numeric"
sapply(arcFinDf, class)
##    source    target     value
## "integer" "integer" "integer"

# Build the interactive force-directed network widget
chartWid <- forceNetwork(
  Links = arcFinDf, Nodes = nodeDf,
  Source = "source", Target = "target",
  Group = "group", NodeID = "name",
  Value = "value", Nodesize = "size",
  width = 800, height = 600,
  charge = -(1e1),
  linkDistance = JS("function(d){return d.value * 10}"),
  radiusCalculation = JS("d.nodesize+6"),
  zoom = TRUE, legend = TRUE, opacity = 0.8,
  clickAction = MyClickScript,
  bounded = FALSE
)
Notes
This article was written in RStudio and uses the networkD3 package for rendering and the htmlwidgets package for saving HTML files.