Visualizing Web Analytics Data in R Part 3: Interactive 5D (3D)
This article is the third in a series about visualizing Google Analytics and other web analytics data using R. It focuses on determining which articles have click-through rates higher than would be expected for their average search position, and which articles need work. The series aims to show how R and interactive visualizations can help answer the following business questions:
- Which articles need work to improve search engine ranking
- Which articles are well ranked but do not get clicked, and need work on titles or meta data
- Where to focus efforts for new content
- How to use passive web search data to focus new product development
The other articles in the series are:
- Visualizing Web Analytics Data in R Part 1: the Problem
- Visualizing Web Analytics Data in R Part 2: Interactive Outliers
- Visualizing Web Analytics Data in R Part 4: Interactive Globe
- Visualizing Web Analytics Data in R Part 5: Interactive Heatmap
- Visualizing Web Analytics Data in R Part 6: Interactive Networks
- Visualizing Web Analytics Data in R Part 7: Interactive Complex
Interactive 3D Scatterplot showing Five Dimensional Data
Showing more than three dimensions of data is a problem in any visualization. Parallel axes work very well when you are presenting to scientists and engineers who are familiar with that type of plot, but they do not go over well with typical business professionals. For users who have not been trained to read parallel axes and other advanced charts, a 3D scatter plot is a good way to represent up to five dimensions: three on the x, y and z axes, a fourth with dot size and a fifth with color. If you can also vary the dot type, you can represent a sixth dimension.
The R threejs package makes it easy to generate a 3D scatterplot with five variables, though it has some limitations that will hopefully be removed as the package is enhanced.
Figure 1 shows a 3D scatter plot of Google Analytics click-through rate (CTR), search position and article size, generated with the scatterplot3js function in the threejs package. The function supports two rendering options, “canvas” and “webgl”. The “webgl” option is faster in most browsers because it uses OpenGL, but it does not allow changing the size of the dots. It may also prevent users on some mobile devices from manipulating the image, since some devices do not support OpenGL. The “canvas” option is slower, but it allows varying dot sizes and works better on mobile devices.
Figure 2 shows the same plot rendered with “canvas”, so that dot size represents the number of times the page was presented in a search (impressions) and color represents the page’s classification: general (blue), web (red), banking (green) or consumer (yellow).
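A minimal sketch of how a plot like Figure 2 might be built. threejs exports the 3D scatter function as `scatterplot3js()`, which accepts separate `x`, `y` and `z` vectors plus per-point `color` and `size`. The data frame below is an invented stand-in: `CTR`, `Position`, `pageSize` and `queryArea` follow the names in the regression output, while `impressions` is a hypothetical column name. The `scale_sizes()` helper and its size range are likewise assumptions for illustration. The accepted `renderer` values have changed between threejs releases, so check `?scatterplot3js` for your installed version.

```r
# Toy stand-in for the article's scatterDf. CTR, Position, pageSize and
# queryArea match the names in the lm() output; `impressions` is a
# hypothetical column name, and every value here is invented.
scatterDf <- data.frame(
  CTR         = c(0.012, 0.031, 0.045, 0.020, 0.066),
  Position    = c(42, 18, 9, 27, 6),
  pageSize    = c(9000, 21000, 34000, 15000, 60000),
  impressions = c(150, 900, 2400, 400, 5200),
  queryArea   = c("Web", "Banking", "General", "Consumer", "General"),
  stringsAsFactors = FALSE
)

# Fourth dimension: rescale impressions linearly into [min_size, max_size].
# (Hypothetical helper; the size range is an arbitrary choice.)
scale_sizes <- function(v, min_size = 0.5, max_size = 3) {
  rng <- range(v)
  if (diff(rng) == 0) return(rep((min_size + max_size) / 2, length(v)))
  min_size + (v - rng[1]) / diff(rng) * (max_size - min_size)
}

# Fifth dimension: one color per query area, following Figure 2's legend.
area_colors <- c(General  = "#0000FF",  # blue
                 Web      = "#FF0000",  # red
                 Banking  = "#00FF00",  # green
                 Consumer = "#FFFF00")  # yellow

point_sizes  <- scale_sizes(scatterDf$impressions)
point_colors <- unname(area_colors[as.character(scatterDf$queryArea)])

if (requireNamespace("threejs", quietly = TRUE)) {
  # x = CTR, y = Position, z = pageSize. The canvas renderer is requested
  # because, per the text above, WebGL rendering does not vary dot size.
  fig2 <- threejs::scatterplot3js(
    x = scatterDf$CTR, y = scatterDf$Position, z = scatterDf$pageSize,
    color = point_colors, size = point_sizes, renderer = "canvas"
  )
  # Print fig2 in RStudio or return it from an R Markdown chunk to view it.
}
```

Computing sizes and colors outside the plotting call keeps the encoding testable on its own and makes it easy to swap the palette or size range without touching the widget code.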
Rotating the plot so that the Y axis (position) is closest to the foreground, Figure 2 shows that CTR (X axis) increases strongly with article size (Z axis) for general, banking and consumer articles, but less strongly for web articles. With the X axis (CTR) in the foreground, it is clear that search position improves with article size (a lower number, toward the left, is better).
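The rotated views are a visual judgment. One way to put a number on "CTR rises with article size within each query area" is a per-group correlation. The sketch below uses base R `split()` and `cor()` on an invented data frame with the article's column names. Using Spearman rank correlation is an assumption made here (it is less sensitive to a few very large articles), not a method from the article, and three points per group are far too few for anything but illustration.

```r
# Invented data using the article's column names; not the real dataset.
scatterDf <- data.frame(
  CTR       = c(0.010, 0.020, 0.030, 0.015, 0.014, 0.016, 0.020, 0.035, 0.050),
  pageSize  = c(5000, 12000, 30000, 6000, 15000, 40000, 8000, 16000, 32000),
  queryArea = rep(c("Banking", "Web", "General"), each = 3),
  stringsAsFactors = FALSE
)

# Spearman (rank) correlation of CTR with pageSize inside each query area.
# split() orders the groups alphabetically: Banking, General, Web.
cor_by_area <- sapply(split(scatterDf, scatterDf$queryArea), function(d) {
  cor(d$CTR, d$pageSize, method = "spearman")
})
print(round(cor_by_area, 2))  # Banking 1, General 1, Web 0.5 for this toy data
```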
The most prominent article, stopping-rachel-from-cardholder-services, is clearly in the top half of article size (Z axis). The single largest article, statistical-example-of-fair-lending-disparate-treatment-problem, has one of the highest click-through rates and is generally well positioned in searches. The small articles that do well on both CTR and position tend to be very specific pages that people are deliberately searching for: bruce-moore, the author’s contact page, and various articles with links to presentation slides.
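The articles called out above were read off the interactive plot. A sketch of pulling similar candidates directly from the data might look like the following. It assumes a hypothetical `page` column holding the URL slug (the real data frame may name it differently), uses invented values, and treats the quartile cut-offs as an arbitrary illustrative choice.

```r
# Toy data; `page` is a hypothetical column name and the numbers are invented.
scatterDf <- data.frame(
  page     = c("contact-page", "long-analysis", "mid-article", "weak-article"),
  CTR      = c(0.090, 0.080, 0.020, 0.005),
  Position = c(2, 4, 15, 40),
  pageSize = c(3000, 90000, 25000, 6000),
  stringsAsFactors = FALSE
)

# The single largest article by page size.
largest <- scatterDf$page[which.max(scatterDf$pageSize)]

# Small articles (bottom quartile of size) that still do well on both
# CTR (top quartile) and position (bottom quartile; a lower number is better).
q <- function(v, p) unname(quantile(v, p))
small_winners <- subset(
  scatterDf,
  pageSize <= q(pageSize, 0.25) & CTR >= q(CTR, 0.75) & Position <= q(Position, 0.25)
)$page

print(largest)        # "long-analysis"
print(small_winners)  # "contact-page"
```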
Linear Regression of Article Size
The linear regression shown in Figure 3 below indicates that a larger page size and a lower (better) position number both improve CTR. However, only the page-size coefficient is significant at the 0.05 level, and the adjusted R-squared of only 0.088 indicates that this is not a good regression model and that some other variable is probably much more important to CTR. Including the query area (general, banking, web and consumer) does not shed much additional light on the situation, as shown in Figure 4: the adjusted R-squared rises only to about 0.093.
##
## Call:
## lm(formula = CTR ~ Position + pageSize, data = scatterDf)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.031509 -0.016552 -0.004023  0.012603  0.101812
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.400e-02  9.695e-03   1.444   0.1529
## Position    -2.296e-04  1.407e-04  -1.632   0.1069
## pageSize     4.225e-07  1.743e-07   2.423   0.0178 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0252 on 75 degrees of freedom
## Multiple R-squared:  0.1117, Adjusted R-squared:  0.08799
## F-statistic: 4.714 on 2 and 75 DF,  p-value: 0.01179
##
## Call:
## lm(formula = CTR ~ Position + pageSize + queryArea, data = scatterDf)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.031381 -0.018114 -0.003975  0.013424  0.105836
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)        2.690e-02  1.272e-02   2.115   0.0379 *
## Position          -1.837e-04  1.435e-04  -1.280   0.2045
## pageSize           3.469e-07  1.852e-07   1.873   0.0652 .
## queryAreaConsumer -1.510e-02  1.077e-02  -1.403   0.1650
## queryAreaGeneral  -1.436e-02  8.816e-03  -1.629   0.1078
## queryAreaWeb      -5.934e-03  1.014e-02  -0.585   0.5604
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02513 on 72 degrees of freedom
## Multiple R-squared:  0.1516, Adjusted R-squared:  0.09272
## F-statistic: 2.574 on 5 and 72 DF,  p-value: 0.03369
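The summary statistics in the two outputs above can be cross-checked by hand. The sketch below uses only the printed R-squared values and degrees of freedom. It recomputes the adjusted R-squared and overall F statistic, then runs a partial F-test asking whether the three queryArea dummies add explanatory power. That test is an addition here, not something the article reports. Because the inputs are rounded to four decimals, the results match the printed values only approximately.

```r
# Values copied from the two summary(lm(...)) outputs above.
n       <- 75 + 3   # residual df + 3 estimated coefficients = 78 articles
r2_red  <- 0.1117   # CTR ~ Position + pageSize              (p = 2 predictors)
r2_full <- 0.1516   # CTR ~ Position + pageSize + queryArea  (p = 5: adds 3 dummies)

# Adjusted R-squared and overall F statistic from R-squared, n and p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

adj_red  <- adj_r2(r2_red, n, 2)   # about 0.0880 (printed: 0.08799)
adj_full <- adj_r2(r2_full, n, 5)  # about 0.0927 (printed: 0.09272)
f_red    <- f_stat(r2_red, n, 2)   # about 4.72   (printed: 4.714)

# Partial F-test for the 3 queryArea dummies (nested models, same data).
# With the raw data, anova(model_without, model_with) reports this test.
q_added <- 3L
df_full <- n - 5 - 1               # 72, matching the second output
f_part  <- ((r2_full - r2_red) / q_added) / ((1 - r2_full) / df_full)
p_part  <- pf(f_part, q_added, df_full, lower.tail = FALSE)

cat(sprintf("partial F = %.2f on %d and %d DF, p = %.2f\n",
            f_part, q_added, df_full, p_part))
```

The partial F of roughly 1.1, with a p-value well above 0.05, is consistent with the observation that adding the query area does not shed much additional light.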
With no additional data elements in the Google Search Console data set, there isn’t much more insight to be gained from it without text mining both the queries and the article content. Since this series is primarily about visualization, the next article in the series will look at using the globejs function to visualize geographic data on an interactive globe.
Notes
This article was written in RStudio and uses the threejs package for all graphics except the linear regression residual plot. The formula display uses MathJax.