Visualizing Web Analytics Data in R Part 2: Interactive Outliers
This article is the second in a series about visualizing Google Analytics and other web analytics data using R. This article focuses on determining which articles have click-through rates higher than would be expected for their average search position and which articles need work. The series aims to show how R and interactive visualizations can help answer the following business questions:
- Which articles need work to improve search engine ranking
- Which articles are well ranked but do not get clicked, and need work on titles or meta data
- Where to focus efforts for new content
- How to use passive web search data to focus new product development
The other articles in the series are:
- Visualizing Web Analytics Data in R Part 1: the Problem
- Visualizing Web Analytics Data in R Part 3: Interactive 5D (3D)
- Visualizing Web Analytics Data in R Part 4: Interactive Globe
- Visualizing Web Analytics Data in R Part 5: Interactive Heatmap
- Visualizing Web Analytics Data in R Part 6: Interactive Networks
- Visualizing Web Analytics Data in R Part 7: Interactive Complex
Interactive Scatterplot Labels Help Domain Experts Understand Data
In the previous article, simple plots of the daily Google Search Console data showed a great deal about how search position influences the click-through rate (CTR). However, they did not reveal which days performed better than expected, or why. The googleVis R package can attach fly-over labels (tooltips) to data points. This makes it much easier to figure out what is going on with better-than-expected and worse-than-expected events.
Figure 1 shows a googleVis plot of CTR vs. position with point labels that include the date and number of impressions. This makes it possible to identify which dates deserve further investigation. Clearly, June 28, 29, and 30 are worth a closer look. The goal is to find out what content users were searching for (which drove high numbers of impressions) and which content they found especially relevant (high CTR).
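The fly-over labels in Figure 1 can be approximated with a short sketch like the following. It makes several assumptions:

- It uses a hypothetical daily data frame, `daily`, whose `position`, `ctr`, and `impressions` values are made up.
- It relies on the googleVis role convention, in which a column whose name ends in `.tooltip` supplies the fly-over text for the series before it.
- The column names and chart options are illustrative, not the ones used to produce Figure 1.

```r
# Hypothetical daily Search Console summary; all values are made up for illustration.
daily <- data.frame(
  date        = as.Date(c("2016-03-01", "2016-03-02", "2016-03-03", "2016-03-04")),
  position    = c(21.4, 12.3, 11.8, 19.9),
  ctr         = c(0.031, 0.072, 0.068, 0.035),
  impressions = c(410L, 2350L, 2610L, 450L)
)

# Fly-over text: date plus number of impressions (and CTR), one string per point.
daily$label <- sprintf("%s: %d impressions, CTR %.3f",
                       format(daily$date, "%Y-%m-%d"), daily$impressions, daily$ctr)

# Assumed googleVis layout: first column is the x axis, the next column is a y series,
# and a column named "<series>.tooltip" is treated as that series' tooltip role.
chart_data <- data.frame(
  Position    = daily$position,
  CTR         = daily$ctr,
  CTR.tooltip = daily$label,
  stringsAsFactors = FALSE
)

if (requireNamespace("googleVis", quietly = TRUE)) {
  chart <- googleVis::gvisScatterChart(
    chart_data,
    options = list(legend = "none",
                   hAxis  = "{title: 'Average search position'}",
                   vAxis  = "{title: 'CTR'}")
  )
  if (interactive()) plot(chart)  # opens the interactive chart in a web browser
}
```

Hovering over a point then shows its date and impressions. This is enough to pick out which days to investigate further.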
An analysis of the queries active from June 28 to 30 showed high search volume for information on a Google Analytics referral spam domain, 100dollars-seo.com. During this period, an article on this web site ranked highly for this search term. It was unusual because most of the other highly ranked pages were operated by the referral spammer. Its content differed from the other items in the search results, and its title and snippet clearly set it apart from the other highly ranked sites.
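A query-level breakdown like the one described above can be sketched in base R. The data frame `queries`, its columns, and the date window are hypothetical stand-ins for a Search Console query export. The idea is simply to sum impressions and clicks per query over the days of interest, then sort by impressions.

```r
# Hypothetical query-level export: one row per (date, query); values are made up.
queries <- data.frame(
  date        = as.Date(c("2016-03-01", "2016-03-02", "2016-03-02",
                          "2016-03-03", "2016-03-05")),
  query       = c("r scatter plot", "referral spam filter", "r scatter plot",
                  "referral spam filter", "googlevis tooltip"),
  impressions = c(120L, 900L, 150L, 1100L, 80L),
  clicks      = c(4L, 60L, 5L, 70L, 3L),
  stringsAsFactors = FALSE
)

# Keep only the days under investigation (an assumed window).
in_window <- subset(queries,
                    date >= as.Date("2016-03-02") & date <= as.Date("2016-03-03"))

# Total impressions and clicks per query inside the window, then CTR.
by_query <- aggregate(cbind(impressions, clicks) ~ query, data = in_window, FUN = sum)
by_query$ctr <- by_query$clicks / by_query$impressions

# Queries that drove the most impressions come first.
by_query <- by_query[order(-by_query$impressions), ]
print(by_query)
```

With these made-up rows, `referral spam filter` comes out on top with 2000 impressions and a CTR of 0.065.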
The query analysis still does not directly show which page to look at; the next section looks at visualizations of landing pages.
Interactive Scatterplot of Landing Page Data
Although the previous analysis provided a lot of seasonality information, it did not give actionable information on which individual pages worked well and which did not. For that, a page-level data set is needed. Figure 2 shows a googleVis plot of pages by search position and CTR, where the average position is 19.057 and the average CTR is 0.038. The pages above a CTR of 0.04 perform better than expected. A qualitative review of these pages indicates three general attributes:
- The articles contain specialized schema.org structured markup.
- The articles are topically very different from others returned for the search terms.
- The articles may be longer than the pages that underperform with respect to CTR.
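The page-level data set behind a plot like Figure 2 can be sketched by aggregating per page. CTR is total clicks divided by total impressions, and the average position is weighted by impressions. The `rows` data frame and its values are hypothetical. Impression weighting is also an assumption, since the article does not say how its averages were computed. The 0.04 cut-off is the one used in the text.

```r
# Hypothetical rows from a page-level export (e.g. one row per page and query).
rows <- data.frame(
  page        = c("/a", "/a", "/b", "/c", "/c"),
  impressions = c(100L, 300L, 500L, 200L, 200L),
  clicks      = c(10L, 12L, 5L, 8L, 4L),
  position    = c(4, 8, 15, 30, 20),
  stringsAsFactors = FALSE
)

# Position times impressions, so the per-page position can be impression-weighted.
rows$weighted_pos <- rows$position * rows$impressions

# Sum per page, then derive CTR and weighted average position.
pages <- aggregate(cbind(impressions, clicks, weighted_pos) ~ page, data = rows, FUN = sum)
pages$ctr      <- pages$clicks / pages$impressions
pages$position <- pages$weighted_pos / pages$impressions
pages$weighted_pos <- NULL

# Flag pages whose CTR beats the cut-off used in the text.
ctr_threshold <- 0.04
pages$above_threshold <- pages$ctr > ctr_threshold

# Show the best-performing pages first (pages itself stays sorted by page).
print(pages[order(-pages$ctr), ])
```

A fixed cut-off ignores position. Comparing each page against a fitted CTR-versus-position curve would be one way to flag pages that beat the expectation for their own position.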
The simple step of adding interactive fly-over text can provide a great deal of information quickly and easily. The next visualization in the series shows how to display five data dimensions in a three-dimensional interactive scatter plot.
Notes
This article was written in RStudio and uses the googleVis package for all graphics.