An essential aspect of assigning priority to any SEO recommendation is to understand scale as it applies to the site in question.
For example, if a site links to URLs that carry a certain parameter, search engines treat those URLs as unique pages, and if they are not handled properly they can be considered duplicate content. The solution for cleaning up any particular non-ranking page type is twofold:
- Direct search engines to remove the pages from the index (for example, with noindex).
- Make sure the site is not using these URLs within internal linking.
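To make the duplication point concrete, here is a minimal Python sketch showing how URLs that differ only by a parameter are distinct as crawled, yet collapse to a single page once the parameter is stripped. The example.com URLs, the `ref` parameter, and the `strip_param` helper are all hypothetical and used purely for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_param(url, param):
    """Return url with every occurrence of the query parameter `param` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Hypothetical URLs, not taken from the site discussed in this article.
urls = [
    "https://example.com/post/",
    "https://example.com/post/?ref=sidebar",
    "https://example.com/post/?ref=footer",
]

unique_as_crawled = set(urls)
unique_after_strip = {strip_param(u, "ref") for u in urls}

# Three distinct URLs as crawled, one underlying page.
print(len(unique_as_crawled), len(unique_after_strip))
```

Whether a given parameter really produces duplicate content depends on the site, so a check like this is only a starting point for deciding which parameters deserve attention.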
Most SEO crawlers can export internal linking information, but doing so comprehensively can consume a lot of time and resources. Benchmarks are necessary to make sure our clients' non-ranking page types are handled properly. This article shows how to quickly and easily use information already available to you in Google Analytics to list a particular group of URLs and find the pages linking to them.
Tools used for this demonstration:
Before we begin, create your own custom report for internal linking.
Google Webmaster Tools URL Parameters
One place to look for groups of pages we don’t want indexed is Google Webmaster Tools -> Crawl -> URL Parameters. This report gives some great insight into which parameters Googlebot has come across and how many URLs it has seen for each.
Note: You may have to select Configure Parameters before seeing what is shown above.
Once you open a parameter by clicking Edit, you will have access to a sample of URLs Googlebot saw containing the parameter in question. The attachment_id parameter will be used as an example of a non-ranking page type for the duration of this demonstration.
For the purpose of this exercise, this is one of many parameters found within internal linking we do not want in Google’s index. We’re going to recommend the client add noindex meta tags to these pages, but first we want to further benchmark in order to properly prioritize and make sure the problem is completely dealt with after implementation. To do this we’ll gather a count and comprehensive list of:
- Links containing attachment_id – For checking noindex after implementation.
- Pages pointing to URLs containing attachment_id – To clean up internal linking pointing to the parameter.
Number of Links Containing the Parameter (or Non-Ranking URL Page Type)
To grab a quick count and list of pages, navigate in Google Analytics to Behavior -> Site Content -> All Pages, or import this custom report.
Be sure to expand the date range in Google Analytics. For low-traffic sites (like this one), it might take a larger range to capture enough data.
Note: The advanced filter can be used to get lists of pages with more difficult URL patterns. For example, you can use regular expressions. We use the advanced filter further down.
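As a rough illustration of the kind of regular expression that could go into the advanced filter, the Python sketch below tests a pattern against a few page paths. The paths are made up, the pattern assumes the parameter value is numeric, and Google Analytics uses its own regex dialect, so treat this as a way to sanity-check a pattern locally, not as an exact replica of the filter.

```python
import re

# Match attachment_id only when it appears as a query parameter
# (directly after ? or &) and is followed by a numeric value.
# The numeric-value requirement is an assumption for this sketch.
ATTACHMENT_PARAM = re.compile(r"[?&]attachment_id=\d+")

# Hypothetical page paths for illustration.
paths = [
    "/?attachment_id=123",
    "/blog/post/?page=2&attachment_id=45",
    "/blog/attachment_id-explained/",
    "/contact/",
]

matches = [p for p in paths if ATTACHMENT_PARAM.search(p)]
print(matches)
```

Anchoring on `?` or `&` keeps the pattern from matching pages that merely mention the parameter name in their path, which a plain "contains" filter would include.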
Once you have a full list, expand the Show rows dropdown to include everything and click Export. We now have a full list and a count of 27 instances of the non-ranking page type parameter attachment_id. As a benchmark, we can crawl this list to determine what directives (like noindex), if any, exist. For this example there were none.
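If a crawler is not handy, the directive check itself can be sketched in a few lines of Python. The example below is a minimal sketch that only inspects an HTML string for a `<meta name="robots">` tag and, optionally, the value of an X-Robots-Tag response header. Fetching each exported URL is left out, and the `RobotsMetaParser` and `has_noindex` names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html, x_robots_tag=""):
    """Return True if the page HTML or the header value carries noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = [d.strip().lower() for d in x_robots_tag.split(",") if d.strip()]
    return "noindex" in parser.directives or "noindex" in header


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

This sketch ignores crawler-specific tags such as `<meta name="googlebot">`, so a dedicated crawler is still the safer choice for a comprehensive benchmark.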
Now to pull internal linking information.
Pages Pointing to URLs Containing the Parameter (or Non-Ranking URL Page Type)
As a first step, click the advanced link next to the search box in the analytics Internal Linking custom report (instructions at the top of this article).
Accessing the advanced filter gives us powerful functionality to identify groups of pages.
To identify pages linking to URLs containing the attachment_id parameter, we want to include within the Page (1) column URLs Containing (2) the attachment_id (3) parameter. Previous Page Path (the first column) represents the page containing the link; Page (the second column) is the URL being linked to.
If you see the icon below, be sure to yank that slider to the right for more data.
Expand Show rows to include all rows and export, or save the report as a Shortcut to reference in Google Analytics.
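Once exported, the report can be condensed into a list of linking pages and the parameter URLs each one points to. The Python sketch below assumes the export has been saved as a clean CSV with a header row containing `Previous Page Path` and `Page` columns. The sample rows are invented, and a real export may include extra summary lines that need to be removed first.

```python
import csv
import io
from collections import defaultdict


def linking_pages(csv_text, param="attachment_id"):
    """Map each linking page to the set of parameter URLs it points to."""
    links = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if param in row["Page"]:
            links[row["Previous Page Path"]].add(row["Page"])
    return dict(links)


# Invented sample rows standing in for the exported report.
sample = """Previous Page Path,Page,Pageviews
/blog/post-a/,/?attachment_id=12,4
/blog/post-a/,/?attachment_id=13,1
/blog/post-b/,/?attachment_id=12,2
/blog/post-b/,/contact/,7
"""

result = linking_pages(sample)
for source, targets in sorted(result.items()):
    print(source, sorted(targets))
```

Grouping by the linking page gives the client a worklist: each key is a page whose internal links need updating.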
Conclusion
Using Google Analytics, we’ve created a list of the 27 pages that use the attachment_id parameter and identified where those pages are linked from within the site. Using a crawler to comprehensively benchmark any directives or annotations (noindex, canonicals, etc.) would be a good idea at this point.
In this hypothetical situation, we could now send the client this information with a recommendation to add the noindex meta tag and update the internal links we found.
For Consideration
The attachment_id parameter was just one example; the same can be accomplished with any URL pattern that identifies non-ranking page types. Using the advanced filter in Google Analytics is a quick, powerful way to do this.
One further possibility is to use the Google Analytics notification system to send an email when pageviews for non-ranking page types drop below a certain threshold. This would essentially let you know when the implementation has occurred on the client’s site.
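The threshold logic behind such an alert is simple enough to sketch outside of Google Analytics. The Python example below assumes daily pageview totals for the non-ranking page type have already been exported into a dictionary keyed by ISO date. The dates, counts, and the `first_day_below` helper are hypothetical.

```python
def first_day_below(daily_pageviews, threshold):
    """Return the earliest date whose pageviews fall below threshold, or None."""
    # ISO-formatted dates (YYYY-MM-DD) sort chronologically as plain strings.
    for day, views in sorted(daily_pageviews.items()):
        if views < threshold:
            return day
    return None


# Invented daily totals for pages matching the parameter.
sample_views = {
    "2015-03-01": 9,
    "2015-03-02": 7,
    "2015-03-03": 1,
    "2015-03-04": 0,
}

print(first_day_below(sample_views, threshold=2))
```

On a low-traffic site a quiet day can dip below the threshold on its own, so a drop is a prompt to re-crawl and confirm the noindex tags, not proof of implementation.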
This article is meant to be a starting point for further development. Try taking these custom reports and tweaking them for even better, more personalized insights!
Note the dependencies: users have to actually fire the analytics code on these pages for them to show up in reports. Date range is also an important consideration. If the range starts 2 years ago, what shows up in reports may be outdated.