Information is power. It’s no different in the world of SEO. So here’s an interesting way to get more information on indexation by optimizing your sitemap index file.
What is a Sitemap Index?
A sitemap index file is simply a group of individual sitemaps, using an XML format similar to a regular sitemap file.
You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). […] If you want to list more than 50,000 URLs, you must create multiple Sitemap files.
If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file.
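To make the format concrete, here is a minimal sketch of generating a sitemap index with Python's standard library. The sitemap URLs are made up for illustration, and the optional lastmod element is left out; treat it as one way to produce the file, not the only one.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each sitemap URL."""
    # Register the namespace as the default so tags serialize without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical sitemap URLs, for illustration only.
index_xml = build_sitemap_index([
    "https://www.example.com/sitemap_1.xml.gz",
    "https://www.example.com/sitemap_2.xml.gz",
])
print(index_xml)
```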
Most sites begin using a sitemap index file out of necessity when they bump up against the 50,000 URL limit for a sitemap. Don’t tune out if you don’t have that many URLs. You can still use a sitemap index to your benefit.
Googling a Sitemap Index
I’m going to search for a sitemap index to use as an example. To do so I’m going to use the inurl: and site: operators in conjunction.
Best Buy was top of mind since I recently bought a TV there and I have a Reward Zone credit I need to use. The sitemap index wasn’t difficult to find in this case. However, a sitemap index doesn’t have to be named as such. So if you’re doing some competitive research you may need to poke around a bit to find the sitemap index and then validate that it’s the correct one.
Opening a Sitemap Index
You can then click on the result and see the individual sitemaps.
Here’s what the sitemap index looks like. A listing of each individual sitemap. In this case there are 15 of them, all sequentially numbered.
Looking at a Sitemap
The sitemaps are compressed using gzip so you’ll need to extract them to look at an individual sitemap. Copy the URL into your browser bar and the rest should take care of itself. Fire up your favorite text program and you’re looking at the individual URLs that comprise that sitemap.
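If you'd rather not do this by hand, the same extraction can be scripted. The sketch below assumes you already have the gzipped sitemap's bytes (downloaded however you like); the sample data and the function name are invented for illustration.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_gzipped_sitemap(data: bytes):
    """Decompress a gzipped sitemap and return the URLs listed in it."""
    xml_bytes = gzip.decompress(data)
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


# Hypothetical sitemap content standing in for a downloaded .xml.gz file.
sample = gzip.compress(b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/tv/123</loc></url>
  <url><loc>https://www.example.com/camera/456</loc></url>
</urlset>""")

for url in urls_from_gzipped_sitemap(sample):
    print(url)
```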
So within one of these sitemaps I quickly find that there are URLs that go to a TV, a digital camera, and a video game. They are all product pages, but there doesn’t seem to be any grouping by category. This is standard, but it’s not what I’d call optimized.
Sitemap Index Metrics
Within Google Webmaster Tools you’ll be able to see the number of URLs submitted and the number indexed for each sitemap.
Here’s an example (not Best Buy) of sitemap index reporting in Google Webmaster Tools.
So in the case of the Best Buy sitemap index, they’d be able to drill down and know the indexation rate for each of their 15 sitemaps.
What if you created those sitemaps with a goal in mind?
Sitemap Index Optimization
Instead of using some sequential process and having products from multiple categories in an individual sitemap, what if you created a sitemap specifically for each product type?
In the case of video games you might need multiple sitemaps if the URL count exceeds 50,000. No problem.
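Here is a rough sketch of that grouping: bucket URLs by product type, then split any bucket that exceeds the 50,000 URL limit into numbered files. The file naming scheme and the input format (URL plus product type) are my assumptions, not something Best Buy uses.

```python
from collections import defaultdict

MAX_URLS_PER_SITEMAP = 50_000  # limit from the sitemap protocol


def plan_sitemaps(urls_with_types, max_urls=MAX_URLS_PER_SITEMAP):
    """Map a sitemap file name to the URLs it should contain.

    urls_with_types is an iterable of (url, product_type) pairs.
    The file naming scheme here is hypothetical.
    """
    by_type = defaultdict(list)
    for url, product_type in urls_with_types:
        by_type[product_type].append(url)

    plan = {}
    for product_type, urls in by_type.items():
        # Split each product type into chunks that respect the URL limit.
        for start in range(0, len(urls), max_urls):
            name = f"sitemap_{product_type}_{start // max_urls + 1}.xml"
            plan[name] = urls[start:start + max_urls]
    return plan


# Tiny illustration with a limit of 2 so the split is visible.
example = plan_sitemaps(
    [("/tv/1", "tv"), ("/vg/1", "video-games"),
     ("/vg/2", "video-games"), ("/vg/3", "video-games")],
    max_urls=2,
)
for name, urls in sorted(example.items()):
    print(name, len(urls))
```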
Now, you’d likely have more than 15 sitemaps at this point but the level of detail you suddenly get on indexation is dramatic. You could instantly find that TVs were indexed at a 95% rate while video games were indexed at a 56% rate. This is information you can use and act on.
It doesn’t have to be one-dimensional either. You can pack a lot of information into individual sitemaps. For instance, maybe Best Buy would like to know the indexation rate by product type and page type. By this I mean: would Best Buy want to know the indexation rate of category pages (lists of products) versus product pages (an individual product page)?
To do so would be relatively straightforward. Just split each product type into separate page type sitemaps.
And so on and so forth. Grab the results from Webmaster Tools and drop them into Excel and in no time you’ll be able to slice and dice the indexation rates to answer the following questions. What’s the indexation rate for category pages versus product pages? What’s the indexation rate by product type?
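If Excel isn't your thing, the same slicing can be sketched in a few lines of Python. The numbers below are entirely made up, and the row layout (product type, page type, submitted, indexed) is just an assumption about how you might record what Webmaster Tools reports for each sitemap.

```python
from collections import defaultdict

# Hypothetical figures, one row per sitemap:
# (product_type, page_type, urls_submitted, urls_indexed)
rows = [
    ("tv", "product", 1000, 950),
    ("tv", "category", 50, 48),
    ("video-games", "product", 5000, 2800),
    ("video-games", "category", 100, 92),
]


def indexation_rates(rows, key_index):
    """Return indexed/submitted per group; key_index 0 = product type, 1 = page type."""
    submitted = defaultdict(int)
    indexed = defaultdict(int)
    for row in rows:
        key = row[key_index]
        submitted[key] += row[2]
        indexed[key] += row[3]
    # Skip groups with nothing submitted to avoid dividing by zero.
    return {k: indexed[k] / submitted[k] for k in submitted if submitted[k]}


by_product_type = indexation_rates(rows, 0)
by_page_type = indexation_rates(rows, 1)

for label, rates in (("product type", by_product_type), ("page type", by_page_type)):
    for key, rate in rates.items():
        print(f"{label}: {key} indexed at {rate:.1%}")
```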
You can get pretty granular if you want, though you can only pack each sitemap index with 50,000 sitemaps. Then again, you’re not limited to just one sitemap index either!
In addition, you don’t need 50,000 URLs to use a sitemap index. Each sitemap could contain a small number of URLs, so don’t pass on this type of optimization thinking it’s just for big sites.
Connecting the Dots
Knowing the indexation rate for each ‘type’ of content gives you an interesting view into what Google thinks of specific pages and content. The two other pieces of the puzzle are what happens before (crawl) and after (traffic). Both of these can be solved.
Crawl tracking can be done by mining weblogs for Googlebot (and Bingbot) using the same sitemap criteria. So, not only do I know how much bots are crawling each day, I know where they’re crawling. As you make SEO changes, you are then able to see how they impact the crawl and follow that through to indexation.
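As a rough idea of what that log mining can look like, here is a sketch that counts bot hits per day and per sitemap segment. It assumes logs in the common "combined" format and a URL structure I invented for the example, so the patterns would need to match your own site. It also trusts the user agent string, which can be spoofed; a stricter version would verify the bot some other way.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust for your server.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical URL patterns, one per sitemap segment.
SEGMENTS = {
    "tv_product": re.compile(r"^/tv/product/"),
    "tv_category": re.compile(r"^/tv/category/"),
    "video_games_product": re.compile(r"^/video-games/product/"),
}

BOTS = ("Googlebot", "bingbot")


def crawl_counts(lines):
    """Count bot requests keyed by (day, bot, segment)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        bot = next((b for b in BOTS if b.lower() in agent), None)
        if bot is None:
            continue  # not one of the bots we track
        path = match.group("path")
        segment = next(
            (name for name, pattern in SEGMENTS.items() if pattern.search(path)),
            "other",
        )
        counts[(match.group("day"), bot, segment)] += 1
    return counts


# Made-up log lines for illustration.
sample_log = [
    '66.249.66.1 - - [10/Oct/2011:13:55:36 -0700] "GET /tv/product/123 HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '157.55.39.1 - - [10/Oct/2011:14:01:02 -0700] "GET /video-games/product/9 HTTP/1.1" 200 1800 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '203.0.113.5 - - [10/Oct/2011:14:05:00 -0700] "GET /tv/product/123 HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

for key, hits in crawl_counts(sample_log).items():
    print(key, hits)
```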
The last step is mapping it to traffic. This can be done by creating Google Analytics Advanced Segments that match the sitemaps using regular expressions. (RegEx is your friend.) With that in place, you can track changes in the crawl to changes in indexation to changes in traffic. Nirvana!
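One sanity check I'd suggest before trusting a segment: confirm that the regular expression actually matches every URL in the corresponding sitemap. The sketch below uses Python's regex engine, which may differ slightly from the one in Google Analytics, and both the pattern and the paths are invented for the example.

```python
import re


def mismatched_paths(pattern, sitemap_paths):
    """Return the sitemap paths that the segment's regex fails to match."""
    regex = re.compile(pattern)
    return [path for path in sitemap_paths if not regex.search(path)]


# Hypothetical segment pattern and sitemap contents.
tv_product_pattern = r"^/tv/product/\d+$"
tv_product_paths = ["/tv/product/123", "/tv/product/456", "/tv/category/led"]

# Anything listed here is in the sitemap but would fall outside the segment.
print(mismatched_paths(tv_product_pattern, tv_product_paths))
```

An empty list means the segment and the sitemap line up; anything else points to a URL that would be counted for indexation but not for traffic.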
Go to the Moon
Doing this is often not an easy exercise and may, in fact, require a hard look at site architecture and URL naming conventions. That might not be a bad thing in some cases. And I have implemented this enough times to see the tremendous value it can bring to an organization.
I know I covered a lot of ground so please let me know if you have any questions.