In February, Aaron Bradley emailed me to let me know that I had a duplicate content problem on this blog. He had just uncovered and fixed the same issue on his own blog and was kind enough to give me a heads up.
The problem comes in the way that WordPress handles comment pagination. The default setting essentially creates a duplicate comment page.
Here's what it looks like in the wild. Two pages with the same exact content.
That's not good. Not good at all.
The comment-page-1 issue offends my own SEO sensibilities, but how big a problem is it really?
There are 28 million inurl results for comment-page-1. 28 million!
Do the same inurl search for comment-page-2 and you get about 5 million results. This means that only 5 million of these posts attracted enough comments to create a second paginated comment page. Subtract one from the other and you wind up with 23 million duplicate pages.
The Internet is a huge place so this is probably not a large percentage of total pages but ... it's material in my opinion.
Change Your Discussion Settings
If you're running a WordPress blog I implore you to do the following.
Go to your WordPress Dashboard and select Settings --> Discussion.
Unchecking the 'break comments into pages' setting will ensure you're not creating duplicate comment pages moving forward. Prior comment-page-1 URLs did redirect after I made the change, but they seemed to be doing so using a 302 (yuck). Not satisfied, I sought out a more permanent solution.
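If you want to confirm which kind of redirect your own comment-page-1 URLs return, a rough Python sketch like the one below can help. It uses only the standard library and does not follow the redirect, so you see the raw status code. The function names are made up for illustration, and the URL in the usage note is a placeholder, not a real post.

```python
import http.client
from urllib.parse import urlsplit


def redirect_status(url):
    """Send a HEAD request and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status):
    """Label a status code as a permanent redirect, a temporary redirect, or neither."""
    if status in (301, 308):
        return "permanent"
    if status in (302, 303, 307):
        return "temporary"
    return "not a redirect"
```

Calling `redirect_status("http://example.com/some-post/comment-page-1/")` against your own domain and passing the status to `describe` should tell you whether you are serving a 301 or a 302.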
Implement an .htaccess RewriteRule
It turns out that this has been a known issue for some time and there's a nice solution to the comment-page-1 problem in the WordPress Forum courtesy of Douglas Karr. Simply add the following rewrite rule to your .htaccess file.
RewriteRule ^(.*)/comment-page-1/ $1/ [R=301,L]
This puts 301s in place for any comment-page-1 URL. You could probably use this and keep the 'break comments into pages' setting on, which would remove duplicate comment-page-1 URLs but preserve comment-page-2 and above.
Personally, I'd rather have the comments all on one page or move to a commenting platform. So I turned the 'break comments into pages' setting off and went a step further in my rewrite rule.
RewriteRule ^(.*)/comment-page-.* $1/ [R=301,L]
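This puts 301s in place for any comment-page-# URL, sending each one back to the post itself. Note the parentheses around the first part of the pattern: without them $1 is empty and the redirect has nowhere to point. Better safe than sorry.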
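Before touching .htaccess, it can be useful to see which URLs each pattern would catch. The Python sketch below only approximates the two rules with the `re` module; it is not mod_rewrite, the sample paths are invented, and it assumes the broader rule wraps the post path in parentheses so that $1 has something to capture.

```python
import re

# Approximations of the two RewriteRule patterns. In .htaccess, mod_rewrite
# matches against the path without its leading slash.
FIRST_PAGE_ONLY = re.compile(r"^(.*)/comment-page-1/")
ANY_COMMENT_PAGE = re.compile(r"^(.*)/comment-page-.*")


def redirect_target(pattern, path):
    """Return the path the 301 would point at, or None if the rule does not match."""
    match = pattern.search(path)
    if match is None:
        return None
    # The substitution "$1/" replaces the whole path, not just the matched part.
    return match.group(1) + "/"


# Hypothetical post paths, for illustration only.
samples = [
    "2010/05/some-post/comment-page-1/",
    "2010/05/some-post/comment-page-2/",
    "2010/05/some-post/",
]
for path in samples:
    print(path, redirect_target(FIRST_PAGE_ONLY, path), redirect_target(ANY_COMMENT_PAGE, path))
```

The first pattern leaves comment-page-2 and above alone, while the second sends every paginated comment URL back to the post. Neither touches the post URL itself.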
Don't Rely on rel=canonical
Many of the comment-page-1 URLs have a rel=canonical in place. However, sometimes it is set up improperly.
Here the rel=canonical actually reinforces the duplicate comment-page-1 URL. I'm not sure if this is a problem with the Meta SEO Pack or simple user error in using that plugin.
Many times the rel=canonical is set up just fine.
The All in One SEO Pack does have a Canonical URL option. I don't use that option but I'm guessing it probably addresses this issue. The problem is that rel=canonical doesn't stick nearly as well as a 301.
So even though this post from over three months ago has a rel=canonical, the comment-page-1 URL is still being returned. In fact, there are approximately 110 instances of this on this domain alone.
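To spot a rel=canonical that reinforces the duplicate rather than fixing it, you can pull the canonical link out of a page's HTML and check where it points. This is a minimal sketch using Python's built-in `html.parser`; the class and function names are hypothetical, and the "/comment-page-" substring test is an assumption about how these URLs look, not something any SEO plugin provides.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def reinforces_duplicate(html):
    """True when the canonical itself points at a paginated comment URL."""
    href = canonical_url(html)
    return href is not None and "/comment-page-" in href
```

Feed it the source of a comment-page-1 URL: if `reinforces_duplicate` returns True, the canonical is pointing at the duplicate instead of the post.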
Stop Comment-Page-1 Spam
23 million pages and counting. Sure, it would be nice if WordPress would fix this issue, but short of that it's up to us to stop this. Fix your own blog and tell a friend.
Friends don't let friends publish duplicate content.