I have 155 pending comments right now. The overwhelming majority of them are pingbacks from benign scrapers. Some may see this as a boon, but I view these scrapers as arterial plaque that could ultimately give the Internet a heart attack.
Here's my personal diagnosis.
My definition of a benign scraper is a site that scrapes content but provides attribution. I've gotten a ton of these pingbacks recently because of links I received from high-profile sites within the search community. Those sites are the targets of these scrapers, so my link gets carried along as part of the deal.
The prevailing attitude is that the practice won't damage the scraped site and may actually provide a benefit through the additional links. Heck, Jon Cooper at Point Blank SEO even came up with a clever way to track the scrape rate of a site to determine which sites might be the best candidates for guest posts.
Signs and Symptoms
But what do these scraper sites look like? Some of these scrapers might have original content mixed in with the scraped content, but in reviewing my pingbacks this seems to be the exception and not the rule. Most of these benign scrapers are just pulling in content from a number of feeds and stuffing it onto the page, hoping that users show up and click on ads and that the content owners don't take exception.
"Hey, I gave you a link, so we're cool, right bro?"
No bro, we're not cool.
This stuff is garbage. It's content pollution. It is the arterial plaque of the Internet.
Google is trying to keep up and often removes this dreck from the index.
But for every one that Google removes there's another that persists.
How long until the buildup of this arterial plaque gives the Internet a heart attack? One day we'll wake up and the garbage will be piled high like a horrifying episode of Hoarders.
The industry attitude toward these scrapers is essentially a tacit endorsement. It brings to mind the quote attributed to Edmund Burke.
"All that is necessary for the triumph of evil is that good men do nothing."
We turn a blind eye and whistle past the graveyard, happily trusting that Google will sort it all out. They'll make sure that the original content is returned instead of the scraped content. That's a lot of faith to put in Google, particularly as they struggle to keep up with the increasing pace of digital content.
Are we really this desperate for links?
Yet, we whine about how SEO is viewed by those outside of the industry. And we'll whine again when Google gets a search result wrong and shows a scraper above the original content. Indignant blog posts will be written.
Even if we wanted to, we have few tools at our disposal to tell Google about these sites. The tools we do have are onerous and inefficient.
It doesn't have to be that way.
Why not build a Chrome extension that lets me flag and report scraper sites? Or a WordPress plugin that lets me mark and report a site as a scraper directly within the comment interface? Or how about a section in Google Webmaster Tools where I can review links?
Sure, there are reporting issues and biases but those are solvable problems. Thing is, many doctors have a God complex. Google may not think we're able to contribute to the diagnosis. That would be a mistake.
Maybe we don't want to be cured. Perhaps we're all willing to let this junk persist, willing to smile as your mom finds one of these sites when she's looking for that article you wrote. Willing to believe that your brand is totally safe when it appears on these sites. But the rest of the world isn't nearly as savvy as you think.
I know many of these links work, but they shouldn't. The fact that they do worries me. Because, over time, people might not be able to tell the difference and that's not the Internet I want.
Today these scrapers are benign but tomorrow they could turn malignant.