<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: No Such Thing As A Good Scraper</title>
	<atom:link href="http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper/feed" rel="self" type="application/rss+xml" />
	<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=no-such-thing-as-a-good-scraper</link>
	<description>SEO, SEM, Marketing and Technology sprinkled with Sports, Parenting and Rants</description>
	<lastBuildDate>Fri, 17 May 2013 15:11:34 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>
	<atom:link rel="hub" href="http://superfeedr.com/hubbub"/>
	<item>
		<title>By: Content Curation Marketing, Content Scraping…What’s The Difference? &#124; Content Equals Money</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-8058</link>
		<dc:creator>Content Curation Marketing, Content Scraping…What’s The Difference? &#124; Content Equals Money</dc:creator>
		<pubDate>Thu, 23 Aug 2012 10:49:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-8058</guid>
		<description><![CDATA[[...] group content curation with content scraping. Some call it “overrated plagiarism” or “the arterial plaque of the Internet.” For all its detractors, though, content curation has just as many advocates, if not [...]]]></description>
		<content:encoded><![CDATA[<p>[...] group content curation with content scraping. Some call it “overrated plagiarism” or “the arterial plaque of the Internet.” For all its detractors, though, content curation has just as many advocates, if not [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Craig S. Kiessling</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-6989</link>
		<dc:creator>Craig S. Kiessling</dc:creator>
		<pubDate>Tue, 22 May 2012 18:28:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-6989</guid>
		<description><![CDATA[Excellent topics - both in the post and in the comments. I&#039;ve often wondered/worried about scrapers, social shares, canonical, authorship, etc. and how it would all play out.]]></description>
		<content:encoded><![CDATA[<p>Excellent topics &#8211; both in the post and in the comments. I&#8217;ve often wondered/worried about scrapers, social shares, canonical, authorship, etc. and how it would all play out.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Content-scraping is theft, pure and simple</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-6887</link>
		<dc:creator>Content-scraping is theft, pure and simple</dc:creator>
		<pubDate>Sun, 29 Apr 2012 09:42:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-6887</guid>
		<description><![CDATA[[...] Connolly posted a link on his Google+ page to &#8220;No Such Thing As A Good Scraper&#8221; by AJ Kohn, who writes a pretty good analysis and summary of what content scrapers often do [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Connolly posted a link on his Google+ page to &#8220;No Such Thing As A Good Scraper&#8221; by AJ Kohn, who writes a pretty good analysis and summary of what content scrapers often do [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: AJ Kohn</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-6313</link>
		<dc:creator>AJ Kohn</dc:creator>
		<pubDate>Fri, 06 Apr 2012 14:26:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-6313</guid>
		<description><![CDATA[Kaj,

Yes, it&#039;s a &lt;strong&gt;very&lt;/strong&gt; strong signal and should provide additional fidelity to reducing scraped content and outright plagiarism.]]></description>
		<content:encoded><![CDATA[<p>Kaj,</p>
<p>Yes, it&#8217;s a <strong>very</strong> strong signal and should provide additional fidelity to reducing scraped content and outright plagiarism.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kaj Kandler</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-6288</link>
		<dc:creator>Kaj Kandler</dc:creator>
		<pubDate>Fri, 06 Apr 2012 00:41:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-6288</guid>
		<description><![CDATA[Isn&#039;t Google+ Authorship verification the means to tell the original from a copy? See &#039;&lt;a href=&quot;http://blog.conficio.com/blog/2012/04/05/is-google-author-ship-the-end-of-scraper-sites/ &quot; rel=&quot;nofollow&quot;&gt;Is Google Authorship the end of scraper sites&lt;/a&gt;&#039; for my thoughts.]]></description>
		<content:encoded><![CDATA[<p>Isn&#8217;t Google+ Authorship verification the means to tell the original from a copy? See &#8216;<a href="http://blog.conficio.com/blog/2012/04/05/is-google-author-ship-the-end-of-scraper-sites/ " rel="nofollow">Is Google Authorship the end of scraper sites</a>&#8217; for my thoughts.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Is Google+ author ship the end of scraper sites &#124; Plan-B for Software Documentation</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-6287</link>
		<dc:creator>Is Google+ author ship the end of scraper sites &#124; Plan-B for Software Documentation</dc:creator>
		<pubDate>Fri, 06 Apr 2012 00:39:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-6287</guid>
		<description><![CDATA[[...] pages of a site and re-publishes it as their own content. Tonight I read a blog post about &#8220;benign scraper sites&#8221; by AK [...]]]></description>
		<content:encoded><![CDATA[<p>[...] pages of a site and re-publishes it as their own content. Tonight I read a blog post about &#8220;benign scraper sites&#8221; by AK [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Marcos</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-6007</link>
		<dc:creator>Jon Marcos</dc:creator>
		<pubDate>Wed, 28 Mar 2012 05:20:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-6007</guid>
		<description><![CDATA[I am not entirely against junk, scraped content. Sometimes the original source host gets taken down or deleted, and we would otherwise not be able to access that information if it wasn&#039;t scraped. I would love Google to give more value to original posters, but if the article gives attribution, I think Google should still index it.]]></description>
		<content:encoded><![CDATA[<p>I am not entirely against junk, scraped content. Sometimes the original source host gets taken down or deleted, and we would otherwise not be able to access that information if it wasn&#8217;t scraped. I would love Google to give more value to original posters, but if the article gives attribution, I think Google should still index it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Micah</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-5983</link>
		<dc:creator>Micah</dc:creator>
		<pubDate>Tue, 27 Mar 2012 04:27:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-5983</guid>
		<description><![CDATA[P1 &amp; P2 agreed.

P3: If Google can get this to be around technorati and not /marketers/ then they might have something (hence the original functionality around links). But if identity is attached via marketers instead, then this won&#039;t be a normalization of contributions, but a bias of SEOs (whether white or black hat). I&#039;d say SU can figure it out b/c there&#039;s not enough money (at scale) to really go after them (similar to Apple and viruses)--but for Google it makes economic sense and means a continual battle unfortunately.]]></description>
		<content:encoded><![CDATA[<p>P1 &amp; P2 agreed.</p>
<p>P3: If Google can get this to be around technorati and not /marketers/ then they might have something (hence the original functionality around links). But if identity is attached via marketers instead, then this won&#8217;t be a normalization of contributions, but a bias of SEOs (whether white or black hat). I&#8217;d say SU can figure it out b/c there&#8217;s not enough money (at scale) to really go after them (similar to Apple and viruses)&#8211;but for Google it makes economic sense and means a continual battle unfortunately.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: AJ Kohn</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-5935</link>
		<dc:creator>AJ Kohn</dc:creator>
		<pubDate>Sun, 25 Mar 2012 16:00:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-5935</guid>
		<description><![CDATA[I agree Micah. I don&#039;t like the fact that Panda is applied at the domain level. Soon after, I wrote about how &lt;a href=&quot;http://www.blindfiveyearold.com/farmer-update-about-sites-not-content&quot; rel=&quot;nofollow&quot;&gt;Panda treats lousy content the same as great content&lt;/a&gt;. While I think search results got marginally better, the overall effect was that they became &lt;strong&gt;different&lt;/strong&gt;.

You&#039;re also right that smaller sites or blogs will encounter more problems. However, to date I haven&#039;t seen many scrapers target these sites. Frankly, a smarter scraper would look for high-quality but small blogs and try to do this. It&#039;s a scary idea.

I was generally negative about the use of the block functionality as a signal, in part because it &lt;strong&gt;was&lt;/strong&gt; biased to the Internati. However, with identity attached Google could normalize the contributions and throw out the ones that are clearly biased or submitted with malicious intent. (Heck, if StumbleUpon can figure out that I submitted my own site too often I know Google can crack this type of problem.) Now, the caveat here is that it won&#039;t be perfect. There will be a few false positives. But I think the net result would be faster and more comprehensive identification of scrapers.]]></description>
		<content:encoded><![CDATA[<p>I agree Micah. I don&#8217;t like the fact that Panda is applied at the domain level. Soon after, I wrote about how <a href="http://www.blindfiveyearold.com/farmer-update-about-sites-not-content" rel="nofollow">Panda treats lousy content the same as great content</a>. While I think search results got marginally better, the overall effect was that they became <strong>different</strong>.</p>
<p>You&#8217;re also right that smaller sites or blogs will encounter more problems. However, to date I haven&#8217;t seen many scrapers target these sites. Frankly, a smarter scraper would look for high-quality but small blogs and try to do this. It&#8217;s a scary idea.</p>
<p>I was generally negative about the use of the block functionality as a signal, in part because it <strong>was</strong> biased to the Internati. However, with identity attached Google could normalize the contributions and throw out the ones that are clearly biased or submitted with malicious intent. (Heck, if StumbleUpon can figure out that I submitted my own site too often I know Google can crack this type of problem.) Now, the caveat here is that it won&#8217;t be perfect. There will be a few false positives. But I think the net result would be faster and more comprehensive identification of scrapers.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Micah</title>
		<link>http://www.blindfiveyearold.com/no-such-thing-as-a-good-scraper#comment-5850</link>
		<dc:creator>Micah</dc:creator>
		<pubDate>Thu, 22 Mar 2012 04:08:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.blindfiveyearold.com/?p=6261#comment-5850</guid>
		<description><![CDATA[Oh, I don&#039;t disagree about a different form of scraping, but it&#039;s these things we have to keep in mind for bots to differentiate between good and bad scraping; between sites that are legitimate but also have a bad scraping section; between a bad scraping section and some useful content. 

This is where domain level penalties can be hazardous as you go for a shotgun approach: Kill sites that are bad even if good content is on them in the hopes that good sites don&#039;t create junk to fill in the gaps and that other bad sites don&#039;t come up in turn.

Discovery date and link graph work well if you&#039;re first to market and getting links; the smaller you are, the harder it will be and the more those problems arise.

I&#039;m generally negative about allowing others to help for the following reasons:

-Biased group of people (basically marketers or techies)
-Once people know you can do this, spammers join in and kill legitimate sites
-Once you&#039;re penalized, you won&#039;t get looked at again.

This is likely why they don&#039;t use the block function anymore: it creates a strong incentive to nuke other people&#039;s websites.]]></description>
		<content:encoded><![CDATA[<p>Oh, I don&#8217;t disagree about a different form of scraping, but it&#8217;s these things we have to keep in mind for bots to differentiate between good and bad scraping; between sites that are legitimate but also have a bad scraping section; between a bad scraping section and some useful content. </p>
<p>This is where domain level penalties can be hazardous as you go for a shotgun approach: Kill sites that are bad even if good content is on them in the hopes that good sites don&#8217;t create junk to fill in the gaps and that other bad sites don&#8217;t come up in turn.</p>
<p>Discovery date and link graph work well if you&#8217;re first to market and getting links; the smaller you are, the harder it will be and the more those problems arise.</p>
<p>I&#8217;m generally negative about allowing others to help for the following reasons:</p>
<p>-Biased group of people (basically marketers or techies)<br />
-Once people know you can do this, spammers join in and kill legitimate sites<br />
-Once you&#8217;re penalized, you won&#8217;t get looked at again.</p>
<p>This is likely why they don&#8217;t use the block function anymore: it creates a strong incentive to nuke other people&#8217;s websites.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
