What does Google do about duplicate content?
You may have heard that Google warns against duplicate content and may even penalize a site that violates its guiding principles. Google’s Webmaster Guidelines clearly state, under “Quality Guidelines,” that one should not “create multiple pages, subdomains, or domains with substantially duplicate content.”
What is duplicate content?
When the number one search engine in the world speaks, the world should listen. Google defines duplicate content as “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”
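Google does not publish how it measures “appreciably similar,” so the following is only a rough sketch of one common way to compare two blocks of text: split each into overlapping three-word “shingles” and compute the Jaccard similarity of the two sets. The function names, the shingle size of 3, and the idea of reading a score near 1.0 as a near duplicate are all assumptions made for illustration, not Google’s method.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets: 1.0 = identical, 0.0 = no overlap."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


original = "the quick brown fox jumps over the lazy dog"
reworded = "the quick brown fox jumps over the sleepy dog"
unrelated = "search engines index billions of pages every day"

print(jaccard_similarity(original, original))   # exact match
print(jaccard_similarity(original, reworded))   # mostly the same wording
print(jaccard_similarity(original, unrelated))  # no shared phrases
```

A site owner could run something like this over their own pages to spot blocks that “completely match” or come close, then decide by hand which ones deserve attention.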
What isn’t duplicate content?
Google clearly spells out what it does not consider duplicate content in “Deftly dealing with duplicate content.” Included is the statement that its algorithms do not see the “same article written in English and Spanish as duplicate content.” In addition, the occasional use of snippets and quotes should not be a concern and will not be flagged as duplicate content.
Google on duplicate content due to “scrapers”
What if your website content has been scraped and republished on another site? Will Google penalize the original site? Google published an extremely informative article, “Duplicate content due to scrapers,” that addresses these concerns. Sven Naumann, from Google’s Search Quality Team, explains that the search engine giant realizes that webmasters have no control or influence over others lifting their content and redistributing it without their consent. The result is identical content across several websites, which Google says is “not inherently regarded as a violation of our webmaster guidelines.”
Naumann explains that Google applies “processes with the intent of determining the original source of the content.” He reveals that the number one search engine is “quite good” in most circumstances at deciding which site the content originated on, “resulting in no negative effects for the site that originated the content.”
How many webmasters and site owners have fought the plagiarism battle, worried that their original work will somehow be perceived as duplicate content and de-indexed from the world’s number one search engine?
In general, Google distinguishes between two main situations involving duplicate content:
- Duplicate content within your domain: identical content which appears in more than one place on your site – often unintentionally.
- Cross-domain duplicate content: identical content from your site which appears (again, often unintentionally) on different external sites.
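To make the two situations above concrete, here is a small sketch that labels a pair of URLs carrying the same content as either within-domain or cross-domain. The function name and the example URLs are hypothetical, and comparing hostnames after dropping a leading “www.” is a deliberate simplification: it treats other subdomains as separate sites, which may not match how Google groups them.

```python
from urllib.parse import urlsplit


def normalized_host(url):
    """Return the lowercase hostname of url without a leading 'www.'."""
    host = urlsplit(url).hostname  # urlsplit already lowercases the hostname
    if host is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def classify_duplicate(url_a, url_b):
    """Label two URLs with identical content as within-domain or cross-domain."""
    if normalized_host(url_a) == normalized_host(url_b):
        return "within-domain"
    return "cross-domain"


# Hypothetical example URLs, for illustration only.
print(classify_duplicate("https://example.com/article",
                         "https://www.example.com/article?print=1"))
print(classify_duplicate("https://example.com/article",
                         "https://scraper.example.net/copied-article"))
```

The first pair is the kind of unintentional duplication that a printer-friendly page can cause on your own site; the second is the scraper scenario discussed in this article.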
In the case of “scrapers,” when someone plagiarizes your site’s content, Google assures us that they “look at various signals to determine which site is the original one, which usually works very well.” Furthermore, the explanation states that you should not be overly concerned about negative effects on your website’s presence in Google if you discover that your site has been plagiarized. The “scraped” content will simply get filtered out of Google.
The final verdict: duplicate content results in a violation of Google’s webmaster guidelines, and a penalty, when “there are signals pointing to deliberate and malicious intent.”
For frustrating plagiarism cases, Google states that website owners are welcome to file a DMCA request to claim ownership of the content and have Google handle the offending site. For tips on fighting plagiarism, see: 5 Steps to Fight Website Plagiarism.
© R & R Web Design LLC – a Michigan web design company serving Michigan, Chicago, Illinois and beyond specializing in web design, search engine optimization and website maintenance.