It's a war out there - the search engine companies are throwing massive resources at the problem and working 24/7 to deliver the most relevant results. The accuracy, usefulness and relevance of search results are their currency, their lifeblood and their core business differentiator. Top search engine placements are the holy grail of SEO professionals and website owners; high positioning means high visibility, and therefore high traffic and potentially enormous returns. Alongside the numerous legitimate White Hat techniques that comply with search engine optimisation guidelines, there are so-called Black Hat techniques that seek to trick the search engines into assigning unwarranted relevance and authority to sites. It's the Black Hat techniques that the search companies strive to nullify in order to maintain the integrity of their business models.
With the recent introduction of new filters, duplicate content has become one of the hot topics of debate amongst webmasters and SEO professionals. Google define duplicate content as 'substantive blocks of content within or across domains that either completely match other content or are appreciably similar'.
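To make 'appreciably similar' more concrete, here is a minimal sketch of one well-known way to score how much two pieces of text overlap: break each text into overlapping three-word "shingles" and compare the two sets. This is an illustration only. The function names and the shingle size are assumptions made for the example, and nothing here is claimed to be how Google or any other search engine actually measures duplication.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0  # two empty texts: nothing to compare
    return len(sa & sb) / len(union)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    near_copy = "the quick brown fox jumps over the lazy cat"
    print(jaccard_similarity(original, near_copy))
```

A score close to 1.0 means the two pages share almost all of their wording, while a score close to 0.0 means they share very little. Where a page stops being 'similar' and starts being 'duplicate' is a judgement call that this sketch does not attempt to settle.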
With everyone talking content, content, content as a means of improving SEO, a common misconception has taken hold: that creating multiple or similar copies of the same page will either increase the chances of getting listed in search engines or help achieve multiple listings thanks to the presence of more keywords.
There are a number of reasons why search engines dislike duplicated content. One is that they don't want to show the same page repeatedly in their search results. Another is that they don't want to spend resources indexing pages that are substantially similar. Perhaps the most important is that deliberately gaming the system threatens the integrity of their search results. It's Black Hat and to be discouraged.
When duplicate content or search engine spam is identified, the page is filtered and dropped from the search engine index. No indexing, no search returns. Extremely bad news for all concerned - except your competition.
Duplication, however, isn't always the product of underhand attempts to gain search advantage; it can arise in other ways, and it's important to recognise how these unforeseen circumstances come about so that they can be avoided.
In many cases a webmaster has little or no influence over identical content appearing on third party sites, where it has been scraped and redistributed without the webmaster's consent or knowledge. The search engines know this happens and don't always treat this identical content as a violation of webmaster guidelines. Identification of duplicate content will normally lead, in the first instance, to further investigation aimed at determining the original source of the content. In most cases the original content can be verified as legitimate and there will be no negative effects for the site that originated it.
There are occasions when good pages and legitimate sites are accidentally filtered out. In order to avoid falling victim to such a miscarriage of search engine justice, it's a good idea to appreciate how search engines determine what duplicate content is and what might cause an engine to scream spam.
- Websites with identical pages and mirroring - these pages are considered duplicate and can also be considered spam. Affiliate sites with an identical look and feel containing identical content are (unsurprisingly) especially vulnerable to a duplicate content filter, as are websites with doorway pages. Legitimate international organisations with representation in different countries can avoid any confusion by creating content specific to each location. On a technical level, using country specific Top Level Domains and local hosting accentuates the authenticity of the operation. Be aware, too, that mirror sites that simply republish the same content on multiple domains are also filtered; some sites, for example, run several domains that all mirror the same content. 301 redirection of the duplicate domains to the main domain eliminates duplicate content issues and consolidates link popularity.
- Scraped Content - scraped content is content taken from another site and repackaged to make it look original; in essence it is no more than a duplicate page. With the volume of blog generated content growing rapidly through syndication, scraping is becoming an increasing problem for the search engines. Staking a claim to your content by frequently using your brand, including fixed links, and hosting page images locally will make people less inclined to abuse your content and make the abuse easier to identify if they do. Taking the time and trouble to make your content specific to your organisation and not prone to duplication should be at the forefront of your SEO efforts, as well as being a good technique to avoid having your pages filtered from the index.
- E-Commerce Product Descriptions - product pages are a ready source of duplicate content as they've often been constructed using a single template. Many eCommerce sites share the same basic product descriptions, rarely altering the content. The search engines, when indexing virtually identical content across numerous pages, will identify it as spam. The options are to generate unique content for all of your product pages, to use robots.txt to instruct crawlers to crawl only one product description, or to consider product page consolidation: consolidate your product pages and apply an alternative method to display product descriptions and information, by using CSS for example.
- Syndication - publishing an article and having it copied and posted all over the Internet can be both good and bad. Though Yahoo! and MSN determine the source of the original article and deem it most relevant in search results, other search engines like Google may not, possibly filtering out the original article and showing one of the syndicated copies. To try to prevent this, it is wise to follow Google's advice on this issue and ensure that each site on which your content is syndicated includes a link back to your original article. Also, ask those who use your syndicated material to block the version on their sites with robots.txt.
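Two of the technical remedies mentioned in the list above, 301 redirection of mirror domains and blocking copies with robots.txt, can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions, not a recommended implementation: the domain names, the `/syndicated/` path and the function names are all hypothetical, and on a live site the redirect would normally be configured in the web server. The robots.txt check uses Python's standard `urllib.robotparser` module.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical domains, for illustration only.
CANONICAL_HOST = "www.example.com"
MIRROR_HOSTS = {"example.net", "www.example.net", "example.org"}


def canonical_redirect(url: str):
    """Return (301, target_url) if url is on a mirror domain, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in MIRROR_HOSTS:
        return None
    # Keep the path and query string so each mirrored page redirects to
    # its own equivalent on the main domain, not just to the home page.
    target = urlunsplit(
        ("https", CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
    return 301, target


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check whether the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(canonical_redirect("http://example.net/products?id=7"))

    # Hypothetical robots.txt a syndication partner might publish to keep
    # crawlers away from its copies of your articles.
    partner_rules = "User-agent: *\nDisallow: /syndicated/\n"
    print(is_crawl_allowed(
        partner_rules, "https://partner.example/syndicated/article.html"))
```

Checking a partner's rules in this way is a quick means of confirming that their syndicated copy really is blocked from crawling, while the rest of their site remains open to the search engines.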
As we are familiar with the regulations and experienced when it comes to applying them effectively, SEO Consult would be delighted to address any duplication issues your organisation may have. Contact us today for more information.









