Saturday, November 25, 2006

Googlebot: Preventing Duplicate Content

Duplicate content caused by your root AND index pages being indexed

Your content can end up indexed under both the root URL and the default page (index.php here, but on other setups it may be index.html, index.htm, default.html, index.asp, etc.), like this:
www.Mydomain.com/index.php AND www.Mydomain.com
This is especially likely if you have inadvertently linked to the index file (www.Mydomain.com/index.php) rather than the root (www.Mydomain.com). The same situation can, and often does, occur with subfolders too.
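
As a rough sketch of the idea, the function below maps an index-file path and its folder path to the same result. The name strip_index_file and the list of default file names are assumptions for illustration only; match the list to whatever your web server actually serves as its default page. It expects a path without a query string.

```php
<?php
// Hypothetical helper: collapse "/folder/index.php" to "/folder/" so both
// forms of the URL resolve to a single path. The list of default file
// names is an assumption; adjust it to your own server configuration.
function strip_index_file($path)
{
    $defaults = array('index.php', 'index.html', 'index.htm', 'default.html', 'index.asp');
    $file = basename($path); // last path segment, e.g. "index.php"
    if (in_array($file, $defaults, true)) {
        // Cut the file name off the end, keeping the trailing slash.
        return substr($path, 0, strlen($path) - strlen($file));
    }
    return $path;
}

echo strip_index_file('/index.php'), "\n";       // prints "/"
echo strip_index_file('/blog/index.html'), "\n"; // prints "/blog/"
echo strip_index_file('/blog/post.php'), "\n";   // prints "/blog/post.php"
```

Links inside your own pages can be run through a helper like this so that you never point at the index file by accident.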

Duplicate content caused by multiple domains on the same web space
Do you have more than one domain mapped to the same website/web content?
If so, then you are not alone. This can have a negative effect on your PageRank (PR) and SEO.

Consider the effect it can have on search bots (Googlebot, etc.) when they find duplicate content:
a bot finds domain1, then domain2, both of which map to the same web space. It could then decide to drop domain1 in favour of domain2 or, worse still, drop both domains!

Duplicate content caused by your domain being indexed with and without the "www"
www.Mydomain.com AND Mydomain.com
This is just like having two domain names pointing to the same content: it creates duplicate content.
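
Whether the duplicate comes from a second domain or from a missing "www", the check is the same: does the requested host match the one host you want indexed? Here is a minimal sketch of that check. The function name is_canonical_host and the example.com host names are placeholders, not part of the original code.

```php
<?php
// Hypothetical helper: true when the requested host is the one host name
// you want indexed. Host names are case-insensitive and the Host header
// may carry a port, so both are normalised before comparing.
function is_canonical_host($requested_host, $canonical_host)
{
    $host = strtolower($requested_host);
    $host = preg_replace('/:\d+$/', '', $host); // drop a trailing ":port"
    return $host === strtolower($canonical_host);
}

var_dump(is_canonical_host('www.example.com', 'www.example.com'));    // bool(true)
var_dump(is_canonical_host('example.com', 'www.example.com'));        // bool(false)
var_dump(is_canonical_host('WWW.Example.com:80', 'www.example.com')); // bool(true)
```

Any request for which this returns false is a candidate for a redirect to the canonical host.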

So what do you do about it? There are two options.

The first option applies where you are able to move the domain(s):
host each of the domains on its own web space with unique content.

The second option applies where you are unable to host the domains on separate web space with separate unique content:
issue an HTTP status code: "301 Moved Permanently"
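
To make the shape of that response concrete, here is a small sketch that builds the two header lines a permanent redirect needs. The function name permanent_redirect_headers and the example.com target are assumptions for illustration.

```php
<?php
// Hypothetical helper: return the header lines for a permanent redirect.
// Kept as a pure function so it can be inspected without sending anything.
function permanent_redirect_headers($target_url)
{
    return array(
        'HTTP/1.1 301 Moved Permanently', // status line: the move is permanent
        'Location: ' . $target_url,       // where the content now lives
    );
}

foreach (permanent_redirect_headers('http://www.example.com/') as $line) {
    echo $line, "\n";
}
```

In a real page you would pass each line to PHP's header() function and then exit, before any other output is sent. PHP's header() also accepts the status code as its third argument, so header('Location: ' . $target_url, true, 301) should achieve the same in one call.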

Below is the PHP code to redirect the traffic using an appropriate header and issue an HTTP 301 Moved Permanently status.
WAIT - is this correct? Yes, check with GoD (Google's official Documentation) here:
http://www.google.co.uk/support/webmasters/bin/answer.py?answer=34481
http://www.google.co.uk/support/webmasters/bin/answer.py?answer=34464
Using PHP you can redirect traffic and tell Googlebot the pages have moved (for ASP, ColdFusion and .htaccess methods see the other relevant reading below). Add this PHP to the top of your PHP pages:
$PHP_URI=$_SERVER['REQUEST_URI']; // folders and filename - used over PHP_SELF as it will NOT include the filename on root access (webserver loads default file: index.php)
$HTTP_HOST=$_SERVER['HTTP_HOST']; // domain name as requested, may or may not start with "www."
if ($HTTP_HOST=="localhost") $main_url="localhost"; // useful for testing on localhost
else $main_url="www.dwalker.co.uk";// * CHANGE to your domain to be indexed
// strpos() is used instead of ereg(), which was removed in PHP 7
// redirect when the host does not start with "www." (and is not localhost) OR the URI contains index.php
if( (strpos($HTTP_HOST,"www.")!==0 AND ($HTTP_HOST!="localhost") ) OR (strpos($PHP_URI,"index.php")!==false) )
{
$the_folder = str_replace("index.php", "", $PHP_URI); // strip the index file, keep the folder
header ('HTTP/1.1 301 Moved Permanently');
header ('Location: http://'.$main_url.$the_folder);
exit; // stop here so no page content is sent after the redirect
}

*Make sure you check which domain you should add here, and whether it should or should not include "www". Only you can know this, by checking your website stats and the domain and pages currently indexed by Google. (The above PHP code can be downloaded here: http://www.dwalker.co.uk/blog/blog301redirect.zip )
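
Once the redirect is in place it is worth confirming that the server really answers with a 301. The sketch below reads the status code from the first status line of a header list, such as the array PHP's get_headers() returns. The function name first_status_code and the sample header data are assumptions for illustration, not real server output.

```php
<?php
// Hypothetical helper: pull the numeric status code out of the first
// status line in a header list (the shape get_headers() returns).
function first_status_code(array $headers)
{
    if (isset($headers[0]) && preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return (int) $m[1];
    }
    return null; // no recognisable status line
}

// Sample data standing in for a live response (an assumption).
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/',
);
var_dump(first_status_code($sample)); // int(301)
```

Against a live site you could try first_status_code(get_headers('http://your-domain/index.php')), which needs network access; a result of 301 suggests the redirect is being issued as intended.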

Duplicate content is to be avoided at all costs: it will harm your website in the long term. You should always work towards UNIQUE content:
http://en.wikipedia.org/wiki/Big_Daddy_Google#Duplicate_Content

Other relevant reading:

Google’s Matt Cutts on Duplicate Content:
http://blog.outer-court.com/archive/2006-08-02-n60.html

ASP, .htaccess, and ColdFusion methods of issuing a 301 HTTP status:
http://www.thegooglecache.com/white-hat-seo/duplicate-content-round-up-diagnosis-and-correction-with-free-tools/

This post follows the information from: googlewebmastercentral.blogspot.com




Have fun and remember "content is still king".

Send me your comments...

Dave ;-)
