Google Indexing Duplicate URLs : Ignoring Robots & Canonical Tags

JBGlobalSEO

Hi Moz Community,

We have the following robots command that should prevent URLs with tracking parameters being indexed.

Disallow: /*?

We have noticed google has started indexing pages that are using tracking parameters. Example below.

http://www.oakfurnitureland.co.uk/furniture/original-rustic-solid-oak-4-drawer-storage-coffee-table/1149.html

http://www.oakfurnitureland.co.uk/furniture/original-rustic-solid-oak-4-drawer-storage-coffee-table/1149.html?ec=affee77a60fe4867

These pages are identified as duplicate content yet have the correct canonical tags:

https://www.google.co.uk/search?num=100&site=&source=hp&q=site%3Ahttp%3A%2F%2Fwww.oakfurnitureland.co.uk%2Ffurniture%2Foriginal-rustic-solid-oak-4-drawer-storage-coffee-table%2F1149.html&oq=site%3Ahttp%3A%2F%2Fwww.oakfurnitureland.co.uk%2Ffurniture%2Foriginal-rustic-solid-oak-4-drawer-storage-coffee-table%2F1149.html&gs_l=hp.3..0i10j0l9.4201.5461.0.5879.8.8.0.0.0.0.82.376.7.7.0....0...1c.1.58.hp..3.5.268.0.JTW91YEkjh4

With various affiliate feeds available for our site, we effectively have duplicate versions of every page due to the tracking query that Google seems to be willing to index, ignoring both robots rules & canonical tags.

Can anyone shed any light onto the situation?

AlanBleiweiss

Google's multi-layered multi-algorithm system has come a long way in being able to "figure it all out", yet at the same time, falls far short of always successfully "getting it right".

Robots.txt files are no longer an absolute directive. They're now "just another signal", as are canonical tags, meta robots instructions, and their own Google Webmaster URL Parameters system.

Because of this its critical to be consistent across all signals. If you've got the robots.txt file set to not index pages, but also have inbound links from affiliates, that's a prime example of where inbound link signals can override the robots.txt file's instruction if they're not nofollowed links.

While they technically SHOULD not index them after discovering them off-site (because the destination says "index this other version"), that's part of their confused multilayered system.

I have a question though - from what limited information you've provided, this example is based on a url parameter of ?ec=

When I search Google using site:http://www.oakfurnitureland.co.uk/ inurl:ec

I see only three such pages indexed AND where those pages are "fully" indexed. All the rest (over 1,000 additional URLs), are in the Google system, however every one of those others has a meta description of "A description for this result is not available because of this site's robots.txt - learn more."

What that means is they are NOT fully indexing those pages - there is no worry to be had about duplicate content for those. Google is simply tracking that those URLs exist.

So - is that the only URL parameter you're worried about? If so, it's not a major problem on your site. Except for those few exceptions, Google is doing what you need them to do with those.

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Google Indexing Duplicate URLs : Ignoring Robots & Canonical Tags

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Geo-Targeted Sub-Domains & Duplicate Content/Canonical

Incorrect cached page indexing in Google while correct page indexes intermittently

Can Google index PDFs with flash?

Huge google index with un-relevant pages

About robots.txt for resolve Duplicate content

Lots of incorrect urls indexed - Googlebot found an extremely high number of URLs on your site

Original Source and Canonical tags

Duplicate Listings on Google Maps