Crawl Diagnostics bringing 20k+ errors as duplicate content due to session ids

blagger

Signed up to the trial version of SEOmoz today just to check it out, as I have decided I'm going to do my own SEO rather than outsource it (been let down a few times!). So far I like the look of things and have a feeling I am going to learn a lot and get results.

However, I have just stumbled on something. After SEOmoz ran its crawl diagnostics on the site (www.deviltronics.com), it is showing 20,000+ errors. From what I can see, almost 99% of these are being picked up as duplicate content errors due to session IDs, so I am not sure what to do!

I have done a "site:www.deviltronics.com" search on Google and this certainly doesn't pick up the session IDs/duplicate content. So could this just be an issue with the SEOmoz bot? If so, how can I get SEOmoz to ignore these on the crawl?

Can I get my developer to add some code somewhere?

Help will be much appreciated. Asif

Gareth_Cartman

Hello Tom and Asif,

First of all, Tom, thanks for the excellent blog post re Google Docs.

We are also using the Jshop platform for one of our sites, and I am not sure whether it is working correctly in terms of SEO. I just ran an SEOmoz crawl of the site and found that every single link in the list has a rel canonical in it, even the ones with session IDs.

Here is an example:

www.strictlybeautiful.com/section.php/184/1/davines_shampoo/d112a41df89190c3a211ec14fdd705e9

www.strictlybeautiful.com/section.php/184/1/davines_shampoo

As Asif has pointed out, the Jshop people say they have programmed it so that Google cannot pick up the session IDs. Firstly, is that even possible? And if I assume that's not an issue, then what about the fact that every single page on the site has a rel canonical link on it?

Any help would be much appreciated.


KeriMorgret

Asif, here's the page with the information on the SEOmoz bot.

http://www.seomoz.org/dp/rogerbot

blagger

Thanks for the reply, Tom. I spoke to our developer, and he has told me that the website platform (Jshop) does not show session IDs to the search engines, so we are OK on that side. However, as it doesn't recognise the SEOmoz bot, it shows it the session IDs. Do you know where I can find info on the SEOmoz bot, so we can see what it identifies itself as and add it to the list of recognised spiders?

Thanks

Tom-Anthony

Hi Asif!

Firstly - I'd suggest that as soon as possible you address the core problem - the use of session IDs in the URL. There are not many upsides to the approach and there are many downsides. That it doesn't show up with the site: command doesn't mean it isn't having a negative impact.

In the meantime, you should add a rel=canonical tag to all the offending pages pointing to the URL without the session ID. Secondly, you could use robots.txt to block the SEOmoz bot from crawling pages with session IDs, but it may affect the bot's ability to crawl the site if all the links it is presented with contain session IDs - which takes us back around to fixing the core problem.
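
To make the rel=canonical idea concrete, here is a rough sketch of how the clean URL might be derived. It is not Jshop's actual code: the function names are made up for illustration, and it assumes the session ID is always a trailing path segment of 32 hex characters, as in the strictlybeautiful.com example posted in this thread. The http:// scheme in the example is also an assumption.

```python
import re
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: the session ID is a trailing path segment of exactly 32 hex
# characters, matching the example URL quoted in this thread.
SESSION_SEGMENT = re.compile(r"/[0-9a-f]{32}/?$", re.IGNORECASE)


def strip_session_id(url: str) -> str:
    """Return the URL with a trailing session-ID path segment removed."""
    parts = urlsplit(url)
    clean_path = SESSION_SEGMENT.sub("", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, clean_path, parts.query, parts.fragment)
    )


def canonical_tag(url: str) -> str:
    """Build a rel=canonical link tag pointing at the session-free URL."""
    href = escape(strip_session_id(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    with_session = (
        "http://www.strictlybeautiful.com/section.php/184/1/"
        "davines_shampoo/d112a41df89190c3a211ec14fdd705e9"
    )
    print(canonical_tag(with_session))
```

The point is that both the session and non-session versions of a page should end up emitting the same canonical URL, so a crawler that sees either one is told which version counts.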

Hope this helps a little!
