Where do these URLs come from?! (Indexation issues)
-
We run an international webshop with language codes in the URLs. Our URLs are currently structured as follows:
http://thermalunderwear.eu/eng/category/product
Now, we know there is a strange redirect problem affecting our indexation; it's a technical issue that should be fixed soon. But whether it is also the cause of the other odd problems below, I don't know. I'd be happy with any help, advice, or tips.
1. The SEOmoz site crawler starts at http://thermalunderwear.eu. This does not yet redirect to http://thermalunderwear.eu/eng like we want it to, but all the links on that page do include the default language code, so they all look like http://thermalunderwear.eu/eng/category etc. However, besides those URLs, the crawler also finds many URLs of the form http://thermalunderwear.eu/category/product, i.e. without the language segment. Where it gets these I don't know, and since these URLs don't really exist and the webshop simply shows the homepage for them, they all trigger 50+ duplicate title/content warnings. Why oh why? (A rough sketch of the redirect fix we have in mind follows after point 2.)
2. If I do a Google search for indexed URLs with English set as the language, I get many results formatted like this:
Coldpruf Enthusiast mens thermal shirt - Thermal wear for men ...
thermalunderwear.eu/eng/men/coldpruf-enthusiast-mens-thermal-shirt
170+ items – Fine-ribbed longsleeve thermal shirt men from Enthusiast ... {$SCRIPT_NAME} eng/men/coldpruf-enthusiast-mens-the {$ajax_url} http://thermalunderwear.eu/ajax
What are those variables doing there? It looks like Google is picking up something from our Smarty debug console, which is hidden but still active in the source code, and also the ajax URL, which sits in a completely different part of the source. What is Google trying to show here?
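For context on point 1, the redirect fix we have in mind is roughly the following. This is only a sketch assuming a simple PHP front controller; the language list and the actual routing code in our webshop will differ.

```php
<?php
// Rough sketch only, not our real routing code: redirect any request whose
// first path segment is not a known language code to the default /eng/
// version, so the language-less URLs stop resolving as copies of the homepage.
$languages = ['eng', 'nld', 'deu'];  // assumed language codes
$path      = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$first     = explode('/', trim($path, '/'))[0];

if ($first === '' || !in_array($first, $languages, true)) {
    header('Location: http://thermalunderwear.eu/eng' . $path, true, 301);
    exit;
}
```

That would also catch the bare root URL and send it to http://thermalunderwear.eu/eng like we want.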
-
Google sees it as a list, a bit like rich snippets; it's a large chunk of your content, so it thinks it is the main content.
See these results: the "40+" is a list I have on my page, and it shows a few samples.
-
I guess that is the only solution then. I don't quite understand why Google picks that information (as well as the "170+ items") to show in the SERP text, but we'll try disabling the Smarty debugging when we're not actively using it. I hope it helps!
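Something along these lines in the Smarty bootstrap should take care of it; this is just a minimal sketch assuming a standard Smarty 3 setup, and the actual file and object names in our shop will differ.

```php
<?php
// Minimal sketch, assuming a standard Smarty 3 bootstrap; our real setup differs.
// With debugging fully off, the debug-console markup ({$SCRIPT_NAME} etc.)
// no longer ends up in the rendered HTML that Google crawls.
require_once 'libs/Smarty.class.php';

$smarty = new Smarty();
$smarty->debugging      = false;   // never append the debug console output
$smarty->debugging_ctrl = 'NONE';  // ignore ?SMARTY_DEBUG on request URLs
```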
-
I looked in the source code of this page:
http://thermalunderwear.eu/eng/men/devold-alpine-knee-thermal-socks-electric-blue
And I found: {$SCRIPT_NAME} eng/men/coldpruf-enthusiast-mens-the
Your debug code is in the source code; you need to get rid of it, or at least disable it. I have not used Smarty debug, so I can't help much.
-
Ah, thanks Alan! It looks like there is a problem in the code that generates the breadcrumb URLs. We will get that fixed ASAP, which should lower the number of duplicate content warnings considerably.
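For reference, the kind of fix we are planning looks roughly like this; the function and variable names are made up for illustration and are not our actual code.

```php
<?php
// Hypothetical helper, for illustration only: build breadcrumb URLs with the
// active language segment so they match the indexed /eng/... URLs instead of
// the language-less paths that just fall back to the homepage.
function breadcrumb_url(string $lang, string $path): string
{
    return 'http://thermalunderwear.eu/' . $lang . '/' . ltrim($path, '/');
}

echo breadcrumb_url('eng', 'kids-thermal-underwear/coldpruf-enthusiast-kids-thermal-shirt');
// -> http://thermalunderwear.eu/eng/kids-thermal-underwear/coldpruf-enthusiast-kids-thermal-shirt
```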
-
Your first problem:
Look at this page:
http://thermalunderwear.eu/eng/kids-thermal-underwear/coldpruf-enthusiast-kids-thermal-shirt
You will see a link to http://thermalunderwear.eu/kids-thermal-underwear/coldpruf-enthusiast-kids-thermal-shirt (without the language code).
I will look at your other problem in a few minutes.