Duplicate Page Titles and Content
-
The SEOmoz crawler has found many pages like this on my site with /?Letter=Letter appended, e.g. http://www.johnsearles.com/metal-art-tiles/?D=A. I believe it is finding multiple cached copies of a page and flagging them as duplicates. Is there any way to screen out these multiple cache results?
-
I think I figured out what to add to robots.txt to screen out any URL with a '?' in it. I believe these ?-style URLs are session IDs appended to the URLs. I'll see what rogerbot does next time it crawls my site.
Disallow: /*?
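Whether that rule blocks the problem URLs depends on the crawler honoring Google-style wildcards in Disallow paths (rogerbot is generally reported to, but that is worth confirming in Moz's docs). The matching can be sketched as a small check; this is a simplified illustration, not a full robots.txt parser (it ignores Allow rules, rule precedence, and percent-encoding):

```python
import re

def rule_to_regex(disallow_path):
    """Convert a Google-style Disallow path (with * wildcards) to a regex.

    Simplified sketch: '*' matches any run of characters and a trailing
    '$' anchors the end of the URL.
    """
    pattern = re.escape(disallow_path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile("^" + pattern)

def is_blocked(url_path, disallow_path):
    """True if the URL path matches the Disallow pattern."""
    return bool(rule_to_regex(disallow_path).match(url_path))

# 'Disallow: /*?' blocks any path that contains a query string:
print(is_blocked("/metal-art-tiles/?D=A", "/*?"))  # True
print(is_blocked("/metal-art-tiles/", "/*?"))      # False
```

Run against the URLs from your crawl report, this shows the rule catching every query-string variant while leaving the clean URLs crawlable.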
-
Hey John,
My apologies for any issues you are experiencing with our service. I would definitely like to address any other issues, beyond this one, that you may be running into. You can either respond to this Q&A thread or submit a private customer support ticket to our help team: go to our help hub (www.seomoz.org/help) and click the Contact Help Team button.
As for your duplicate content question, it is important to know that any time the same content is found on more than one URL, it is considered duplicate content. WordPress is a good example where duplicate content is often found but can be easily addressed.
In WordPress you could have your homepage at www.domain.com and an author page at www.domain.com/author/authorname. If your blog has only one author, though, the author page will be identical to your homepage, and the result is duplicate content on your site. There are a few ways to resolve this, the most popular being to simply block access to the author page and redirect it back to the homepage. This prevents other sites from linking to the duplicate page; they link directly to the homepage instead.
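On an Apache server, that redirect could be a one-line rule in .htaccess; this is a sketch, and the /author/authorname path and domain are placeholders for your actual author slug and site:

```apache
# Permanently redirect the duplicate author page to the homepage
Redirect 301 /author/authorname/ http://www.domain.com/
```

A 301 (permanent) redirect also passes most link equity from the old URL to the homepage, which is why it is preferred over a 302 here.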
Another option is to add a meta robots noindex,follow tag to the duplicate page, in this case the author page. This prevents the page from being indexed but still allows the links on it to be found and crawled. You can also block access to these pages in your robots.txt file, and you can target our crawler specifically with the user-agent rogerbot.
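Concretely, the two options look like this (the /author/ path is an example, not something from your site). The meta tag goes in the duplicate page's head:

```html
<meta name="robots" content="noindex, follow">
```

And a robots.txt section aimed only at Moz's crawler would be:

```apache
User-agent: rogerbot
Disallow: /author/
```

Note the trade-off: robots.txt blocks crawling entirely, so links on the blocked page are never followed, while noindex,follow keeps the page crawlable but out of the index.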
I hope that makes sense.
Let me know if you have any additional questions or concerns.
Kenny
-
Thanks, Guy. I was thinking of subscribing to SEOmoz, but the site reports have been less than useful. This is just one of five issues I've found.
-
So far, no. Until they fix that little error, you can use Google Webmaster Tools to double-check for real duplicate content.
The spider sees whatever.php?var=1 as a different page because some sites use query parameters to identify pages (index.php?p=103 is one page and p=102 another), while other sites use those URL variables on the same page.
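One way to audit such a report yourself is to collapse the query-string variants and see how many distinct pages remain; a minimal sketch, safe only when the parameters (sort orders, session IDs) never select different content:

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the query string so URL variants collapse to one canonical URL."""
    scheme, netloc, path, _query, fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", fragment))

# Three crawl-report entries that are really the same page:
urls = [
    "http://www.johnsearles.com/metal-art-tiles/?D=A",
    "http://www.johnsearles.com/metal-art-tiles/?D=D",
    "http://www.johnsearles.com/metal-art-tiles/",
]
print({strip_query(u) for u in urls})  # collapses to a single canonical URL
```

If the collapsed set is much smaller than the raw crawl count, the "duplicates" are just parameter variants, not genuinely duplicated content.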