I have more pages in my sitemap being blocked by the robots.txt file than I have being allowed to be crawled. Is Google going to hate me for this?
-
I'm using rules to block all pages whose URLs start with "copy-of" on my website, because people have a bad habit of duplicating new product listings to create our refurbished, surplus, etc. listings for those products. To avoid Google seeing these as duplicate pages I've blocked them in robots.txt, but of course they are still automatically generated in our sitemap. How bad is this?
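For reference, a robots.txt rule matching the setup described (the exact path prefix is an assumption; adjust it to the real URL pattern) might look like:

```text
User-agent: *
Disallow: /copy-of
```

Note that `Disallow` matches by prefix and only prevents crawling, not indexing, which is the crux of the answers below.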
-
When you say "people," do you mean your own web team duplicates content to make their job easier? Or am I missing something?
If that's the case, you really should create unique URLs with unique page titles, product info, etc. The correct way to avoid getting hit for duplicate content is not to create it in the first place. What you're doing now is more of a band-aid solution to the problem.
Keep in mind that even though creating unique content in situations like this can seem daunting and/or be more expensive, there are probably huge long-term gains to be made if you do it right.
-
It's not bad, just not best practice, because Google can still index the URLs if they are mentioned on other pages. To quote them:
"While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information..."
What I would do instead is either use rel="canonical" or 301 redirects. I hope that helps.
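As a sketch of the canonical approach, each duplicated listing gets a tag in its `<head>` pointing at the original product page (the URLs here are hypothetical):

```html
<!-- On the duplicate, e.g. https://example.com/copy-of-widget -->
<link rel="canonical" href="https://example.com/widget" />
```

Alternatively, a server-side 301 (e.g. in an Apache .htaccess: `Redirect 301 /copy-of-widget /widget`) sends both visitors and crawlers to the original. Either way, the duplicates should then no longer be blocked in robots.txt, since Google has to crawl them to see the canonical tag or the redirect.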