Duplicate Page content | What to do?
-
Hello Guys,
I have some duplicate pages detected by Moz. Most of the URLs come from a user registration process, so they all look like this:
www.exemple.com/user/login?destination=node/125%23comment-form
What should I do? Add this to robots.txt? If so, how? What's the setting to add in Google Webmaster Tools?
Thanks in advance!
Pedro Pereira
-
Hi Carly,
It needs to be done to each of the pages. In most cases, this is just a minor change to a single page template. Someone might tell you that you can add an entry to robots.txt to solve the problem, but that won't remove them from the index.
Looking at the links you provided, I'm not convinced you should deindex them all, as these are member profile pages that might have some value in driving organic traffic and may carry unique content. That said, I'm not privy to how your site works, so this is just an observation.
Hope that helps,
George
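If the page template is hard to change, one alternative to a per-page meta tag is to send the noindex as an `X-Robots-Tag` HTTP response header for every URL that matches a pattern. The following is only a minimal sketch, assuming a Python WSGI application; the `/me/members/<id>` pattern is inferred from the example URLs in the question, and `noindex_members`, `demo_app` and `headers_for` are hypothetical names used for illustration.

```python
import re

# Assumed URL shape, based on the two example member URLs in the question.
MEMBER_PATH = re.compile(r"^/me/members/\d+/?$")


def noindex_members(app):
    """Wrap a WSGI app so member pages are served with X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Only touch responses for URLs that look like member pages.
            if MEMBER_PATH.match(environ.get("PATH_INFO", "")):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"member profile"]


def headers_for(app, path):
    """Call the app for one path and return its response headers as a dict."""
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.extend(headers)

    app({"PATH_INFO": path}, start_response)
    return dict(captured)
```

Calling `headers_for(noindex_members(demo_app), "/me/members/8003")` returns the headers a crawler would receive for that member page, so one rule covers every page that matches the pattern.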
-
Hi George,
I am having a similar issue with my site, and was looking for a quick clarification.
We have several "member" pages that were created as part of registration (thousands of them), and they are appearing as duplicate content. When you say add noindex and a canonical, is this something that needs to be done to every individual page, or is there something that can be done that would apply to the thousands of pages at once?
Here are a couple of examples of what the pages look like:
http://loyalty360.org/me/members/8003
http://loyalty360.org/me/members/4641
Thank you!
-
1. If you add just noindex, Google will crawl the page and drop it from the index, but it will still crawl the links on that page and potentially index them too. In other words, equity is still passed to the links on the page.
2. If you add nofollow, noindex, Google will crawl the page and drop it from the index, but it will not crawl the links on that page, so no equity is passed to them. As already established, Google may still put these links in the index, but it will display the standard "blocked" message for the page description.
If the links are internal, there's no harm in them being followed unless you're opening up the crawl to expose tons of duplicate content that isn't canonicalised.
noindex is often used with nofollow, but sometimes this is simply due to a misunderstanding of what impact they each have.
George
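The two options above differ only in the value of the robots meta tag placed in the page's head. As a rough illustration (the helper name `robots_meta` is hypothetical), the tag for each case could be generated like this:

```python
def robots_meta(index: bool, follow: bool) -> str:
    """Build a robots meta tag from the two independent directives."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


# Option 1: drop the page from the index but let links be followed.
option_1 = robots_meta(index=False, follow=True)

# Option 2: drop the page from the index and do not follow its links.
option_2 = robots_meta(index=False, follow=False)
```

Keeping the two directives as separate arguments mirrors the point made above: noindex and nofollow do different jobs and do not have to be used together.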
-
Hello,
Thanks for your response. I have learned more, which is great.
My question is: should I add only a noindex to that page, or a noindex, nofollow?
Thanks!
-
Yes, it's the worst possible scenario: they basically get trapped in SERPs. Google won't crawl them again until you allow crawling, then set noindex (to remove them from SERPs), and then add nofollow,noindex back on to keep them out of SERPs and to stop Google following any links on them.
Configuring URL parameters again is just a directive regarding the crawl and doesn't affect indexing status to the best of my knowledge.
In my experience, noindex is bulletproof but nofollow / robots.txt is very often misunderstood and can lead to a lot of problems as a result. Some SEOs think they can be clever in crafting the flow of PageRank through a site. The unsurprising reality is that Google just does what it wants.
George
-
Hi George,
Thanks for this, it's very interesting... the URLs do appear in search results but their descriptions are blocked(!)
Did you try configuring URL parameters in WMT as a solution?
-
Hi Rafal,
The key part of that statement is "we might still find and index information about disallowed URLs...". If you read the next sentence it says: "As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results".
If you look at moz.com/robots.txt you'll see an entry for:
Disallow: /pages/search_results*
But if you search this on Google:
site:moz.com/pages/search_results
You'll find there are 20 results in the index.
I used to agree with you, until I found out the hard way that if Google finds a link, regardless of whether it's blocked in robots.txt or not, it can put it in the index. It will remain there until you remove the robots.txt restriction and noindex it, or remove it from the index using Webmaster Tools.
George
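The practical consequence is that a noindex tag only works if the crawler is allowed to fetch the page and read it. A small sketch of that check using Python's standard `urllib.robotparser` follows. Note the assumptions: the trailing `*` from the Moz rule is dropped here because this parser matches plain path prefixes, the two test URLs are made up for illustration, and `crawler_can_see_noindex` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Simplified copy of the rule quoted above (wildcard removed, see lead-in).
RULES = [
    "User-agent: *",
    "Disallow: /pages/search_results",
]


def crawler_can_see_noindex(robots_lines, url, agent="*"):
    """Return True if the crawler may fetch the URL and so can read a noindex on it."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Disallowed: the page is never fetched, so a noindex on it would go unseen.
blocked = crawler_can_see_noindex(RULES, "https://moz.com/pages/search_results?q=seo")

# Not disallowed: the page can be fetched and a noindex on it can be honoured.
visible = crawler_can_see_noindex(RULES, "https://moz.com/blog")
```

This only models crawl permission. Whether a blocked URL still shows up in the index depends on links to it from elsewhere, which is exactly the situation described above.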
-
George,
I went to check with Google to make sure I am correct and I am!
"While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web." Source: https://support.google.com/webmasters/answer/6062608?hl=en
Yes, he can fix these problems on-page, but disallowing it in robots.txt will work fine too!
-
Just adding this to robots.txt will not stop the pages being indexed:
Disallow: /*login?
It just means Google won't crawl those pages, or the links on them.
I would do one of the following:
1. Add noindex to the page. PR will still be passed to the page but they will no longer appear in SERPs.
2. Add a canonical on the page to: "www.exemple.com/user/login"
You're never going to try and get these pages to rank, so although it's worth fixing I wouldn't lose too much sleep on the impact of having duplicate content on registration pages (unless there are hundreds of them!).
Regards,
George
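For the canonical option, every login URL with a `destination` parameter would point at the same parameter-free address. A minimal sketch of deriving that tag is below; the `https://` scheme is an assumption (the original URL is given without one) and `canonical_link` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Strip the query string and fragment and return a rel=canonical tag."""
    parts = urlsplit(url)
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{clean}">'


# The login URL from the original question, with an assumed https scheme.
tag = canonical_link(
    "https://www.exemple.com/user/login?destination=node/125%23comment-form"
)
```

Because the query string is dropped entirely, this approach only suits pages like the login form, where the parameter does not change the content.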
-
In GWT: Crawl => URL Parameters => Configure URL Parameters => Add Parameter
Make sure you know what you are doing as it's easy to mess up and have BIG issues.
-
Add this line to your robots.txt to prevent Google from indexing these pages:
Disallow: /*login?
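To see which URLs a rule like this would cover, here is a rough sketch of Google-style wildcard matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. It is an approximation written for illustration, not Google's implementation, and the function names are hypothetical. It models matching for crawling only and says nothing about whether a URL ends up in the index.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule with '*' and '$' into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(rule: str, path_and_query: str) -> bool:
    """Return True if the rule matches from the start of the path and query."""
    return rule_to_regex(rule).match(path_and_query) is not None


RULE = "/*login?"

# The URL from the question contains "login?" and so is covered by the rule.
covered = is_disallowed(RULE, "/user/login?destination=node/125%23comment-form")

# The bare login page has no "?" and is not covered.
bare_login = is_disallowed(RULE, "/user/login")
```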