Duplicate Page content | What to do?
-
Hello Guys,
I have some duplicate pages detected by Moz. Most of the URLs are from a registration process for users, so the URLs all look like this:
www.exemple.com/user/login?destination=node/125%23comment-form
What should I do? Add this to robots.txt? If so, how? What's the command to add in Google Webmaster Tools?
Thanks in advance!
Pedro Pereira
-
Hi Carly,
It needs to be done to each of the pages. In most cases, this is just a minor change to a single page template. Someone might tell you that you can add an entry to robots.txt to solve the problem, but that won't remove them from the index.
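As a rough illustration of the "single page template" point, the sketch below decides from the request path whether the shared template should emit a noindex tag. It assumes the member profile URLs follow the `/me/members/<number>` pattern quoted in this thread; the function name and the regular expression are hypothetical, not part of any particular CMS.

```python
import re

# Assumed URL shape for member profiles, based on the examples in this thread.
MEMBER_PROFILE = re.compile(r"^/me/members/\d+/?$")


def robots_meta_for(path: str) -> str:
    """Return the robots meta tag the shared template should print for this path."""
    if MEMBER_PROFILE.match(path):
        return '<meta name="robots" content="noindex">'
    return ""  # every other page keeps the default (indexable) behaviour


print(robots_meta_for("/me/members/8003"))
```

One rule in the template then covers every member page, however many there are.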
Looking at the links you provided, I'm not convinced you should deindex them all, as these are member profile pages which might have some value in terms of driving organic traffic and having unique content on them. That said, I'm not party to how your site works, so this is just an observation.
Hope that helps,
George
-
Hi George,
I am having a similar issue with my site, and was looking for a quick clarification.
We have several thousand "member" pages that were created as part of registration, and they are appearing as duplicate content. When you say add noindex and a canonical, is this something that needs to be done to every individual page, or is there something that can be done that would apply to the thousands of pages at once?
Here are a couple of examples of what the pages look like:
http://loyalty360.org/me/members/8003
http://loyalty360.org/me/members/4641
Thank you!
-
1. If you add just noindex, Google will crawl the page and drop it from the index, but it will also crawl the links on that page and potentially index them too. It basically passes equity to links on the page.
2. If you add nofollow, noindex, Google will crawl the page and drop it from the index, but it will not crawl the links on that page, so no equity will be passed to them. As already established, Google may still put these links in the index, but it will display the standard "blocked" message for the page description.
If the links are internal, there's no harm in them being followed unless you're opening up the crawl to expose tons of duplicate content that isn't canonicalised.
noindex is often used with nofollow, but sometimes this is simply due to a misunderstanding of what impact they each have.
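The difference between the two settings can be sketched as a small parser for the `content` value of a robots meta tag. This is a simplified model that only looks at `noindex`, `nofollow` and the shorthand `none`; the function name is hypothetical.

```python
def parse_robots_meta(content: str) -> dict:
    """Interpret a robots meta 'content' value as index/follow flags."""
    directives = {d.strip().lower() for d in content.split(",") if d.strip()}
    if "none" in directives:  # 'none' is shorthand for noindex, nofollow
        directives |= {"noindex", "nofollow"}
    return {
        "index": "noindex" not in directives,    # may the page appear in results?
        "follow": "nofollow" not in directives,  # may links on the page be followed?
    }


print(parse_robots_meta("noindex"))
print(parse_robots_meta("nofollow, noindex"))
```

With `noindex` alone the page drops out of the index while its links can still be followed; adding `nofollow` switches that off as well.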
George
-
Hello,
Thanks for your response. I have learned more, which is great.
My question is: should I add only noindex to that page, or noindex, nofollow?
Thanks!
-
Yes, that's the worst possible scenario: the pages basically get trapped in the SERPs. Google won't crawl them again until you allow crawling. You then set noindex (to remove them from the SERPs), and afterwards switch to nofollow,noindex to keep them out of the SERPs and to stop Google following any links on them.
Configuring URL parameters, again, is just a directive regarding the crawl and, to the best of my knowledge, doesn't affect indexing status.
In my experience, noindex is bulletproof, but nofollow and robots.txt are very often misunderstood and can lead to a lot of problems as a result. Some SEOs think they can be clever in crafting the flow of PageRank through a site. The unsurprising reality is that Google just does what it wants.
George
-
Hi George,
Thanks for this, it's very interesting... the URLs do appear in search results, but their descriptions are blocked(!)
Did you try configuring URL parameters in WMT as a solution?
-
Hi Rafal,
The key part of that statement is "we might still find and index information about disallowed URLs...". If you read the next sentence it says: "As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results".
If you look at moz.com/robots.txt you'll see an entry for:
Disallow: /pages/search_results*
But if you search this on Google:
site:moz.com/pages/search_results
You'll find there are 20 results in the index.
I used to agree with you, until I found out the hard way that if Google finds a link, regardless of whether it's disallowed in robots.txt or not, it can put the URL in the index, and it will remain there until you remove the crawl restriction and noindex it, or remove it from the index using Webmaster Tools.
George
-
George,
I went to check with Google to make sure I am correct and I am!
"While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web." Source: https://support.google.com/webmasters/answer/6062608?hl=en
Yes, he can fix these problems on page, but disallowing it in robots will work fine too!
-
Just adding this to robots.txt will not stop the pages being indexed:
Disallow: /*login?
It just means Google won't crawl those pages, or the links on them.
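To see which URLs that rule covers, here is a minimal sketch of Google-style pattern matching. It assumes `*` matches any run of characters, a trailing `$` anchors the end of the URL, and every other character (including `?`) is literal. The function names are hypothetical, and a match only says what is blocked from crawling, not what is kept out of the index.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with '*' and '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and let '*' stand for "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, pattern: str) -> bool:
    """True if the path (including the query string) is covered by the pattern."""
    return robots_pattern_to_regex(pattern).search(path_and_query) is not None


print(is_disallowed("/user/login?destination=node/125%23comment-form", "/*login?"))
print(is_disallowed("/user/login", "/*login?"))
```

The first call is covered by the rule; the second is not, because the plain login URL has no `?` after `login`.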
I would do one of the following:
1. Add noindex to the page. PageRank will still be passed to the page, but it will no longer appear in SERPs.
2. Add a canonical on the page to: "www.exemple.com/user/login"
You're never going to try to get these pages to rank, so although it's worth fixing, I wouldn't lose too much sleep over the impact of having duplicate content on registration pages (unless there are hundreds of them!).
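Either option comes down to one tag in the page head. The sketch below builds both, assuming the canonical target is simply the login URL with its query string removed and that the site is served over `http` (the scheme is not given in the question). The function names are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def noindex_tag() -> str:
    """Option 1: ask search engines to keep the page out of the index."""
    return '<meta name="robots" content="noindex">'


def canonical_tag(url: str) -> str:
    """Option 2: point every ?destination=... variant at the bare login URL."""
    parts = urlsplit(url)
    # Drop the query string and fragment; keep scheme, host and path.
    target = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'


url = "http://www.exemple.com/user/login?destination=node/125%23comment-form"
print(noindex_tag())
print(canonical_tag(url))
```

Whichever tag you choose goes in the login page template, so it applies to every variant of the URL.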
Regards,
George
-
In GWT: Crawl => URL Parameters => Configure URL Parameters => Add Parameter
Make sure you know what you are doing as it's easy to mess up and have BIG issues.
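Before adding a parameter there, it helps to confirm which parameter name is actually creating the URL variants. A minimal sketch using Python's standard library, with the `http` scheme assumed since the question omits it and a hypothetical function name:

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(url: str) -> dict:
    """Return the decoded query parameters of a URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


url = "http://www.exemple.com/user/login?destination=node/125%23comment-form"
print(query_parameters(url))
```

For the login URLs in the question, the only parameter is `destination`, so that is the name you would enter.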
-
Add this line to your robots.txt to prevent Google from indexing these pages:
Disallow: /*login?