Do you bother cleaning duplicate content from Googles Index?
-
Hi,
I'm in the process of instructing developers to stop producing duplicate content, however a lot of duplicate content is already in Google's Index and I'm wondering if I should bother getting it removed... I'd appreciate it if you could let me know what you'd do...
For example one 'type' of page is being crawled thousands of times, but it only has 7 instances in the index which don't rank for anything. For this example I'm thinking of just stopping Google from accessing that page 'type'.
Do you think this is right?
Do you normally meta NoIndex,follow the page, wait for the pages to be removed from Google's Index, and then stop the duplicate content from being crawled?
Or do you just stop the pages from being crawled and let Google sort out its own Index in its own time?
Thanks
FashionLux
-
One tricky point - you don't necessarily want to fix the duplicate URLs before you 301-redirect and clear out the index. This is counter-intuitive and throws many people off. If you cut the crawl paths to the bad URLs, then Google will never crawl them and process the 301-redirects (since those exist on the page level). Same is try for canonical tags. Clear out the duplicates first, THEN clean up the paths. I know it sounds weird, but it's important.
For malformed URLs and usability, you could still dynamically 301-redirect. In most cases, those bad URLs shouldn't get indexed, because they have no crawl path in your site. Someone would have to link to them. Google will never mis-type, in other words.
-
Hi Highland/Dr Pete,
My apologies I wasn't very clear - fixing the duplicate problem... or rather stopping our site from generating further duplicate content isn't an issue at all, I'm going to instruct our developers to stop generating dupe content by doing things like no longer passing variables in the URL's (mysite.com/page2?previouspage=page1).
However the problem is that for a lot of instances duplicate URL's work and they need to work - for example if a user types in the URL but gets one character wrong ('1q' rather than '1') then from a usability perspective its the correct thing to serve the content they wanted. You don't want to make the user have to stop, figure out what they did wrong and redo it - not when you can make it work seamlessly.
My question relates to 'once my site is no longer generating unnecessary duplicate content, what should I to do about the duplicate pages that have already made their way into the Index?' and you have both answered the question very well, thank you.
I can manually set-up 301 redirects for all of the duplicate pages that I find in the index, once they disappear from the index I can probably remove those 301's. I was thinking of going down the noindex meta tag route which is harder to develop.
Thanks guys
FashionLux
-
I DO NOT believe in letting Google sort it out - they don't do it well, and, since Panda (and really even before), they basically penalize sites for their inability to sort out duplicates. I think it's very important to manage your index.
Unfortunately, how to do that can be very complex and depends a lot on the situation. Highland's covered the big ones, but the details can get messy. I wrote a mega-post about it:
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
Without giving URLs, can you give us a sense of what kind of duplicates they are (or maybe some generic URL examples)?
-
Your options are
- De-index the duplicate pages yourself and save yourself the crawl budget
- 301 the duplicates to the pages you want to keep (preferred)
- Canonical the duplicate pages, which lets you pick which page remains in the index. The duplicate pages will still be crawled, however.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My url disappeared from Google but Search Console shows indexed. This url has been indexed for more than a year. Please help!
Super weird problem that I can't solve for last 5 hours. One of my urls: https://www.dcacar.com/lax-car-service.html Has been indexed for more than a year and also has an AMP version, few hours ago I realized that it had disappeared from serps. We were ranking on page 1 for several key terms. When I perform a search "site:dcacar.com " the url is no where to be found on all 5 pages. But when I check my Google Console it shows as indexed I requested to index again but nothing changed. All other 50 or so urls are not effected at all, this is the only url that has gone missing can someone solve this mystery for me please. Thanks a lot in advance.
Intermediate & Advanced SEO | | Davit19850 -
No Index thousands of thin content pages?
Hello all! I'm working on a site that features a service marketed to community leaders that allows the citizens of that community log 311 type issues such as potholes, broken streetlights, etc. The "marketing" front of the site is 10-12 pages of content to be optimized for the community leader searchers however, as you can imagine there are thousands and thousands of pages of one or two line complaints such as, "There is a pothole on Main St. and 3rd." These complaint pages are not about the service, and I'm thinking not helpful to my end goal of gaining awareness of the service through search for the community leaders. Community leaders are searching for "311 request service", not "potholes on main street". Should all of these "complaint" pages be NOINDEX'd? What if there are a number of quality links pointing to the complaint pages? Do I have to worry about losing Domain Authority if I do NOINDEX them? Thanks for any input. Ken
Intermediate & Advanced SEO | | KenSchaefer0 -
Why isn't Google indexing this site?
Hello, Moz Community My client's site hasn't been indexed by Google, although it was launched a couple of months ago. I've ran down the check points in this article https://moz.com/ugc/8-reasons-why-your-site-might-not-get-indexed without finding a reason why. Any sharp SEO-eyes out there who can spot this quickly? The url is: http://www.oldermann.no/ Thank you
Intermediate & Advanced SEO | | Inevo
INEVO, digital agency0 -
When does Google index a fetched page?
I have seen where it will index on of my pages within 5 minutes of fetching, but have also read that it can take a day. I'm on day #2 and it appears that it has still not re-indexed 15 pages that I fetched. I changed the meta-description in all of them, and added content to nearly all of them, but none of those changes are showing when I do a site:www.site/page I'm trying to test changes in this manner, so it is important for me to know WHEN a fetched page has been indexed, or at least IF it has. How can I tell what is going on?
Intermediate & Advanced SEO | | friendoffood0 -
Could this be seen as duplicate content in Google's eyes?
Hi I'm an in-house SEO and we've recently seen Panda related traffic loss along with some of our main keywords slipping down the SERPs. Looking for possible Panda related issues I was wondering if the following could be seen as duplicate content. We've got some very similar holidays (travel company) on our website. While they are different I'm concerned it may be seen as creating content that is too similar: http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/the-wildlife-and-beaches-of-kenya.aspx http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/ultimate-kenya-wildlife-and-beaches.aspx http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/wildlife-and-beach-family-safari.aspx They do all have unique text but as you can see from the titles, they are very similar (note from an SEO point of view the tabbed content is all within the same page at source level). At the top level of the holiday pages we have a filtered search:
Intermediate & Advanced SEO | | KateWaite
http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays.aspx These pages have a unique introduction but the content snippets being pulled into the boxes is drawn from each of the individual holiday pages. I'm just concerned that these could be introducing some duplicating issues. Any thoughts?0 -
Scraping / Duplicate Content Question
Hi All, I understanding the way to protect content such as a feature rich article is to create authorship by linking to your Google+ account. My Question
Intermediate & Advanced SEO | | Mark_Ch
You have created a webpage that is informative but not worthy to be an article, hence no need create authorship in Google+
If a competitor comes along and steals this content word for word, something similar, creates their own Google+ page, can you be penalised? Is there any way to protect yourself without authorship and Google+? Regards Mark0 -
Duplicate page content errors stemming from CMS
Hello! We've recently relaunched (and completely restructured) our website. All looks well except for some duplicate content issues. Our internal CMS (custom) adds a /content/ to each page. Our development team has also set-up URLs to work without /content/. Is there a way I can tell Google that these are the same pages. I looked into the parameters tool, but that seemed more in-line with ecommerce and the like. Am I missing anything else?
Intermediate & Advanced SEO | | taylor.craig0 -
Why the archive sub pages are still indexed by Google?
Why the archive sub pages are still indexed by Google? I am using the WordPress SEO by Yoast, and selected the needed option to get these pages no-index in order to avoid the duplicate content.
Intermediate & Advanced SEO | | MichaelNewman1