Removing duplicated content using only the NOINDEX in large scale (80% of the website).
-
Hi everyone,
I am taking care of the large "news" website (500k pages), which got massive hit from Panda because of the duplicated content (70% was syndicated content). I recommended that all syndicated content should be removed and the website should focus on original, high quallity content.
However, this was implemented only partially. All syndicated content is set to NOINDEX (they thing that it is good for user to see standard news + original HQ content). Of course it didn't help at all. No change after months. If I would be Google, I would definitely penalize website that has 80% of the content set to NOINDEX a it is duplicated. I would consider this site "cheating" and not worthy for the user.
What do you think about this "theory"? What would you do?
Thank you for your help!
-
-
it has been almost a year now from the massive hit. after that, there were also some smaller hits
-
we are putting effort into improvements. that is quite frustrating for me, because I believe that our effort is demolished by old duplicated content (that creates 80% of the website :-))
Yeah, we will need to take care about the link-mess...
Thank you! -
-
Yeah, this strategy will be definitely part of the guidelines for the editors.
One last question: do you know some good resources I can use as an inspiration?
Thank you so much..
-
We deleted thousands of pages every few months.
Before deleting anything we identified valuable pages that continued to receive traffic from other websites or from search. These were often updated and kept on the site. Everything else was 301 redirected to the "news homepage" of the site. This was not a news site, it was a very active news section on an industry portal site.
You have set 410 for those pages and remove all internal links to them and google was ok with that?
Our goal was to avoid internal links to pages that were going to be deleted. Our internal "story recommendation" widgets would stop showing links to pages after a certain length of time. Our periodic purges were done after that length of time.
We never used hard coded links in stories to pages that were subject to being abandoned. Instead we simply linked to category pages where something relevant would always be found.
Develop a strategy for internal linking that will reduce site maintenance and focus all internal links to pages that are permanently maintained.
-
Yaikes! Will you guys still pay for it if it's removed? If so, then combining below comments with my thoughts - I'd delete it, since it's old and not time relevant.
-
Yeah, paying ... we actually pay for this content (earlier management decisions :-))
-
EGOL your insights are very appreciated :-)!
I agree with you. Makes total sense.
So you didn't experience any problems removing outdated content (or "content with no traffic value") from your website? You have set 410 for those pages and remove all internal links to them and google was ok with that?
Redirecting useless content - you mean set 301 to the most relevant page that is bringing traffic?
Thank you sir
-
But I still miss the point of paying for the content that is not accessible from SE
- "paying"?
Is my understanding right, that if I would set canonical for these duplicates, Google has no reason to show this pages in the SERP?
- correct
-
HI Dimitrii,
thank you very much for your opinion. The idea of canonical links is very interesting. We may try that in the "first" phase. But I still miss the point of paying for the content that is not accessible from SE.
Is my understanding right, that if I would set canonical for these duplicates, Google has no reason to show this pages in the SERP?
-
Just seeing the other responses. Agree with what EGOL mentions. A content audit would be even better to see if there was any value at all on those pages (GA traffic, links, etc). Odds are though that there was not any and you already killed all of it with the noindex tag in place.
-
Couple of things here.
-
If a second Panda update has not occurred since the changes that were made then you may not get credit for the noindexed content. I don't think this is "cheating" as with the noindex, it just told Google to take 350K of its pages out of the index. The noindex is one of the best ways to get your content out of Google's index.
-
If you have not spent time improving the non-syndicated content then you are missing the more important part and that is to improve the quality of the content that you have.
A side point to consider here, is your crawl budget. I am assuming that the site still internally links to these 350K pages and so users and bots will go to them and have to process etc. This is mostly a waste of time. As all of these pages are out of Google's index thanks to the noindex tag, why not take out all internal links to those pages (i.e. from sitemaps, paginated index pages, menus, internal content) so that you can have the user and Google focus on the quality content that is left over. I would then also 404/410 all those low quality pages as they are now out of Google's index and not linked internally. Why maintain the content?
-
-
Good point! News gotta be new
-
If there are 500,000 pages of "news" then a lot of that content is "history" instead of "news". Visitors are probably not consuming it. People are probably not searching for it. And actively visited pages on the site are probably not linking to it.
So, I would use analytics to determine if these "history" pages are being viewed, are pulling in much traffic, have very many links, and I would delete and redirect them if they are not important to the site any longer. This decision is best made at the page level.
For "unique content" pages that appear only on my site, I would assess them at regular intervals to determine which ones are pulling in traffic and which ones are not. Some sites place news in folders according to their publication dates and that facilitates inspecting old content for its continued value. These pages can then be abandoned and redirected once their content is stale and not being consumed. Again, this can best be done at the page level.
I used to manage a news section and every few months we would assess, delete and redirect, to keep the weight of the site as low as possible for maximum competitiveness.
-
Hi there.
NOINDEX !== no crawling. and surely it doesn't equal NOFOLLOW. what you probably should be looking at is canonical links.
My understanding is (and i can be completely wrong) that when you get hit by Panda for duplicate content and then try to recover, Google checks your website for the same duplicate content - it's still crawlable, all the links are still "followable", it's still scraped content, you aren't telling crawlers that you took it from somewhere else (by canonicalizing), it's just not displayed in SERPs. And yes, 80% of content being noindex probably doesn't help either.
So, I think that what you need to do is either remove that duplicate content whatsoever, or use canonical links to originals or (bad idea, but would work) block all those links in robots.txt (at least this way those pages will become uncrawlable whatsoever). All this still is unreputable techniques though, kinda like polishing the dirt.
Hope this makes sense.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can I stop serious traffic lost on my website
I need help resolving technical SEO issues on my website CamRojud. I have tried allSEO tactics but no improvement yet. Can someone in the forum guide me through please.
White Hat / Black Hat SEO | | Dawodus0 -
How to Increase US traffic for other GEO based website
Can you help me to understand why my traffic from the US is not increasing for landing pages/product pages whereas for our blog it has grown 2X to 3X past 2 quarters? I am afraid that I don't get any right answer for this. Could you or someone help me to discover the answer? Also, what should I do to rank in the US for a particular KW, we rank on 2 positions for KW "Hackathon" in India but for the US the no is 56. I don't understand what to do and the best possible way to rank in the US.
White Hat / Black Hat SEO | | Rajnish_HE0 -
Google URL Shortener- Should I use one or multiple???
I have a client with a number of YouTube videos. I'm using Google URL Shortner to allow the link to show in the YouTube text (as its a long URL). Many of these links go to the same page ex .com/services-page Should I use a single short URL for each video linking to the .com/services-page or should they be unique each time? If unique, would Google possibly think I'm trying to manipulate results? Thanks in advance. I'm just not sure on this one and hope someone knows best practice on this. Thanks!
White Hat / Black Hat SEO | | mgordon1 -
What sources do you use to keep on top of SEO news?
I want to try building an RSS feed of SEO news... but not wanting to find myself drowning in materials As such, looking for a short list of recommendations for keeping on top of SEO developments – the impetus is that I'm still discovering changes that happened 2, 3, even 5 years ago, and I want to try and catch these things as they happen. Thinking something actually from Google may be on the list, but some of these sources are pretty on top of things! Seroundtable.com also comes to mind. But what do you use to keep informed? Thanks 🙂
White Hat / Black Hat SEO | | ntcma1 -
Update: Copied Website
So I discovered a website the other day that is a complete duplicate of ours: justinchina.co.uk This is our website: petmedicalcenter.com . Thanks to help from Erica, I dug in deeper to see why this was happening. It seems that the justinchinca.co.uk which is hosted by GoDaddy has their A Record pointing at our web host. So that being said, our website does not seem to be hacked which is good news. Would this still cause an issue with our Google rankings? Our host, Host Monster said to contact GoDaddy and GoDaddy said that a domain owner can point their URL to anywhere that they choose. Anyway, any feedback would be helpful. Thanks for everyone thus far that has helped me. Brant
White Hat / Black Hat SEO | | BCB11210 -
Website "A Record" in DNS - Geotargetting
Hi, Our online shop is hosted with a French IP address. It is available in English and Spanish. I have noticed, as to be expected, that we get quite a few french visitors, probably related to our IP address Google must think its geo related. We don't want to particularly target any specific country, but more so english and spanish speakers. Can you have various A records around the world to help with this? Any suggestions or things I could look into?? thanks
White Hat / Black Hat SEO | | bjs20100 -
Tricky Decision to make regarding duplicate content (that seems to be working!)
I have a really tricky decision to make concerning one of our clients. Their site to date was developed by someone else. They have a successful eCommerce website, and the strength of their Search Engine performance lies in their product category pages. In their case, a product category is an audience niche: their gender and age. In this hypothetical example my client sells lawnmowers: http://www.example.com/lawnmowers/men/age-34 http://www.example.com/lawnmowers/men/age-33 http://www.example.com/lawnmowers/women/age-25 http://www.example.com/lawnmowers/women/age-3 For all searches pertaining to lawnmowers, the gender of the buyer and their age (for which there are a lot for the 'real' store), these results come up number one for every combination they have a page for. The issue is the specific product pages, which take the form of the following: http://www.example.com/lawnmowers/men/age-34/fancy-blue-lawnmower This same product, with the same content (save a reference to the gender and age on the page) can also be found at a few other gender / age combinations the product is targeted at. For instance: http://www.example.com/lawnmowers/women/age-34/fancy-blue-lawnmower http://www.example.com/lawnmowers/men/age-33/fancy-blue-lawnmower http://www.example.com/lawnmowers/women/age-32/fancy-blue-lawnmower So, duplicate content. As they are currently doing so well I am agonising over this - I dislike viewing the same content on multiple URLs, and though it wasn't a malicious effort on the previous developers part, think it a little dangerous in terms of SEO. On the other hand, if I change it I'll reduce the website size, and severely reduce the number of pages that are contextually relevant to the gender/age category pages. In short, I don't want to sabotage the performance of the category pages, by cutting off all their on-site relevant content. My options as I see them are: Stick with the duplicate content model, but add some unique content to each gender/age page. This will differentiate the product category page content a little. Move products to single distinct URLs. Whilst this could boost individual product SEO performance, this isn't an objective, and it carries the risks I perceive above. What are your thoughts? Many thanks, Tom
White Hat / Black Hat SEO | | SoundinTheory0 -
How do I find out if a competitor is using black hat methods and what can I do about it?
A competitor of mine has appeared out of nowhere with various different websites targetting slightly different keywords but all are in the same industry. They don't have as many links as me, the site structure and code is truly awful (multiple H1's on same page, tables for non-tabular data etc...) yet they outperform mine and many of my other competitors. It's a long story but I know someone who knows the people who run these sites and from what I can gather they are using black hat techniques. But that is all I know and I would like to find out more so I can report them.
White Hat / Black Hat SEO | | kevin11