Removing duplicate content at large scale (80% of the website) using only NOINDEX
-
Hi everyone,
I am taking care of a large "news" website (500k pages) that took a massive hit from Panda because of duplicate content (70% was syndicated content). I recommended that all syndicated content be removed and that the website focus on original, high-quality content.
However, this was implemented only partially. All syndicated content is set to NOINDEX (they think it is good for users to see standard news plus original HQ content). Of course, it didn't help at all: no change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX and duplicated. I would consider this site to be "cheating" and not worthwhile for users.
What do you think about this "theory"? What would you do?
Thank you for your help!
-
-
It has been almost a year now since the massive hit. After that, there were also some smaller hits.
-
We are putting effort into improvements. That is quite frustrating for me, because I believe our effort is being undermined by the old duplicate content (which makes up 80% of the website :-)).
Yeah, we will need to take care of the link mess...
Thank you!
-
Yeah, this strategy will definitely be part of the guidelines for the editors.
One last question: do you know of any good resources I can use for inspiration?
Thank you so much.
-
We deleted thousands of pages every few months.
Before deleting anything we identified valuable pages that continued to receive traffic from other websites or from search. These were often updated and kept on the site. Everything else was 301 redirected to the "news homepage" of the site. This was not a news site, it was a very active news section on an industry portal site.
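The keep-or-redirect audit described above can be sketched in Python. This is a minimal sketch, assuming you have already exported per-page visits and referring-domain counts from your analytics and backlink tools; the thresholds, the `PageStats` fields, and the `/news/` homepage URL are hypothetical placeholders, not values from the post.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them against your own analytics data.
MIN_MONTHLY_VISITS = 50
MIN_REFERRING_DOMAINS = 1
NEWS_HOMEPAGE = "/news/"  # hypothetical URL of the news section's homepage


@dataclass
class PageStats:
    url: str
    monthly_visits: int      # e.g. exported from your analytics tool
    referring_domains: int   # e.g. exported from a backlink tool


def plan_purge(pages):
    """Split pages into a keep list and a 301 redirect map for everything else."""
    keep, redirects = [], {}
    for page in pages:
        still_valuable = (page.monthly_visits >= MIN_MONTHLY_VISITS
                          or page.referring_domains >= MIN_REFERRING_DOMAINS)
        if still_valuable:
            keep.append(page.url)  # keep it, and consider updating it
        else:
            redirects[page.url] = NEWS_HOMEPAGE  # 301 target
    return keep, redirects


keep, redirects = plan_purge([
    PageStats("/news/2012/03/merger-announced", monthly_visits=320, referring_domains=4),
    PageStats("/news/2011/07/trade-show-recap", monthly_visits=2, referring_domains=0),
])
print(keep)       # ['/news/2012/03/merger-announced']
print(redirects)  # {'/news/2011/07/trade-show-recap': '/news/'}
```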
Did you set 410 for those pages and remove all internal links to them, and was Google OK with that?
Our goal was to avoid internal links to pages that were going to be deleted. Our internal "story recommendation" widgets would stop showing links to pages after a certain length of time. Our periodic purges were done after that length of time.
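A sketch of aligning the recommendation widget with the purge schedule, assuming a hypothetical 180-day window and stories stored as `(url, publish_date)` pairs. The point is that the same cutoff drives both functions, so nothing the widget still links to is ever purged.

```python
from datetime import date, timedelta

# Hypothetical window: the widget links only to stories younger than this,
# and the periodic purge only touches stories older than it.
LINK_WINDOW = timedelta(days=180)


def recommendable(stories, today):
    """URLs the "story recommendation" widget may still link to."""
    return [url for url, published in stories if today - published <= LINK_WINDOW]


def purge_candidates(stories, today):
    """URLs old enough that the widget no longer links to them."""
    return [url for url, published in stories if today - published > LINK_WINDOW]


stories = [("/news/new-plant", date(2014, 5, 1)), ("/news/trade-show", date(2013, 7, 1))]
print(recommendable(stories, date(2014, 6, 1)))     # ['/news/new-plant']
print(purge_candidates(stories, date(2014, 6, 1)))  # ['/news/trade-show']
```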
We never used hard coded links in stories to pages that were subject to being abandoned. Instead we simply linked to category pages where something relevant would always be found.
Develop a strategy for internal linking that will reduce site maintenance and focus all internal links to pages that are permanently maintained.
-
Yikes! Will you still pay for it if it's removed? If so, then combining the comments below with my own thoughts, I'd delete it, since it's old and no longer time-relevant.
-
Yeah, paying... we actually pay for this content (a decision by earlier management :-))
-
EGOL, your insights are very much appreciated :-)!
I agree with you. Makes total sense.
So you didn't experience any problems removing outdated content (or "content with no traffic value") from your website? Did you set 410 for those pages and remove all internal links to them, and was Google OK with that?
By redirecting useless content, do you mean setting a 301 to the most relevant page that is bringing traffic?
Thank you, sir.
-
But I still don't see the point of paying for content that is not accessible from search engines.
- "paying"?
Is my understanding right that if I set canonical links for these duplicates, Google has no reason to show these pages in the SERPs?
- correct
-
Hi Dimitrii,
thank you very much for your opinion. The idea of canonical links is very interesting. We may try that in the "first" phase. But I still don't see the point of paying for content that is not accessible from search engines.
Is my understanding right that if I set canonical links for these duplicates, Google has no reason to show these pages in the SERPs?
-
Just seeing the other responses. I agree with what EGOL mentions. A content audit would be even better, to see whether there was any value at all on those pages (GA traffic, links, etc.). Odds are, though, that there was not, and that you already killed all of it with the noindex tag in place.
-
A couple of things here.
-
If another Panda update has not occurred since the changes were made, then you may not get credit for the noindexed content yet. I don't think this is "cheating": with the noindex, you just told Google to take 350K of your pages out of its index. The noindex tag is one of the best ways to get content out of Google's index.
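For reference, the noindex directive can be sent either as a robots meta tag in the page's HTML or as an `X-Robots-Tag` HTTP header. A minimal sketch, assuming a hypothetical `is_syndicated` flag on each article; either mechanism alone is enough.

```python
def robots_directives(is_syndicated):
    """Return (meta_tag, http_headers) that keep a page out of the index.

    'noindex' only asks search engines not to index the page; they may still
    crawl it and follow its links.
    """
    if not is_syndicated:
        return "", {}
    meta_tag = '<meta name="robots" content="noindex">'  # goes in the page <head>
    http_headers = {"X-Robots-Tag": "noindex"}           # or send as a response header
    return meta_tag, http_headers


print(robots_directives(True))
```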
-
If you have not spent time improving the non-syndicated content, then you are missing the more important part: improving the quality of the content that you have.
A side point to consider here is your crawl budget. I am assuming that the site still links internally to these 350K pages, so users and bots will still go to them and have to process them. This is mostly a waste of time. As all of these pages are out of Google's index thanks to the noindex tag, why not take out all internal links to those pages (i.e. from sitemaps, paginated index pages, menus, internal content) so that users and Google can focus on the quality content that is left over? I would then also 404/410 all those low-quality pages, as they are now out of Google's index and not linked internally. Why maintain the content?
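A minimal sketch of the last two steps (dropping the purged pages from the sitemap and answering 410 Gone for them), written as a plain WSGI app. The URL set is a hypothetical example, and the 200 branch only stands in for the real site.

```python
# Hypothetical set of purged low-quality URLs (e.g. the noindexed syndicated pages).
GONE = {"/syndicated/story-1", "/syndicated/story-2"}


def sitemap_urls(all_urls):
    """Keep purged pages out of the XML sitemap so nothing points crawlers at them."""
    return [url for url in all_urls if url not in GONE]


def app(environ, start_response):
    """Minimal WSGI app that answers 410 Gone for purged pages."""
    path = environ.get("PATH_INFO", "/")
    if path in GONE:
        start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"This page has been permanently removed."]
    # Stub for every other URL; the real site would render the page here.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Live page"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000. A 404 would also work; 410 simply states that the removal is permanent.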
-
-
Good point! News gotta be new.
-
If there are 500,000 pages of "news" then a lot of that content is "history" instead of "news". Visitors are probably not consuming it. People are probably not searching for it. And actively visited pages on the site are probably not linking to it.
So, I would use analytics to determine if these "history" pages are being viewed, are pulling in much traffic, have very many links, and I would delete and redirect them if they are not important to the site any longer. This decision is best made at the page level.
For "unique content" pages that appear only on my site, I would assess them at regular intervals to determine which ones are pulling in traffic and which ones are not. Some sites place news in folders according to their publication dates and that facilitates inspecting old content for its continued value. These pages can then be abandoned and redirected once their content is stale and not being consumed. Again, this can best be done at the page level.
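If URLs carry their publication date, the first pass of that inspection can be automated. A sketch assuming a hypothetical `/news/YYYY/MM/slug` layout and a 12-month cutoff; it only produces candidates, which you would then check against traffic and links before abandoning and redirecting anything.

```python
import re
from datetime import date

# Assumes a hypothetical URL layout such as /news/2012/03/some-slug.
DATED_PATH = re.compile(r"/(\d{4})/(\d{2})/")


def stale_candidates(urls, today, max_age_months=12):
    """Return dated URLs older than max_age_months, as candidates for review."""
    stale = []
    for url in urls:
        match = DATED_PATH.search(url)
        if match is None:
            continue  # undated URLs are left for manual review
        year, month = int(match.group(1)), int(match.group(2))
        age_months = (today.year - year) * 12 + (today.month - month)
        if age_months > max_age_months:
            stale.append(url)
    return stale


print(stale_candidates(
    ["/news/2012/03/merger-announced", "/news/2014/05/new-plant", "/about"],
    today=date(2014, 6, 1),
))
# ['/news/2012/03/merger-announced']
```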
I used to manage a news section and every few months we would assess, delete and redirect, to keep the weight of the site as low as possible for maximum competitiveness.
-
Hi there.
NOINDEX !== no crawling, and it surely doesn't equal NOFOLLOW. What you should probably be looking at is canonical links.
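For a syndicated copy, the canonical link is a `<link rel="canonical">` tag in the page's `<head>` pointing at the original publisher's URL (a cross-domain canonical). A minimal sketch, assuming each syndicated article stores a hypothetical `original_url`; `html.escape` keeps characters such as `&` valid inside the attribute.

```python
from html import escape


def canonical_tag(original_url=None):
    """Return a canonical <link> tag for a syndicated copy, or "" for original content."""
    if original_url is None:
        return ""  # original articles need no cross-domain canonical
    # Cross-domain canonical: point search engines at the publisher's original.
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


print(canonical_tag("https://publisher.example/story?id=1&page=2"))
```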
My understanding (and I could be completely wrong) is that when you get hit by Panda for duplicate content and then try to recover, Google checks your website for the same duplicate content. It's still crawlable, all the links are still "followable", and it's still scraped content; you aren't telling crawlers that you took it from somewhere else (by canonicalizing), it's just not displayed in the SERPs. And yes, 80% of the content being noindexed probably doesn't help either.
So, I think what you need to do is either remove that duplicate content altogether, use canonical links pointing to the originals, or (a bad idea, but it would work) block all those pages in robots.txt (at least this way those pages become uncrawlable). All of these are still not very reputable techniques, though, kind of like polishing dirt.
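The robots.txt option could look like the following, assuming the syndicated stories live under a hypothetical `/syndicated/` folder; Python's standard `urllib.robotparser` is used here only to check which URLs the rule blocks. Keep in mind that a crawler which obeys the block never fetches those pages, so it also never sees a noindex or canonical tag on them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every URL under /syndicated/ for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /syndicated/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Syndicated story: blocked. Original story: still crawlable.
print(parser.can_fetch("Googlebot", "https://www.example.com/syndicated/story-1"))  # False
print(parser.can_fetch("Googlebot", "https://www.example.com/original/story-2"))    # True
```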
Hope this makes sense.