Removing duplicated content using only NOINDEX at large scale (80% of the website)
-
Hi everyone,
I am looking after a large "news" website (500k pages) that took a massive hit from Panda because of duplicated content (70% of it was syndicated). I recommended that all syndicated content be removed and that the website focus on original, high-quality content.
However, this was only partially implemented. All syndicated content is set to NOINDEX (they think it is good for users to see standard news alongside the original HQ content). Of course, it didn't help at all; no change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX because it is duplicated. I would consider such a site "cheating" and not worthwhile for users.
What do you think about this "theory"? What would you do?
Thank you for your help!
-
-
It has been almost a year now since the massive hit. After that, there were also some smaller hits.
-
We are putting effort into improvements. That is quite frustrating for me, because I believe our effort is being undermined by the old duplicated content (which makes up 80% of the website :-)).
Yeah, we will need to take care of the link mess...
Thank you! -
-
Yeah, this strategy will definitely be part of the guidelines for the editors.
One last question: do you know of any good resources I could use as inspiration?
Thank you so much..
-
We deleted thousands of pages every few months.
Before deleting anything, we identified valuable pages that continued to receive traffic from other websites or from search. These were often updated and kept on the site. Everything else was 301 redirected to the "news homepage" of the site. This was not a news site; it was a very active news section on an industry portal site.
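For what it's worth, a minimal Apache sketch of that kind of purge (a 301 via mod_alias in .htaccess; the story paths and the /news/ target are placeholders, not EGOL's actual setup):

```apache
# 301 purged stories to the news section homepage.
# Paths are hypothetical examples; adjust to your URL structure.
Redirect 301 /news/2012/some-old-story.html /news/
Redirect 301 /news/2012/another-old-story.html /news/
```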
You set a 410 for those pages and removed all internal links to them, and Google was OK with that?
Our goal was to avoid internal links to pages that were going to be deleted. Our internal "story recommendation" widgets would stop showing links to pages after a certain length of time. Our periodic purges were done after that length of time.
We never used hard-coded links in stories to pages that were subject to being abandoned. Instead, we simply linked to category pages, where something relevant would always be found.
Develop a strategy for internal linking that will reduce site maintenance and focus all internal links to pages that are permanently maintained.
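To make that linking strategy concrete (the paths here are made-up examples): in-story links point at a category page that is permanently maintained, never at an individual story that may be purged later:

```html
<!-- Link to the permanent category page (always maintained): -->
<a href="/news/category/regulation/">more coverage of regulation</a>

<!-- ...rather than hard-coding a link to an individual story
     that may be purged in a later cleanup:
<a href="/news/2014/specific-regulation-story.html">this story</a>
-->
```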
-
Yikes! Will you guys still pay for it if it's removed? If so, then combining the comments below with my thoughts: I'd delete it, since it's old and not time-relevant.
-
Yeah, the paying... we actually pay for this content (earlier management decisions :-)).
-
EGOL your insights are very appreciated :-)!
I agree with you. Makes total sense.
So you didn't experience any problems removing outdated content (or "content with no traffic value") from your website? You set a 410 for those pages and removed all internal links to them, and Google was OK with that?
Redirecting useless content: you mean setting a 301 to the most relevant page that is bringing in traffic?
Thank you sir
-
But I still miss the point of paying for content that is not accessible from search engines.
- "paying"?
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERPs?
- correct
-
Hi Dimitrii,
thank you very much for your opinion. The idea of canonical links is very interesting; we may try that in the "first" phase. But I still miss the point of paying for content that is not accessible from search engines.
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERPs?
-
Just seeing the other responses now. I agree with what EGOL mentions. A content audit would be even better, to see whether there was any value at all on those pages (GA traffic, links, etc.). Odds are, though, that there wasn't any, and you already killed all of it with the noindex tag in place.
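If it helps, here is a minimal sketch of that kind of audit pass, assuming a hypothetical analytics export with `page` and `pageviews` columns (the column names and threshold are assumptions; adjust them to whatever your export actually contains):

```python
import csv
import io

def flag_low_value_pages(csv_text, min_pageviews=10):
    """Return URLs from an analytics CSV export whose pageviews fall below
    the threshold - candidates to delete (410) or 301-redirect."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["page"] for row in reader if int(row["pageviews"]) < min_pageviews]

# Hypothetical export data for illustration:
export = """page,pageviews
/news/2011/old-story,2
/news/evergreen-guide,480
"""
print(flag_low_value_pages(export))  # ['/news/2011/old-story']
```

In practice you would also cross-check the flagged URLs against inbound links before deleting anything, as EGOL describes.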
-
Couple of things here.
-
If a second Panda update has not occurred since the changes were made, then you may not yet get credit for the noindexed content. I don't think this is "cheating": the noindex simply told Google to take 350K of the site's pages out of the index, and noindex is one of the best ways to get content out of Google's index.
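For reference, this is the standard page-level form of that tag (for non-HTML resources the equivalent is an `X-Robots-Tag: noindex` HTTP response header):

```html
<!-- In the <head> of each syndicated page. This keeps the page out of the
     index, but Google still spends crawl budget fetching it to see the tag. -->
<meta name="robots" content="noindex">
```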
-
If you have not spent time improving the non-syndicated content, then you are missing the more important part, which is to improve the quality of the content that you have.
A side point to consider here is your crawl budget. I am assuming the site still links internally to these 350K pages, so users and bots will keep visiting and processing them, which is mostly a waste of time. Since all of these pages are out of Google's index thanks to the noindex tag, why not remove all internal links to them (i.e., from sitemaps, paginated index pages, menus, and in-content links) so that users and Google can focus on the quality content that is left? I would then also 404/410 all those low-quality pages, as they are now out of Google's index and no longer linked internally. Why maintain the content?
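A minimal sketch of the 410 step on Apache (mod_alias supports a `gone` keyword that returns 410 without a target URL; the paths below are placeholders):

```apache
# Serve 410 Gone for purged low-quality pages so crawlers drop them
# faster than with a plain 404. Paths are hypothetical examples.
Redirect gone /syndicated/story-123.html
Redirect gone /syndicated/story-456.html
```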
-
-
Good point! News gotta be new
-
If there are 500,000 pages of "news" then a lot of that content is "history" instead of "news". Visitors are probably not consuming it. People are probably not searching for it. And actively visited pages on the site are probably not linking to it.
So, I would use analytics to determine whether these "history" pages are being viewed, are pulling in much traffic, or have many links, and I would delete and redirect them if they are no longer important to the site. This decision is best made at the page level.
For "unique content" pages that appear only on my site, I would assess them at regular intervals to determine which ones are pulling in traffic and which ones are not. Some sites place news in folders according to their publication dates and that facilitates inspecting old content for its continued value. These pages can then be abandoned and redirected once their content is stale and not being consumed. Again, this can best be done at the page level.
I used to manage a news section and every few months we would assess, delete and redirect, to keep the weight of the site as low as possible for maximum competitiveness.
-
Hi there.
NOINDEX !== no crawling, and it surely doesn't equal NOFOLLOW. What you should probably be looking at is canonical links.
My understanding is (and I can be completely wrong) that when you get hit by Panda for duplicate content and then try to recover, Google checks your website for the same duplicate content: it's still crawlable, all the links are still "followable", it's still scraped content, and you aren't telling crawlers that you took it from somewhere else (by canonicalizing); it's just not displayed in the SERPs. And yes, 80% of the content being noindexed probably doesn't help either.
So, I think what you need to do is either remove that duplicate content altogether, use canonical links to the originals, or (a bad idea, but it would work) block all those URLs in robots.txt (at least that way the pages become uncrawlable entirely). All of these are still disreputable techniques, though, kinda like polishing the dirt.
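To illustrate the two mechanisms (all URLs and paths here are placeholders): the canonical tag goes in the `<head>` of the duplicated page and points at the original, while the robots.txt rule assumes the syndicated content sits under one path prefix:

```html
<!-- On the syndicated copy, pointing at the original publisher's URL: -->
<link rel="canonical" href="https://original-publisher.example/original-story">
```

```
# robots.txt - blocks crawling of the whole section (blunt, as noted above):
User-agent: *
Disallow: /syndicated/
```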
Hope this makes sense.