Removing duplicated content at large scale using only NOINDEX (80% of the website).
-
Hi everyone,
I am taking care of the large "news" website (500k pages), which got massive hit from Panda because of the duplicated content (70% was syndicated content). I recommended that all syndicated content should be removed and the website should focus on original, high quallity content.
However, this was implemented only partially. All syndicated content was set to NOINDEX instead (they think it is good for users to see standard news alongside the original HQ content). Of course, it didn't help at all; no change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX because it is duplicated. I would consider this site "cheating" and not worthy of the user.
What do you think about this "theory"? What would you do?
Thank you for your help!
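For context, "set to NOINDEX" here means each syndicated page carries a robots meta tag along these lines (a generic illustration, not the site's actual markup):

```html
<!-- Keeps the page out of Google's index; links on the page are still crawled and followed -->
<meta name="robots" content="noindex, follow">
```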
-
-
It has been almost a year now since the massive hit. After that, there were also some smaller hits.
-
We are putting effort into improvements. That is quite frustrating for me, because I believe our effort is being undone by the old duplicated content (which makes up 80% of the website :-)).
Yeah, we will need to take care of the link mess...
Thank you!
-
Yeah, this strategy will definitely be part of the guidelines for the editors.
One last question: do you know of any good resources I can use for inspiration?
Thank you so much..
-
We deleted thousands of pages every few months.
Before deleting anything, we identified valuable pages that continued to receive traffic from other websites or from search. These were often updated and kept on the site. Everything else was 301 redirected to the "news homepage" of the site. (This was not a news site; it was a very active news section on an industry portal site.)
You set 410 for those pages and removed all internal links to them, and Google was OK with that?
Our goal was to avoid internal links to pages that were going to be deleted. Our internal "story recommendation" widgets would stop showing links to pages after a certain length of time. Our periodic purges were done after that length of time.
We never used hard-coded links in stories to pages that were subject to being abandoned. Instead, we simply linked to category pages, where something relevant would always be found.
Develop a strategy for internal linking that will reduce site maintenance and focus all internal links on pages that are permanently maintained (see the sketch below).
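To illustrate the widget strategy above, here is a minimal sketch (the function names and the 90-day window are my own assumptions, not EGOL's actual setup) of a recommendation feed that silently stops linking to stories once they age past the purge horizon:

```python
from datetime import datetime, timedelta

# Stories older than this are purge candidates, so the widget
# must stop linking to them before they are ever deleted.
RECOMMENDATION_WINDOW = timedelta(days=90)

def recommendable(stories, today):
    """Keep only stories young enough to link to internally."""
    return [(url, pub) for url, pub in stories if today - pub <= RECOMMENDATION_WINDOW]

# Hypothetical story records: (url, published_date)
stories = [
    ("/news/widget-merger-announced", datetime(2016, 1, 10)),
    ("/news/industry-report-q3", datetime(2015, 6, 2)),
]

for url, published in recommendable(stories, today=datetime(2016, 2, 1)):
    print(url)  # only the January story survives the age filter
```

Because no internal link ever points at a story older than the window, the later purge removes pages that are already orphaned.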
-
Yikes! Will you guys still pay for it if it's removed? If so, then combining the comments below with my thoughts: I'd delete it, since it's old and not time-relevant.
-
Yeah, paying... we actually pay for this content (earlier management decisions :-)).
-
EGOL, your insights are very much appreciated :-)!
I agree with you. Makes total sense.
So you didn't experience any problems removing outdated content (or "content with no traffic value") from your website? You set 410 for those pages and removed all internal links to them, and Google was OK with that?
Redirecting useless content - you mean setting a 301 to the most relevant page that is bringing in traffic?
Thank you sir
-
But I still miss the point of paying for content that is not accessible from search engines.
- "paying"?
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERPs?
- correct
-
Hi Dimitrii,
Thank you very much for your opinion. The idea of canonical links is very interesting; we may try that in the "first" phase. But I still miss the point of paying for content that is not accessible from search engines.
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERPs?
-
Just seeing the other responses now. I agree with what EGOL mentions. A content audit would be even better, to see if there was any value at all on those pages (GA traffic, links, etc.). Odds are, though, that there was not any, and you already killed all of it with the noindex tag in place.
-
A couple of things here.
-
If a second Panda update has not occurred since the changes were made, then you may not have gotten credit for the noindexed content yet. I don't think this is "cheating"; with the noindex, the site just told Google to take 350K of its pages out of the index. The noindex tag is one of the best ways to get your content out of Google's index.
-
If you have not spent time improving the non-syndicated content, then you are missing the more important part, which is to improve the quality of the content that you have.
A side point to consider here is your crawl budget. I am assuming that the site still links internally to these 350K pages, so users and bots will keep visiting and processing them, which is mostly a waste of time. Since all of these pages are out of Google's index thanks to the noindex tag, why not remove all internal links to those pages (i.e., from sitemaps, paginated index pages, menus, and internal content) so that users and Google can focus on the quality content that is left over? I would then also 404/410 all those low-quality pages, as they are now out of Google's index and not linked internally. Why maintain the content? One way to serve those responses is sketched below.
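As a rough illustration (assuming an Apache server; the paths are hypothetical), the purge could be expressed with mod_alias rules in .htaccess: 410 for pages being dropped outright, 301 for the few that still earn traffic or links:

```apache
# Pages with no remaining value: tell crawlers they are gone for good (HTTP 410)
Redirect gone /news/2012/old-syndicated-story

# Pages that still receive traffic or links: pass that value on (HTTP 301)
Redirect permanent /news/2013/popular-archived-story /news/
```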
-
-
Good point! News has gotta be new.
-
If there are 500,000 pages of "news," then a lot of that content is "history" rather than "news." Visitors are probably not consuming it, people are probably not searching for it, and actively visited pages on the site are probably not linking to it.
So, I would use analytics to determine whether these "history" pages are being viewed, are pulling in much traffic, or have very many links, and I would delete and redirect them if they are no longer important to the site. This decision is best made at the page level.
For "unique content" pages that appear only on my site, I would assess them at regular intervals to determine which ones are pulling in traffic and which ones are not. Some sites place news in folders according to their publication dates and that facilitates inspecting old content for its continued value. These pages can then be abandoned and redirected once their content is stale and not being consumed. Again, this can best be done at the page level.
I used to manage a news section, and every few months we would assess, delete, and redirect to keep the weight of the site as low as possible for maximum competitiveness. A rough sketch of that page-level triage follows.
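The sketch below illustrates the triage loop (the thresholds, URLs, and numbers are invented for illustration; in practice the data would come from an analytics export):

```python
# Hypothetical analytics export: url -> (monthly_pageviews, external_links)
pages = {
    "/news/2012/industry-merger": (2, 0),
    "/news/2014/buyers-guide":    (850, 12),
    "/news/2013/press-release":   (0, 0),
}

# Assumed thresholds: pages below both are delete-and-redirect candidates
MIN_VIEWS, MIN_LINKS = 10, 1

keep, redirect = [], []
for url, (views, links) in pages.items():
    if views >= MIN_VIEWS or links >= MIN_LINKS:
        keep.append(url)      # still earning traffic or links: update and keep
    else:
        redirect.append(url)  # stale: 301 to the news homepage, then purge

print("keep:", keep)
print("301 to /news/:", redirect)
```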
-
Hi there.
NOINDEX !== no crawling, and it certainly doesn't equal NOFOLLOW. What you should probably be looking at is canonical links.
My understanding is (and I could be completely wrong) that when you get hit by Panda for duplicate content and then try to recover, Google checks your website for the same duplicate content. It's still crawlable, all the links are still "followable," it's still scraped content, and you aren't telling crawlers that you took it from somewhere else (by canonicalizing); it's just not displayed in the SERPs. And yes, 80% of the content being noindexed probably doesn't help either.
So, I think what you need to do is either remove that duplicate content altogether, use canonical links pointing to the originals, or (a bad idea, but it would work) block all those URLs in robots.txt (at least that way the pages become completely uncrawlable). All of these are still disreputable techniques, though, kind of like polishing the dirt. A sketch of the canonical option follows below.
Hope this makes sense.
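For illustration, a cross-domain canonical on a syndicated article would look like this (the URL is hypothetical); it tells Google which copy is the original, so the duplicate has no reason to appear in the SERPs:

```html
<!-- Placed in the <head> of the syndicated copy, pointing at the original publisher -->
<link rel="canonical" href="https://original-publisher.example.com/story-slug">
```

The robots.txt alternative would be a simple Disallow rule covering the syndicated section, which stops crawling but passes no credit to the original.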