Panda Updates - robots.txt or noindex?
-
Hi,
I have a site that I believe has been impacted by the recent Panda updates. Assuming that Google has crawled and indexed several thousand pages that are essentially the same and the site has now passed the threshold to be picked out by the Panda update, what is the best way to proceed?
Is it enough to block the pages from being crawled in the future using robots.txt, or would I need to remove the pages from the index using the meta noindex tag? Of course if I block the URLs with robots.txt then Googlebot won't be able to access the page in order to see the noindex tag.
Anyone have any previous experience of doing something similar?
Thanks very much.
-
This is a good read: http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world. I think you should be careful with robots.txt, because blocking the bot's access will not cause them to remove the content from their index. They will simply show a message saying they're not quite sure what's on this page. I would use noindex to clear out the index first, before attempting robots.txt exclusion.
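To make that concrete, the sequence would look something like this (a sketch only — /thin-pages/ is a hypothetical folder, so adjust to your own URL structure). First, add a robots meta tag to each thin page and let Googlebot recrawl them:

<meta name="robots" content="noindex, follow">

Then, once the pages have dropped out of the index, you can optionally stop them being crawled at all:

User-agent: *
Disallow: /thin-pages/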
-
Yes, both, because if a page is linked to from another site, Google will spider that other site and follow the link to your page without hitting your robots.txt, and the page could get indexed if there is no noindex tag on it.
-
Indeed, try both.
Irving +1
-
Both. Block the lowest-quality, lowest-traffic pages with noindex and block the folder in robots.txt.
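One addition: for non-HTML URLs (PDFs, images, etc.) where you can't place a meta tag, the same noindex directive can be sent as an HTTP response header instead:

X-Robots-Tag: noindex

The same caveat applies, though: Googlebot has to be able to crawl the URL to see the header, so don't add the robots.txt block until the pages have dropped out of the index.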
Related Questions
-
Should I noindex WooCommerce subcategories?
What's the best practice these days for handling indexing of WooCommerce product subcategories? Example: in the sitemap we have:
/product-category-a/
/product-category-a/subcategory-1/
/product-category-a/subcategory-2/
etc. Should the /subcategory-*/ pages be noindexed, canonicalised to the parent, or stay indexed? Thanks!
Intermediate & Advanced SEO | btetrault
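(For illustration, canonicalising a subcategory to its parent would mean markup along these lines in the head of each subcategory page — example.com is a stand-in for the real domain:

<link rel="canonical" href="http://www.example.com/product-category-a/" />

Whether that is the right choice depends on whether the subcategory pages target distinct search terms of their own.)
-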
Browser caching for SEO - Hinders updates
Hi guys, just seeking some advice. We know Google is very keen on site speed, and one of the best ways to manage this is to cache images, use CDNs, etc. However, what I am finding is that we have rapid site speed, but any new updates take a few refreshes to show, or we have to wait for the ISP to clear their DNS cache. I have added the meta tag for non-caching, and in cPanel I have developer mode active on the caching sessions, which in theory will not store anything in the cache for 6 hours. Does anyone know of anything else that can force the browser cache to be flushed when a WordPress site updates an image/post/page or the database? I ask because other users' browsers may have similar issues.
Intermediate & Advanced SEO | Cocoonfxmedia
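(One common approach, sketched here assuming an Apache server with mod_headers enabled: cap how long browsers may cache static assets, and bust the cache on updates by versioning asset URLs.

<FilesMatch "\.(css|js)$">
Header set Cache-Control "max-age=86400, public"
</FilesMatch>

WordPress normally handles the versioning part by appending a query string such as style.css?ver=1.0.1 to enqueued assets; bumping the version whenever a file changes forces browsers to fetch a fresh copy.)
-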
Noindex search pages?
Is it best to noindex search results pages, exclude them using robots.txt, or both?
Intermediate & Advanced SEO | YairSpolter
-
Panda 4.0 Update Affected Site - What should be the minimum Code to Text Ratio we should aim for?
Hi all, my eCommerce site got hit badly by the Panda 4.0 update, so we have been doing some site auditing and analysis to identify issues that need addressing. We have thin/duplicate content issues which I am quite sure were part of the reason we were affected, even though we use rel=next and rel=prev along with a separate view-all page (although we don't canonical tag to that page, as I don't think users would benefit from seeing too many items on one page). This led me to look at our code-to-content ratio. We have now managed to increase it from 9% to approx. 18-22% on popular pages by getting rid of unnecessary code, etc. My question is: is there an ideal code-to-content ratio percentage, and what should I be aiming for? Any other Panda 4.0 advice would also be appreciated. Thanks, Sarah
Intermediate & Advanced SEO | SarahCollins
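(If you want to track the ratio yourself, here is a rough sketch in Python — it treats "content" as the text extracted by BeautifulSoup and "code" as the full HTML source, which is one common way the ratio is measured:

import requests
from bs4 import BeautifulSoup

def text_to_code_ratio(url):
    html = requests.get(url).text  # full page source (the "code")
    # all text nodes, a rough proxy for visible content; stricter tools
    # would also drop script/style contents before counting
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    return len(text) / len(html) * 100

print(round(text_to_code_ratio("http://www.example.com/"), 1))

Because tools differ in what they strip out, treat the exact number as tool-specific rather than absolute.)
-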
Robots.txt: how to exclude sub-directories correctly?
Hello here, I am trying to figure out the correct way to tell SEs to crawl this: http://www.mysite.com/directory/ But not this: http://www.mysite.com/directory/sub-directory/ or this: http://www.mysite.com/directory/sub-directory2/sub-directory/... Given that I have thousands of sub-directories with almost infinite combinations, I can't list the following definitions in a manageable way: disallow: /directory/sub-directory/ disallow: /directory/sub-directory2/ disallow: /directory/sub-directory/sub-directory/ disallow: /directory/sub-directory2/subdirectory/ etc... I would end up with thousands of definitions to disallow all the possible sub-directory combinations. So, is the following a correct, better, and shorter way to define what I want above: allow: /directory/$ disallow: /directory/* Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
Intermediate & Advanced SEO | fablau
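(Laid out as a robots.txt block, the pattern being asked about is:

User-agent: *
Allow: /directory/$
Disallow: /directory/*

For Google, /directory/ itself stays crawlable — it matches Allow: /directory/$, where the $ anchors the match to the end of the URL, and Google resolves the conflict with the Disallow in favour of the Allow rule — while anything deeper matches only the Disallow and is blocked. One caution: * and $ are extensions honoured by Google and Bing, not part of the original robots.txt standard, so other crawlers may ignore them; testing the rules in Google Webmaster Tools' robots.txt tester before relying on them would be wise.)
-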
Accidental Noindex/Mis-Canonicalisation - Please help!
Hi everybody, I was hoping somebody might be able to help, as this is an issue my team and I have never come across before. A client of ours recently migrated to a new site design. 301 redirects were properly implemented and the transition was fairly smooth. However, we realised soon after that a sub-section of pages had either one or both of the following errors: (1) they featured a canonical tag pointing to the wrong page; (2) they featured the 'meta noindex' tag. After realising this, both the canonicals and the noindex tags were immediately removed. However, Google crawled the site while these were in place and the pages subsequently dropped out of Google's index. We re-submitted the affected pages to Google's index and used WMT to 'Fetch' the pages as Google. We have also since 'allowed' the pages in the robots.txt file as an extra measure. We found that the pages which just had the noindex tag were immediately re-indexed, while the pages which featured both the noindex tag and the wrong canonical are still not being re-indexed. Can anyone think of a reason why this might be the case? One of the pages which featured both tags was one of our most important organic landing pages, so we're eager to resolve this. Any help or advice would be appreciated. Thanks!
Intermediate & Advanced SEO | robmarsden
-
Were small sites hit by Panda?
It seems that primarily large sites were hit by Panda, but does anyone know of / own a small site that was hit by Panda?
Intermediate & Advanced SEO | nicole.healthline
-
How can scraper sites be successful post Panda?
I read this article on SEJ: http://www.searchenginejournal.com/scrapers-and-the-panda-update/34192/ And I'm a bit confused as to how a scraper site can be successful post-Panda. Didn't Panda specifically target sites that have duplicate content, and shouldn't scraper sites actually be suffering?
Intermediate & Advanced SEO | nicole.healthline