Blocked by robots
-
my client GWT has a number of notices for "blocked by meta-robots" - these are all either blog posts/categories/or tags
his former seo told him this: "We've activated following settings:
- Use noindex for Categories
- Use noindex for Archives
- Use noindex for Tag Archives
to reduce keyword stuffing & duplicate post tags
Disabling all 3 noindex settings above may remove google blocks but also will send too many similar tags, post archives/category. "is this guy correct?
what would be the problem with indexing these?
am i correct in thinking they should be indexed?
thanks
-
As far as the upgrading of php on a server - this is for a different client, I seem to recall?
I would have a real problem with a developer saying they weren't going to upgrade because it might break things. Of course it might break things, but there are industry-standard approaches to dealing with this
For example, create a duplicate version of the site on a server instance that is using the newer version of php, and do a full Quality Assurance analysis on the dev site to find and fix anything that has issues with the new php version. Then deploy back to the live site with the php upgrade.
This is standard operating procedure and is necessary because there will come a time when any older server software will no longer be supported and therefore becomes a security risk as it will be unpatched. Planning for these kinds of upgrades should be included in any website operational plan.
Also, their solution to move WordPress to a subdomain is no protection whatsoever for the fact they have an extremely vulnerable, version.
First, the site is just as vulnerable to being hacked again as it is still unpatched. Being on a subdomain has no effect on this. Also, they have ruined the SEO value of that blog by moving it to a subdomain instead of fixing the issue and keeping it as a subdirectory of the prime site. And depending on the type of vulnerability exploited, it may still be possible for a hacker to get into the server via the vulnerable WP, then traverse from the subdomain to the prime site and cause harm there as well.
In the short term, if there truly aren't resources to properly do QA (Quality Assurance) on a dev site running an updated version of PHP, the alternative would be to move the WordPress install to it's own server or VPS running a current version of PHP, upgrade it and security patch it, then use a reverse proxy setup to have it show up as blog.domain.com (or even move it back to domain,com/blog).
This would at least allow for a properly secured WordPress that could also use current and new plugins. This would, however be at the expense of a slightly more complicated setup of the reverse proxy.
Hope that answers your question?
Paul
-
Sorry, Erik - I didn't' forget about you, but was dealing with an ethical dilemma.
Unfortunately, the business of the site you're dealing with is so completely against the terms of service of the Search Engines and against what I believe to be good, sustainable SEO, that I've decided I can't, in good conscience, do anything to help them.
Sorry this leaves you no assistance, but I would suggest strongly you not rely heavily on this client for ongoing revenues. They are just begging to get hammered by Google, if that's not what's happening already.
Paul
-
i'm happy for all the help so i'm not complaining here but i think you forgot about me paul.
also i need to know why my client is so adamant about not wanting to upgrade his php from 5.1.6 to 5..2.4 saying it could hinder his sites overall functionality. any idea why?
i want to update his WP to newest version and it requires php to be updated so we are running old plugins and old WP - his blog was hacked so his webguys moved the location from site.com/blog to blog.site.com
i feel handcuffed - no reason to run WP if you cant use plugins right?
-
Sorry I missed this, Erik. Happy to have a look in the next day or two.
Paul
-
First, to be clear, the Webmaster Tools notifications are just that. Google isn't indicating any kind of a problem, Erik. It's just declaring what it has found in the site's robot.txt file.
There's no way to give a definitive answer without seeing the actual website structure, but in general, it is VERY common and good practice to no-index the categories and tags on CMS-based websites. Usually, you want some form of the archives to be indexed, but it's usually the individual pages that are most important. (e.g. not date-based archives.)
The problem with allowing all of these to be indexed is that to a search engine, they will all look like duplicate content of other pages on the website. This will cause the search engine crawler to have to work much harder to find all the content on your website, and ad a result may quit part way though.
In addition,much of the content it finds it will consider to be duplicative of other pages on the website, and therefore will have a hard time knowing which version is actually the most valuable result to return. And as a result will split the authority of each of the pages, making them MUCH harder to rank.
This is a standard challenge of any CMS based website, because they display the same content organized by what are referred to as different taxonomies (different ways of categorizing or linking the same information).
Again, without seeing the actual site I can't say for sure, but short answer is that those three directives are very common for CMS- based websites and are very likely correct.
Hope that helps?
Paul
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to stop robots.txt restricting access to sitemap?
I'm working on a site right now and having an issue with the robots.txt file restricting access to the sitemap - with no web dev to help, I'm wondering how I can fix the issue myself? The robots.txt page shows User-agent: * Disallow: / And then sitemap: with the correct sitemap link
Technical SEO | | Ad-Rank0 -
IIS 7.5 - Duplicate Content and Totally Wrong robot.txt
Well here goes! My very first post to SEOmoz. I have two clients that are hosted by the same hosting company. Both sites have major duplicate content issues and appear to have no internal links. I have checked this both here with our awesome SEOmoz Tools and with the IIS SEO Tool Kit. After much waiting I have heard back from the hosting company and they say that they have "implemented redirects in IIS7.5 to avoid duplicate content" based on the following article: http://blog.whitesites.com/How-to-setup-301-Redirects-in-IIS-7-for-good-SEO__634569104292703828_blog.htm. In my mind this article covers things better: www.seomoz.org/blog/what-every-seo-should-know-about-iis. What do you guys think? Next issue, both clients (as well as other sites hosted by this company) have a robot.txt file that is not their own. It appears that they have taken one client's robot.txt file and used it as a template for other client sites. I could be wrong but I believe this is causing the internal links to not be indexed. There is also a site map, again not for each client, but rather for the client that the original robot.txt file was created for. Again any input on this would be great. I have asked that the files just be deleted but that has not occurred yet. Sorry for the messy post...I'm at the hospital waiting to pick up my bro and could be called to get him any minute. Thanks so much, Tiff
Technical SEO | | TiffenyPapuc0 -
Can I rely on just robots.txt
We have a test version of a clients web site on a separate server before it goes onto the live server. Some code from the test site has some how managed to get Google to index the test site which isn't great! Would simply adding a robots text file to the root of test simply blocking all be good enough or will i have to put the meta tags for no index and no follow etc on all pages on the test site also?
Technical SEO | | spiralsites0 -
Site blocked by robots.txt and 301 redirected still in SERPs
I have a vanity URL domain that 301 redirects to my main site. That domain does have a robots.txt to disallow the entire site as well. However, for a branded enough search that vanity domain still shows up in SERPs and has the new Google message of: A description for this result is not available because of this site's robots.txt I get why the message is there - that's not my , my question is shouldn't a 301 redirect trump this domain showing in SERPs, ever? Client isn't happy about it showing at all. How can I get the vanity domain out of the SERPs? THANKS in advance!
Technical SEO | | VMLYRDiscoverability0 -
Blocking https from being crawled
I have an ecommerce site where https is being crawled for some pages. Wondering if the below solution will fix the issue www.example.com will be my domain In the nav there is a login page www.example.com/login which is redirecting to the https://www.example.com/login If I just disallowed /login in the robots file wouldn't it not follow the redirect and index that stuff? The redirect part is what I am questioning.
Technical SEO | | Sean_Dawes0 -
Blocked by meta-robots but there is no robots file
OK, I'm a little frustred here. I've waited a week for the next weekly index to take place after changing the privacy setting in a wordpress website so Google can index, but I still got the same problem. Blocked by meta-robots, no index, no follow. But I do not see a robot file anywhere and the privacy setting in this Wordpress site is set to allow search engines to index this site. Website is www.marketalert.ca What am I missing here? Why can't I index the rest of the website and is there a faster way to test this rather than wait another week just to find out it didn't work again?
Technical SEO | | Twinbytes0 -
Is it terrible to not have robots.txt ?
I was under the impression that you really should have a robots.txt page, and not having one is pretty bad. However, hubspot (which I'm not impressed with) does not have the capability of properly implementing one. Will this hurt the site?
Technical SEO | | StandUpCubicles1 -
Robot.txt pattern matching
Hola fellow SEO peoples! Site: http://www.sierratradingpost.com robot: http://www.sierratradingpost.com/robots.txt Please see the following line: Disallow: /keycodebypid~* We are trying to block URLs like this: http://www.sierratradingpost.com/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/ but we still find them in the Google index. 1. we are not sure if we need to specify the robot to use pattern matching. 2. we are not sure if the format is correct. Should we use Disallow: /keycodebypid*/ or /*keycodebypid/ or even /*keycodebypid~/? What is even more confusing is that the meta robot command line says "noindex" - yet they still show up. <meta name="robots" content="noindex, follow, noarchive" /> Thank you!
Technical SEO | | STPseo0