Blocked by robots
-
My client's GWT has a number of notices for "blocked by meta-robots". These are all either blog posts, categories, or tags.
His former SEO told him this: "We've activated the following settings:
- Use noindex for Categories
- Use noindex for Archives
- Use noindex for Tag Archives
to reduce keyword stuffing & duplicate post tags
Disabling all 3 noindex settings above may remove Google blocks but will also send too many similar tags, post archives/categories."
Is this guy correct?
What would be the problem with indexing these?
Am I correct in thinking they should be indexed?
Thanks
-
As far as the upgrading of php on a server - this is for a different client, I seem to recall?
I would have a real problem with a developer saying they weren't going to upgrade because it might break things. Of course it might break things, but there are industry-standard approaches to dealing with this.
For example, create a duplicate version of the site on a server instance that is using the newer version of php, and do a full Quality Assurance analysis on the dev site to find and fix anything that has issues with the new php version. Then deploy back to the live site with the php upgrade.
This is standard operating procedure and is necessary because there will come a time when any older server software will no longer be supported and therefore becomes a security risk as it will be unpatched. Planning for these kinds of upgrades should be included in any website operational plan.
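As a rough illustration of the QA step, here is a minimal Python sketch that compares the dev copy against the live site for a list of paths. Everything in it is an assumption for illustration: the function name, the error strings it looks for, and the idea that you pass in your own fetch functions. It only catches status-code changes and visible PHP error text, so it supplements a proper QA pass rather than replacing it.

```python
from typing import Callable, Iterable, List, Tuple

# A fetcher takes a URL path and returns (HTTP status code, response body).
Fetcher = Callable[[str], Tuple[int, str]]

# Strings that often appear in a page when PHP code breaks.
# Illustrative only, not an exhaustive list.
ERROR_MARKERS = ("Fatal error", "Parse error", "Warning:", "Deprecated:")


def find_regressions(
    paths: Iterable[str], fetch_live: Fetcher, fetch_dev: Fetcher
) -> List[str]:
    """Return human-readable problems found on the dev copy of the site."""
    problems = []
    for path in paths:
        live_status, _ = fetch_live(path)
        dev_status, dev_body = fetch_dev(path)
        # A page that answers differently on dev needs a closer look.
        if dev_status != live_status:
            problems.append(
                f"{path}: status {live_status} on live, {dev_status} on dev"
            )
        # PHP errors printed into the page are a sign of incompatible code.
        for marker in ERROR_MARKERS:
            if marker in dev_body:
                problems.append(f"{path}: dev page contains '{marker}'")
    return problems
```

The fetchers are passed in so the check can be run against real HTTP requests or against canned responses while you are setting it up.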
Also, their solution to move WordPress to a subdomain is no protection whatsoever against the fact that they have an extremely vulnerable version.
First, the site is just as vulnerable to being hacked again as it is still unpatched. Being on a subdomain has no effect on this. Also, they have ruined the SEO value of that blog by moving it to a subdomain instead of fixing the issue and keeping it as a subdirectory of the prime site. And depending on the type of vulnerability exploited, it may still be possible for a hacker to get into the server via the vulnerable WP, then traverse from the subdomain to the prime site and cause harm there as well.
In the short term, if there truly aren't resources to properly do QA (Quality Assurance) on a dev site running an updated version of PHP, the alternative would be to move the WordPress install to its own server or VPS running a current version of PHP, upgrade it and security patch it, then use a reverse proxy setup to have it show up as blog.domain.com (or even move it back to domain.com/blog).
This would at least allow for a properly secured WordPress that could also use current and new plugins. This would, however, be at the expense of a slightly more complicated setup of the reverse proxy.
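To make the reverse proxy idea a bit more concrete, here is a minimal Python sketch of just the routing decision: which requests get handed to the separate WordPress server and what URL they are forwarded to. The upstream hostname, port, and function name are hypothetical and not from this thread. In practice you would configure this in the web server's own proxy features rather than write it yourself.

```python
from typing import Optional

# Hypothetical values for illustration only.
PUBLIC_PREFIX = "/blog"                             # path visitors see on the main domain
BLOG_UPSTREAM = "http://wp-backend.internal:8080"   # separate, patched WordPress server


def upstream_for(path: str) -> Optional[str]:
    """Return the upstream URL for a request path, or None if the main site serves it."""
    # Match "/blog" and "/blog/..." but not unrelated paths such as "/blogger".
    if path == PUBLIC_PREFIX or path.startswith(PUBLIC_PREFIX + "/"):
        remainder = path[len(PUBLIC_PREFIX):] or "/"
        return BLOG_UPSTREAM + remainder
    return None
```

For example, `upstream_for("/blog/hello-world/")` maps to the WordPress server, while `upstream_for("/about")` returns `None` and stays on the main site. Visitors only ever see the main domain, which is how the blog can live on separate, patched infrastructure and still appear as a subdirectory.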
Hope that answers your question?
Paul
-
Sorry, Erik - I didn't forget about you, but was dealing with an ethical dilemma.
Unfortunately, the business of the site you're dealing with is so completely against the terms of service of the Search Engines and against what I believe to be good, sustainable SEO, that I've decided I can't, in good conscience, do anything to help them.
Sorry this leaves you no assistance, but I would suggest strongly you not rely heavily on this client for ongoing revenues. They are just begging to get hammered by Google, if that's not what's happening already.
Paul
-
I'm happy for all the help so I'm not complaining here, but I think you forgot about me, Paul.
Also, I need to know why my client is so adamant about not wanting to upgrade his PHP from 5.1.6 to 5.2.4, saying it could hinder his site's overall functionality. Any idea why?
I want to update his WP to the newest version and it requires PHP to be updated, so we are running old plugins and old WP. His blog was hacked, so his web guys moved the location from site.com/blog to blog.site.com.
I feel handcuffed - no reason to run WP if you can't use plugins, right?
-
Sorry I missed this, Erik. Happy to have a look in the next day or two.
Paul
-
First, to be clear, the Webmaster Tools notifications are just that. Google isn't indicating any kind of a problem, Erik. It's just declaring what it has found in the pages' meta robots tags.
There's no way to give a definitive answer without seeing the actual website structure, but in general, it is VERY common and good practice to noindex the categories and tags on CMS-based websites. Usually, you want some form of the archives to be indexed, but it's the individual pages that are most important (as opposed to, e.g., date-based archives).
The problem with allowing all of these to be indexed is that, to a search engine, they will all look like duplicate content of other pages on the website. This makes the search engine crawler work much harder to find all the content on your website, and as a result it may quit partway through.
In addition, much of the content it finds it will consider to be duplicative of other pages on the website, so it will have a hard time knowing which version is actually the most valuable result to return. As a result, it will split the authority among those pages, making them MUCH harder to rank.
This is a standard challenge of any CMS-based website, because they display the same content organized by what are referred to as different taxonomies (different ways of categorizing or linking the same information).
Again, without seeing the actual site I can't say for sure, but the short answer is that those three directives are very common for CMS-based websites and are very likely correct.
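If you want to confirm which templates actually carry the noindex setting, here is a small Python sketch that checks a page's HTML for a robots meta tag containing noindex. The class and function names are made up for illustration, and it only looks at the `<meta name="robots">` tag. It does not check the X-Robots-Tag HTTP header or bot-specific tags such as `<meta name="googlebot">`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name and the directive values are matched case-insensitively.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindexed(html: str) -> bool:
    """True if the page asks search engines not to index it via meta robots."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running this against the HTML of one category page, one tag page, and one individual post should show the first two as noindexed and the post as indexable, if the three settings are doing what the former SEO described.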
Hope that helps?
Paul