WordPress error
-
On our Google Webmaster Tools I'm getting a Severe Health Warning regarding our robots.txt file, which reads:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
I'm wondering how I can fix this and stop it happening again.
The site was hacked about 4 months ago but I thought we'd managed to clear things up.
Colin
-
This will be my first post on SEOmoz, so bear with me.
The way I understand it is that robots read the robots.txt file from top to bottom, and once they find a rule that applies to them they stop reading and begin crawling. So basically the robots.txt written as:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
would not have the desired result as user-agent 008 would first read the top guideline:
User-agent: *
Disallow:
Crawl-delay: 20
and then begin crawling your site, as it is first told that all user-agents are disallowed from nothing, i.e. that they may crawl every page and directory.
The corrected way to write this would be:
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
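Whether the order of the groups actually matters depends on the parser. As a rough check (a sketch only, and no guarantee of how Googlebot or 80legs behave), both orderings can be fed to the parser in Python's standard library, `urllib.robotparser`. The helper name `load` and the two constant names are made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The two orderings discussed above, as plain strings.
WILDCARD_FIRST = """\
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
"""

SPECIFIC_FIRST = """\
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
"""


def load(body):
    """Parse a robots.txt body with Python's standard-library parser."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser


for label, body in (("wildcard first", WILDCARD_FIRST),
                    ("specific first", SPECIFIC_FIRST)):
    parser = load(body)
    print(label,
          "| Googlebot allowed:", parser.can_fetch("Googlebot", "/"),
          "| 008 allowed:", parser.can_fetch("008", "/"))
```

With this particular parser both orderings give the same answers, because the `*` group is only used as a fallback when no more specific group matches. Other crawlers may well behave differently.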
-
Hi Peter,
I've tested the robots.txt file in Webmaster Tools and it now seems to be working as it should, and it seems Google is seeing the same file as I have on the server.
I'm afraid this side of things isn't my area of expertise so it's been a bit of a minefield.
I've taken a subscription with sucuri.net and taken various other steps that hopefully will help with security. But who knows?
Thanks,
Colin
-
Google is seeing the same Robots.txt content (in GWT) that you show in the physical file, right? I just want to make sure that, when the site was hacked, no changes were made that are showing different versions of files to Google. It sounds like that's not the case here, but it definitely can happen.
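One rough way to run this check yourself, sketched in Python under some assumptions: the function names `fetch_robots` and `same_content` are invented here, and any user-agent strings you pass in are just examples. Cloaking keyed on Google's IP addresses rather than on the User-Agent header would not show up this way, so the view inside GWT is still the more reliable test.

```python
import urllib.request


def fetch_robots(url, user_agent):
    """Download a robots.txt URL while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def same_content(first, second):
    """Compare two robots.txt bodies, ignoring line-ending and edge whitespace."""
    def normalise(raw):
        return raw.replace(b"\r\n", b"\n").strip()
    return normalise(first) == normalise(second)
```

Calling `fetch_robots` once with a browser-style user agent and once with a Googlebot-style one, then passing both results to `same_content`, would flag a file that changes depending on who asks. The copy on the server can be compared the same way by reading it in binary mode.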
-
The blog isn't showing now, and my hosts say that the index.php file is missing from the directory, but I can see it.
Strange.
Have contacted them again to see what the problem can be.
Bit of a wasted Saturday!
-
Thanks Keith. Just contacting our hosts.
Nightmare!
-
Looks like a 403 permissions problem; that's a server-side error. Make sure you have the correct permissions set on the blog folder in IIS. Personally I always host on Linux...
-
Mind you, the whole blog is now showing an error message and can't be viewed, so it looks like an afternoon of trial and error!
-
Thanks very much Keith. I've just edited the file as suggested.
I see the error but, as I am the web guy, I can't figure out how to get rid of it.
I think it might be a plugin that's causing it, so I'm going to disable them and re-enable them one at a time.
I've just PM'd you by the way.
Thanks for your help Keith.
Colin
-
Use this:
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Sitemap: http://nile-cruises-4u.co.uk/sitemap.xml
And FYI, you have the following error on your blog:
Warning: is_readable() [function.is-readable]: open_basedir restriction in effect. File(D:\home\nile-cruises-4u.co.uk\wwwroot\blog/wp-content/plugins/D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-content\plugins\websitedefender-wordpress-security/languages/WSDWP_SECURITY-en_US.mo) is not within the allowed path(s): (D:\home\nile-cruises-4u.co.uk\wwwroot) in D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-includes\l10n.php on line 339
Get your web guy to look at that, it appears at the top of every blog page for me...
Hope that helps,
Keith
-
Thanks Keith.
Only part of our site is WP based. Would that be a problem using the example you kindly suggested?
-
I gave you an example above of a basic robots.txt file that I use on one of my WordPress sites. I would suggest using that for now.
I would not bother messing around with crawl delay in robots.txt; as Peter said above, there are better ways to achieve this. Plus I doubt you need it anyway.
In my experience Google normally caches the robots.txt info for about 24 hours, so it's possible the old cached version is still being used by Google.
-
Hi Guys,
Thanks so much for your help. As you say Troy, that's definitely not what I want.
I assumed when we were hacked (twice in 8 months) that it might have been a competitor as we are in a very competitive niche. Might be very wrong there but we have certainly lost our top ranking on Google.co.uk for our main key phrases and are now at about position 7 for the same key phrases after about 3 years at number 1.
So when I saw on Google Webmaster Tools yesterday that we had a severe health warning and that the Googlebot was being prevented from crawling our site I thought it might be the aftereffects of the hack.
Today, even though I changed the robots.txt file yesterday, GWT is showing 1000 pages with errors, 285 Access Denied and 719 Not Found, and this message: Googlebot is blocked from http://nile-cruises-4u.co.uk/
I've just tested the robots.txt via GWT and now get this message:
Allowed
Detected as a directory; specific files may have different restrictions
So maybe Googlebot will be able to access the pages shortly and the Access Denied message will disappear.
I've changed the robots.txt file to:
User-agent: *
Crawl-delay: 20
But should I change it to a better version? Sorry guys, I'm an online travel agent and not great on coding and really techie stuff. Although I'm learning pretty quickly about the bad stuff!
I seem to have a few problems getting this sorted and wonder if this is a part of why our page position is dropping?
-
I would simplify your robots.txt to read something like:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.your-domain.com/sitemap.xml
-
That's odd: "008" appears to be the user agent for "80legs", a custom crawler platform. I'm seeing it in other Robots.txt files.
-
I'm not 100% sure what he's seeing, but when I plug his robots.txt into the robots analysis tool, I get this back:
Googlebot blocked by line 5: Disallow: /
Detected as a directory; specific files may have different restrictions
However, when I gave the top "User-agent: *" group an empty "Disallow:", it seemed to fix the problem. Like, it didn't understand that the "Disallow: /" was meant only for the 008 user-agent?
-
Not honestly sure what User-agent "008" is, but that seems harmless. Why the crawl delay? There are better ways to handle that than Robots.txt, if a crawler is giving you trouble.
Was there a specific message/error in GWT?
-
I think, if you have a robots.txt reading what you show above:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
That just basically says, "Don't crawl my site at all" (The "Disallow: /" means, I'm not allowing anything to be crawled by any search engine that pays attention to robots.txt at all)
So...I'm guessing that's not what you want?
(Bah..ignore. "User-agent". I'm a fool)
Actually, this seems to have solved your issue...make sure you explicitly tell all other User-agents that they are allowed:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
The extra "Disallow:" under User-agent: * says "I'm not going to disallow anything to most user-agents." Then the Disallow under user-agent 008 seems to only apply to them.
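One possible explanation for what the analysis tool reported, offered as a guess and not as a description of Google's real parser: if a parser skips lines it does not act on, such as Crawl-delay, then two User-agent lines with nothing but a skipped line between them read as one group, and the later "Disallow: /" applies to both. The toy parser below (Python; `parse_groups` and `is_blocked` are invented names, and the grouping rule is the assumption being illustrated) behaves that way.

```python
def parse_groups(text):
    """Toy robots.txt parser: only Allow/Disallow lines end a run of User-agent lines.

    Returns a dict mapping each lower-cased user agent to its list of
    (directive, value) rules. Lines such as Crawl-delay are skipped entirely.
    """
    groups = {}
    current_agents = []
    collecting_agents = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not collecting_agents:
                current_agents = []  # a rule line came before, so start a new group
                collecting_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            collecting_agents = False
            for agent in current_agents:
                groups[agent].append((key, value))
        # Any other directive is ignored and does NOT close the group.
    return groups


def is_blocked(groups, agent):
    """True if the agent's group (or the * fallback) contains 'Disallow: /'."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    return ("disallow", "/") in rules


original = "User-agent: *\nCrawl-delay: 20\nUser-agent: 008\nDisallow: /"
fixed = "User-agent: *\nDisallow:\nCrawl-delay: 20\nUser-agent: 008\nDisallow: /"

print("original blocks Googlebot:", is_blocked(parse_groups(original), "Googlebot"))
print("fixed blocks Googlebot:", is_blocked(parse_groups(fixed), "Googlebot"))
```

Under that assumption the empty "Disallow:" matters because it closes the `*` group before the 008 line starts a new one, which would match what the tool showed.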