Google (GWT) says my homepage and posts are blocked by Robots.txt
-
I guys.. I have a very annoying issue..
My Wordpress-blog over at www.Trovatten.com has some indexation-problems..
Google Webmaster Tools data:
GWT says the following: "Sitemap contains urls which are blocked by robots.txt." and shows me my homepage and my blogposts..This is my Robots.txt: http://www.trovatten.com/robots.txt
"User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/Do you have any idea why it says that the URL's are being blocked by robots.txt when that looks how it should?
I've read a couple of places that it can be because of a Wordpress Plugin that is creating a virtuel robots.txt, but I can't validate it..1. I have set WP-Privacy to crawl my site
2. I have deactivated all WP-plugins and I still get same GWT-Warnings.Looking forward to hear if you have an idea that might work!
-
Do you know which plugin (or combination) was the trouble?
I use a lot of wordpress, and this is very interesting.
-
You are absolutely right.
The problem was that a plugin I installed messed with my robots.txt
-
I am going to disagree with the above.
The command <meta < span="">name="robots" content="noodp, noydir" /> has nothing to do with denying any access to the robots.</meta <>
It is used to prevent the engines from displaying meta descriptions from DMOZ and the Yahoo directory. Without this line, the search engines might choose to use those descriptions, rather than the descriptions you have as meta descriptions.
-
Hey Frederick,
Here's your current meta data for robots on your home page (in the section):
name="robots" content="noodp, noydir" />
Should be something like this:
name="robots" content="INDEX,FOLLOW" />
I don't think it's the robots.txt that's the issue, but rather the meta-robots in the head of the site.
Hope this helps!
Thanks,
Anthony
[moderator's note: this answer was actually not the correct answer for this question, please see responses below]
-
I have tweak around with an XML SItemap-generater and I think it works. I'll give an update in a couple of hours!
Thansk!
-
Thanks for your comment Stubby and you are probably right.
But the problem is the Disallowing and not the sitemaps.. And based on my Robots.txt should everything be crawable.
What I'm worried about is that the virtuel Robots.txt that WP-generates is trouble.
-
Is Yoast generating another sitemap for you?
You have a sitemap from a different plugin, but Yoast can also generate sitemaps, so perhaps you have 2 - and one of the sitemaps lists the items that you are disallowing.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Homepage was removed from google and got deranked
Hello experts I have a problem. The main page of my homepage got deranked severely and now I am not sure how to get the rank back. It started when I accidentally canonicalized the main page "https://kv16.dk" to a page that did not exist. 4 months later the page got deranked, and you were not able to see the "main page" in the search results at all, not even when searching for "kv16.dk". Then we discovered the canonicalization mistake and fixed it, and were able to get the main page back in the search results when searching for "kv16.dk". At first after we made the correction, some weeks passed by, and the ranking didn't get better. Google search console recommended uploading a sitemap, do we did that. However in this sitemap there was a lot of "thin content sites", for all the wordpress attachments. E.g. for every image in an article. more exactly there were 91 of these attachment sites, and the rest of the page consists of only two pages "main page" and an extra landing page. After that google begun recommending the attachment urls in some searches. We tried fixing it by redirecting all the attachments to their simple form. E.g. if it was an attachment page for an image we redirected strait to the image. Google has not yet removed these attachment pages, so the question is if you think it will help to remove the attachments via google search console, or will that not help at all? For example when we search "kv16" an attachment URL named "birksø" is one of the first results
Technical SEO | | Christian_T0 -
Disallow wildcard match in Robots.txt
This is in my robots.txt file, does anyone know what this is supposed to accomplish, it doesn't appear to be blocking URLs with question marks Disallow: /?crawler=1
Technical SEO | | AmandaBridge
Disallow: /?mobile=1 Thank you0 -
Can google bots read my internal post links if they are all listed in a javascript accordian where I list my sources?
I post a JavaScript accordion drop down tab [ a collapsible content area ] at the end of all my posts. I labeled the accordion "Show Article Sources"., and when a user clicks it, then the accordion expands open and it shows all the sources I listed for my article. And this is where I post all of my articles links that I reference per each article. But I read somewhere that google crawlers can not read text in a drop down JavaScript tab. So I am wondering now if this is true because that would mean I have no internal linking SEO going on since it cant read the links? ..... if it is true, then I should remove the accordion from all my articles and some how include the links I reference in the actual body text so I can get SEO benefits from external linking similar content? If that's true, what is an aesthetic way to do this, any example links? Tips ? Thoughts ?
Technical SEO | | ianizaguirre0 -
One robots.txt file for multiple sites?
I have 2 sites hosted with Blue Host and was told to put the robots.txt in the root folder and just use the one robots.txt for both sites. Is this right? It seems wrong. I want to block certain things on one site. Thanks for the help, Rena
Technical SEO | | renalynd270 -
Robots.txt usage
Hey Guys, I am about make an important improvement to our site's robots.txt we have large number of properties on our site and we have different views for them. List, gallery and map view. By default list view shows up and user can navigate through gallery view. We donot want gallery pages to get indexed and want to save our crawl budget for more important pages. this is one example of our site: http://www.holiday-rentals.co.uk/France/r31.htm When you click on "gallery view" URL of this site will remain same in your address bar: but when you mouse over the "gallery view" tab it will show you URL with parameter "view=g". there are number of parameters: "view=g, view=l and view=m". http://www.holiday-rentals.co.uk/France/r31.htm?view=l http://www.holiday-rentals.co.uk/France/r31.htm?view=g http://www.holiday-rentals.co.uk/France/r31.htm?view=m Now my question is: I If restrict bots by adding "Disallow: ?view=" in our robots.txt will it effect the list view too? Will be very thankful if yo look into this for us. Many thanks Hassan I will test this on some other site within our network too before putting it to important one's. to measure the impact but will be waiting for your recommendations. Thanks
Technical SEO | | holidayseo0 -
What to do with Deleted Posts?
Hi Guys, To give way to the implementation of Google Panda, I have deleted some of my wordpress blog posts with low quality content. Now, I am seeing some errors in my Webmasters Tools. What needs to be done so that these errors will be fixed? Thanks....
Technical SEO | | Trigun0 -
Look of google results
Can anyone tell me why some google results show the main page and then a listing of all subsequent pages (i.e. results for SEOMOZ) while others just show the main page with nothing under it. I have two different sites (one personal the other biz) and they both show their search results differently. Is it something in the site creation or how it is crawled by google? Thanks. bKs3C
Technical SEO | | STF0 -
Robots.txt
My campaign hse24 (www.hse24.de) is not being crawled any more ... Do you think this can be a problem of the robots.txt? I always thought that Google and friends are interpretating the file correct, seen that he site was crawled since last week. Thanks a lot Bernd NB: Here is the robots.txt: User-Agent: * Disallow: / User-agent: Googlebot User-agent: Googlebot-Image User-agent: Googlebot-Mobile User-agent: MSNBot User-agent: Slurp User-agent: yahoo-mmcrawler User-agent: psbot Disallow: /is-bin/ Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-DE-Site/de_DE/-/EUR/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-AT-Site/de_DE/-/EUR/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-CH-Site/de_DE/-/CHF/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-DE-Site/de_DE/-/EUR/hse24_DisplayProductInformation-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-AT-Site/de_DE/-/EUR/hse24_DisplayProductInformation-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-CH-Site/de_DE/-/CHF/hse24_DisplayProductInformation-Start Allow: /is-bin/intershop.static/WFS/HSE24-Site/-/Editions/ Allow: /is-bin/intershop.static/WFS/HSE24-Site/-/Editions/Root%20Edition/units/HSE24/Beratung/
Technical SEO | | remino630