Should comments and feeds be disallowed in robots.txt?
-
Hi
My robots file is currently set up as listed below.
From an SEO point of view is it good to disallow feeds, rss and comments?
I feel allowing comments would be a good thing because they're new content that may rank in the search engines: the comments left on my blog often refer to questions or companies folks are searching for more information on, and they're added regularly.
What's your take? I'm also concerned about /page/ being blocked, since I'm not sure how that benefits my blog from an SEO point of view either. Look forward to your feedback.
Thanks.
Eddy
User-agent: Googlebot
Crawl-delay: 10
Allow: /*

User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
-
If I were going to disallow something I would go with noindex tags. The robots file is perfect with just those 2 lines.
Then, there are plugins, like SEO by Yoast, that will help you avoid most SEO issues. Personally I like to noindex,follow tags, categories, and archive pages, and that's it. But again, that's noindex, follow with a robots meta tag on the page, not through robots.txt. SEO by Yoast makes that about as easy as it gets, with just a few small configuration steps.
Give it a try; you can always disable plugins.
Wish you the best!
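If you want to double-check that a page really carries the noindex, follow tag after setting up the plugin, here is a minimal Python sketch using only the standard library. The sample HTML and the helper name robots_directives are made up for illustration; in practice you would feed it the source of one of your own tag or archive pages. Keep in mind that a crawler can only see this tag if robots.txt lets it fetch the page, which is one reason to prefer the tag over a Disallow rule.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow"
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Hypothetical helper: return the set of robots meta directives in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up sample page standing in for a tag or archive page.
SAMPLE_PAGE = """
<html><head>
<title>Tag archive</title>
<meta name="robots" content="noindex, follow">
</head><body>Posts tagged "example"</body></html>
"""

print(sorted(robots_directives(SAMPLE_PAGE)))
```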
-
WordPress is a funny platform: you would think that there isn't much to disallow, but there probably is quite a bit. I agree with Federico - you should allow comments, feed, and rss.
I'm not going to make blind assumptions here, so you should check your log files to see what's being constantly crawled. Feel free to read this: http://moz.com/blog/server-log-essentials-for-seo
FYI - This is a big job. Shout if you need help.
P.S. HostGator's cPanel will let you archive raw server logs. Make sure you check that option from now on or they'll be overwritten!
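As a rough starting point for the log check, here is a small Python sketch that counts Googlebot requests per top-level path. It assumes the raw logs are in the common Apache "combined" format; the sample lines and the function name googlebot_hits are made up for illustration. It matches on the user-agent string only, which can be spoofed, so treat the counts as a first look rather than proof.

```python
import re
from collections import Counter

# Assumed format (Apache "combined"):
# ip - - [date] "METHOD /path HTTP/x" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def googlebot_hits(lines):
    """Count requests whose user-agent mentions Googlebot, per first path segment."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        path, agent = match.groups()
        if "googlebot" not in agent.lower():
            continue
        # "/page/2/" -> "/page", "/?p=12" -> "/"
        first = path.lstrip("/").split("/", 1)[0].split("?", 1)[0]
        counts["/" + first] += 1
    return counts


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Made-up sample lines, not real log data.
SAMPLE_LINES = [
    '66.249.66.1 - - [10/Oct/2013:13:55:36 -0700] "GET /feed/ HTTP/1.1" 200 5120 "-" "%s"' % GOOGLEBOT_UA,
    '66.249.66.1 - - [10/Oct/2013:13:56:01 -0700] "GET /page/2/ HTTP/1.1" 200 20480 "-" "%s"' % GOOGLEBOT_UA,
    '66.249.66.1 - - [10/Oct/2013:13:56:40 -0700] "GET /page/3/ HTTP/1.1" 200 20112 "-" "%s"' % GOOGLEBOT_UA,
    '203.0.113.7 - - [10/Oct/2013:13:57:02 -0700] "GET /feed/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

for segment, hits in googlebot_hits(SAMPLE_LINES).most_common():
    print(segment, hits)
```

To run it against a real file, pass the open file object instead of SAMPLE_LINES, for example googlebot_hits(open("access.log")), where access.log is whatever name your host gives the archived log.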
-
Thanks for the info!
I contacted HostGator to fix the robots file because it had been blocking Google's bot for some time now. So that's the robots file they uploaded.
Yes, I use WordPress, and apparently some stupid plugin had originally blocked Google before HostGator fixed the robots file yesterday.
So to confirm: you don't think anything else should be disallowed except for the /wp-admin directory? With the feeds, comments, etc., there aren't any SEO concerns, like duplicate content or anything else that may work against me, that should be blocked?
Is this safe to assume?
Thanks again!
Eddy
-
Who wrote that robots.txt?
You shouldn't disallow the comments, the feed, or almost anything else.
I notice you are using WordPress, so if you just want to avoid the admin being indexed (which isn't going to happen anyway, as Google does not have access), your robots.txt should look like this:
User-agent: *
Disallow: /wp-admin/
That's it.
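If you'd like to sanity-check what that two-line file blocks before uploading it, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper name is_allowed and the test paths are just examples. Note that Python's parser does plain prefix matching and does not implement Google's wildcard extensions, so it is fine for a simple file like this one but is not a stand-in for Google's own robots.txt testing.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(path, agent="Googlebot"):
    """Hypothetical helper: True if ROBOTS_TXT lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


# Example paths taken from the original question.
for path in ["/feed/", "/comments/feed/", "/page/2/", "/wp-admin/options.php"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```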
Related Questions
-
Site not showing up in search - was hacked - huge comment spam - cannot connect Webmaster tools
Hi Moz Community, A new client approached me yesterday for help with their site that used to rank well for their designated keywords, but now is not doing well. Actually, they are not on Google at all. It's like they were removed by Google. There is no reference to them when searching with "site: url". I investigated further and discovered the likely problem . . . 26,000 spam comments! All these comments have been removed now. I cleaned up this WordPress site pretty well. However, I want to connect it now to Google Webmaster Tools. I have admin access to the WP site, but not FTP. So I tried using Yoast to connect. Google failed to verify the site. So then I used a file uploading console to upload the Google HTML verification file instead. I checked that the file is there. And Google still fails to verify the site. It is as if Google is so angry with this domain that they have wiped it completely from search and refuse to have any dealings with it at all. That said, I did run the "malware" or "dangerous content" check with them, and it did not bring back any problems. I'm leaning towards the idea that this is a "cursed" domain in Google and that my client's best course of action is to build her business around another domain instead, and then point the old domain to the new domain, hopefully without attracting any bad karma in that process (advice on that step would be appreciated). Anyone have an idea as to what is going on here?
Intermediate & Advanced SEO | | AlistairC0 -
Robots.txt Disallowed Pages and Still Indexed
Alright, I am pretty sure I know the answer is "Nothing more I can do here." but I just wanted to double check. It relates to the robots.txt file and that pesky "A description for this result is not available because of this site's robots.txt". Typically people want the URL indexed and the normal Meta Description to be displayed but I don't want the link there at all. I purposefully am trying to robots that stuff outta there. My question is, has anybody tried to get a page taken out of the Index and had this happen; URL still there but pesky robots.txt message for meta description? Were you able to get the URL to no longer show up or did you just live with this? Thanks folks, you are always great!
Intermediate & Advanced SEO | | DRSearchEngOpt0 -
Default Robots.txt in WordPress - Should I change it?
I have a WordPress site using the Genesis theme, and I am using the default robots.txt. It has the line Allow: /wp-admin/admin-ajax.php. Is that okay, or is it a problem? Should I change it?
Intermediate & Advanced SEO | | rootwaysinc0 -
Robots.txt: how to exclude sub-directories correctly?
Hello here, I am trying to figure out the correct way to tell SEs to crawl this: http://www.mysite.com/directory/ But not this: http://www.mysite.com/directory/sub-directory/ or this: http://www.mysite.com/directory/sub-directory2/sub-directory/... But since I have thousands of sub-directories with almost infinite combinations, I can't list the following definitions in a manageable way:
disallow: /directory/sub-directory/
disallow: /directory/sub-directory2/
disallow: /directory/sub-directory/sub-directory/
disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up having thousands of definitions to disallow all the possible sub-directory combinations. So, is the following a correct, better and shorter way to define what I want above:
allow: /directory/$
disallow: /directory/*
Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
Intermediate & Advanced SEO | | fablau1 -
A Keyword Occupied Google Top 7 Ranking. Please Comment........
Hello Everyone, While the whole world is debating whether one should use or avoid EMDs, many bloggers from India still pull very good traffic from EMDs alone. Recently, I was researching and found a very impressive result. Keyword: "sad shayari hindi". In Google India search, the top 7 positions are occupied by a single domain with multiple URLs. I would like to request everyone to check the screenshot and comment.
Intermediate & Advanced SEO | | pushkar630 -
Issue with Robots.txt file blocking meta description
Hi, Can you please tell me why the following error is showing up in the SERPs for a website that was just re-launched 7 days ago with new pages (301 redirects are built in)?
A description for this result is not available because of this site's robots.txt – learn more.
Once we noticed it yesterday, we made some changes to the file and reduced the number of items in the disallow list. Here is the current robots.txt file:
# XML Sitemap & Google News Feeds version 4.2 - http://status301.net/wordpress-plugins/xml-sitemap-feed/
Sitemap: http://www.website.com/sitemap.xml
Sitemap: http://www.website.com/sitemap-news.xml
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Other notes... the site was developed in WordPress and uses the following plugins: WooCommerce, All-in-One SEO Pack, Google Analytics for WordPress, XML Sitemap Google News Feeds. Currently, in the SERPs, it keeps jumping back and forth between showing the meta description for the www domain and showing the error message (above). Originally, WP Super Cache was installed and has since been deactivated, removed from WP-config.php and deleted permanently. One other thing to note: we noticed yesterday that there was an old XML sitemap still on file, which we have since removed, and we resubmitted a new one via WMT. Also, the old pages are still showing up in the SERPs. Could it just be that this will take time, to review the new sitemap and re-index the new site? If so, what kind of timeframes are you seeing these days for the new pages to show up in SERPs? Days, weeks? Thanks, Erin
Intermediate & Advanced SEO | | HiddenPeak0 -
Dofollow blog comments to encourage commenting and subscriptions?
We publish really solid content on our blog, but are having trouble acquiring comments and subscribers due to the dull nature of our industry. So we are considering dofollowing blog comments as an incentive. Of course, the comments will be moderated. Do you think this is a good idea?
Intermediate & Advanced SEO | | Choice0 -
Robots.txt disallow subdomain
Hi all, I have a development subdomain, which gets copied to the live domain. Because I don't want this dev domain to get crawled, I'd like to implement a robots.txt for this domain only. The problem is that I don't want this robots.txt to disallow the live domain. Is there a way to create a robots.txt for this development subdomain only? Thanks in advance!
Intermediate & Advanced SEO | | Partouter0