Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be viewable - we have locked both new posts and new replies.
Is blocking RSS Feeds with robots.txt necessary?
-
Is it necessary to block an RSS feed with robots.txt?
It seems feeds are automatically kept out of the web search index (http://googlewebmastercentral.blogspot.com/2007/12/taking-feeds-out-of-our-web-search.html)
And Google says here that it's important not to block RSS feeds
(http://googlewebmastercentral.blogspot.com/2009/10/using-rssatom-feeds-to-discover-new.html)
I'm just checking!
-
Hi Michelleh,
There's no need to block RSS feeds, as Googlebot uses them for discovery. Here's a quirky fact: RSS feeds actually help combat scraper sites, because they contain absolute URLs which clearly link back to your site. They're going to scrape your content anyhow, so let's hope they choose RSS!
How does Google know it's an RSS feed? Let's look at some of the markup on RSS pages:
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel></channel></rss>
Either this or something similar will sit at the top of the XML that defines an RSS or Atom document, and it is easily read by Google. I'm not going to get too far into it, but you can start reading more here:
http://en.wikipedia.org/wiki/RSS
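To make the point about the markup a bit more concrete, here is a rough Python sketch of how a crawler could tell a feed apart from an ordinary page by looking at the root element. This is only an illustration of the idea, not a description of how Google actually does it; the function name `detect_feed_type` and the sample string are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds (the same URI that appears in the markup quoted above).
ATOM_NS = "http://www.w3.org/2005/Atom"


def detect_feed_type(xml_text: str) -> str:
    """Return 'rss', 'atom', or 'unknown' based on the document's root element.

    Hypothetical helper for illustration only.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML, so it cannot be treated as a feed here.
        return "unknown"
    if root.tag == "rss":
        return "rss"
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    return "unknown"


# Sample input: a minimal RSS 2.0 skeleton with an empty channel.
sample = '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel></channel></rss>'
print(detect_feed_type(sample))  # prints "rss"
```

A real crawler would probably also look at the HTTP Content-Type header, but the root element alone is already a strong signal.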
Does Google index the XML file type? **Yes.**
Does that help?
-
How do they know it is an RSS feed? Does Google not index the XML file type?
-
If Google says not to block it, then don't block it. They may not index the RSS feed, but they can still crawl it.
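If you want to double-check that your robots.txt isn't blocking the feed by accident, something like the sketch below may help. It uses Python's standard `urllib.robotparser`, which follows the basic robots.txt rules and may not match every Googlebot-specific behaviour. The helper name `feed_is_crawlable`, the `/feed/` path, and the www.example.com URL are placeholders, not anything from the original question.

```python
from urllib.robotparser import RobotFileParser


def feed_is_crawlable(robots_txt: str, feed_url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch feed_url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, feed_url)


# Assumed example rules: the first set blocks the feed, the second blocks nothing.
blocking_rules = "User-agent: *\nDisallow: /feed/\n"
open_rules = "User-agent: *\nDisallow:\n"

print(feed_is_crawlable(blocking_rules, "http://www.example.com/feed/"))  # prints False
print(feed_is_crawlable(open_rules, "http://www.example.com/feed/"))      # prints True
```

If the first case describes your site, removing the Disallow line for the feed path is the change the answers above are recommending.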