Adding your sitemap to robots.txt
-
Hi everyone,
Best practice question:
When adding your sitemap to your robots.txt file, do you add the whole sitemap at once, or do you add the sitemaps for different subcategories (products, posts, categories, etc.) separately?
I'm very curious to hear your thoughts!
-
Just add the sitemap index file to your robots.txt and let the search engines figure it out from there. All you need to do is point crawlers to your sitemaps, and they can discover every child sitemap from the sitemap index alone. So there's no real need to list each one in robots.txt.
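To make that concrete, here is a minimal sketch in Python of a robots.txt that declares only the index file, read back with the standard library's robots.txt parser. The domain, the file name sitemap_index.xml, and the helper name declared_sitemaps are placeholders for illustration, not anything specific to your site. It assumes Python 3.9 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one Sitemap line pointing at the index file only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap_index.xml
"""


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(declared_sitemaps(ROBOTS_TXT))
```

The Sitemap line is independent of the User-agent groups, so it can sit anywhere in the file.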
-
From a crawlability point of view, it does not matter. Search engines have no more problems crawling multiple sitemap files than they do crawling one very large XML sitemap file.
An advantage of splitting out your XML sitemaps is that if your site is very large, you are less likely to run into the per-file limit of 50 MB (uncompressed) or 50,000 URLs. If the site is quite small, you obviously won't benefit from this.
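As a rough illustration of staying under the URL limit, the sketch below splits a flat list of URLs into groups of at most 50,000, one group per sitemap file. The function name chunk_urls and the example URLs are made up for this example, and it only checks the URL count, not the 50 MB size limit.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemap protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# Hypothetical site with 120,001 product URLs.
urls = [f"https://www.example.com/product/{n}" for n in range(120_001)]
groups = chunk_urls(urls)
print([len(group) for group in groups])
```

Each group would then be written out as its own sitemap file.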
If you use multiple sitemaps, you may already know that you don't have to list them all in robots.txt. You can use a sitemap index file to point to your subcategory sitemaps (e.g. posts.xml). Changes to the 'child' XML sitemaps do not need to be reflected in robots.txt - you only need to remember to add/remove them from the XML index file and from Google Search Console / Bing Webmaster Tools.
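For anyone who hasn't seen one, a sitemap index is just a small XML file listing the child sitemaps. Below is a minimal sketch that builds one with Python's standard library. The function name build_sitemap_index and the example.com URLs are assumptions for illustration, and most CMS plugins will generate this file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(child_urls):
    """Return sitemap index XML (as a string) listing each child sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in child_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical child sitemaps grouped by content type.
index_xml = build_sitemap_index([
    "https://www.example.com/posts.xml",
    "https://www.example.com/products.xml",
    "https://www.example.com/categories.xml",
])
print(index_xml)
```

Adding or removing a child sitemap then only means changing the list passed in. The single index URL in robots.txt stays the same.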
Since many site applications automatically generate XML sitemaps grouped by posts, categories and products etc., we find it's easier to use this default configuration - and simply add the sitemap index URL to robots.txt.