How to write a robots.txt file to point to your sitemap
-
Good afternoon from a still wet & humid Wetherby, UK...
I want to write a robots.txt file that instructs the bots to index everything and gives a specific location for the sitemap. The sitemap URL is: http://business.leedscityregion.gov.uk/CMSPages/GoogleSiteMap.aspx
Is this correct:
User-agent: *
Disallow:
SITEMAP: http://business.leedscityregion.gov.uk/CMSPages/GoogleSiteMap.aspx
Any insight welcome.
-
Thank you so much for all your replies
[CASE CLOSED] -
Ryan's answer is correct. I just wanted to jump in to say that I know from first-hand experience that Google and Bing are both able to read the sitemap file even if it has a different extension and even if you can't name it sitemap.xml.
-
Yes, your example is correct.
A great page for learning about robots.txt is: http://en.wikipedia.org/wiki/Robots_exclusion_standard#Sitemap
I will add that the official way of declaring your sitemap location capitalizes only the first letter (i.e. Sitemap, not SITEMAP), but I am almost certain it does not make a difference.
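One way to sanity-check the capitalization point locally is with Python's standard-library robots.txt parser. This is only a rough sketch: it shows how `urllib.robotparser` (Python 3.8+ for `site_maps()`) treats the two spellings, which is not proof of how Google or Bing behave. The `check` helper and the `/any/page` test path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

SITEMAP_URL = "http://business.leedscityregion.gov.uk/CMSPages/GoogleSiteMap.aspx"


def check(directive):
    """Parse an allow-all robots.txt that declares the sitemap with the given spelling."""
    robots_txt = f"User-agent: *\nDisallow:\n{directive}: {SITEMAP_URL}\n"
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Hypothetical page path, used only to confirm nothing is blocked.
    allowed = parser.can_fetch("*", "http://business.leedscityregion.gov.uk/any/page")
    return allowed, parser.site_maps()


for directive in ("Sitemap", "SITEMAP"):
    print(directive, check(directive))
```

With this parser both spellings give the same result: every page is crawlable and the sitemap URL is picked up.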
A few other suggestions which are best practices but do not have to be followed:
- use all lowercase letters in URLs
- name the sitemap file "sitemap", not "GoogleSiteMap"
- submit XML sitemaps when possible. Again, I am almost certain Google can read other formats, so if all you care about is Google then it's fine, but otherwise I would suggest just using XML files.
example: business.leedscityregion.gov.uk/cmspages/sitemap.xml
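Putting the suggestions together, the file could be generated along these lines. This is a sketch under assumptions: `build_robots_txt` is a hypothetical helper, and the `http://` prefix on the example URL is added here because the Sitemap line needs a full URL; the example above omits the scheme.

```python
def build_robots_txt(sitemap_url: str) -> str:
    """Return an allow-all robots.txt that declares a single sitemap location."""
    lines = [
        "User-agent: *",           # applies to every crawler
        "Disallow:",               # empty value: nothing is blocked
        f"Sitemap: {sitemap_url}",  # only the first letter capitalized
    ]
    return "\n".join(lines) + "\n"


# Assumed full URL (scheme added) for the lowercase sitemap.xml suggested above.
print(build_robots_txt("http://business.leedscityregion.gov.uk/cmspages/sitemap.xml"), end="")
```

The output would be saved as robots.txt at the root of the site.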
Some other helpful links:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=183668