XML and Disallow
-
I was just curious about any potential side effects of a client Basically utilizing a catch-all solution through the use of a spider for generating their XML Sitemap and then disallowing some of the directories in the XML sitemap in the robots.txt.
i.e.
XML contains 500 URLs
50 URLs contain /dirw/
I don't want anything with /dirw/ indexed just because they are fairly useless. No content, one image.They utilize the robots.txt file to " disallow: /dirw/ "
Lets say they do this for maybe 3 separate directories making up roughly 30% of the URL's in the XML sitemap.
I am just advising they re-do the sitemaps because that shouldn't be too dificult but I am curious about the actual ramifications of this other than "it isn't a clear and concise indication to the SE and therefore should be made such" if there are any.
Thanks!
-
Hi Thomas,
I don't think that technically there is a problem with adding url's to a sitemap & then blocking part of them with robots.txt.
I wouldn't do it however - and I would give the same advice as you did: regenerate the sitemap without this content. Main reason would be that it goes against the main goals of a sitemap: helping bots to crawl your site and to provide valuable metadata (https://support.google.com/webmasters/answer/156184?hl=en). Another advantage is that Google indicates the % of url's of each sitemap which is index. From that perspective, url's which are blocked for indexing have no use in a sitemap. Normally webmaster tools will generate errors, to let you know that there are issues with the sitemap.
If you take it one step further, Google could consider you a bit of a lousy webmaster, if you keep these url's in the sitemap. Not sure if this is the case, but for something which can easily be corrected, not sure if I would take this risk (even if it's a very minor one).
There are crawlers (like screamingfrog) which can generate sitemaps, while respecting the directives of the robots.txt - this would in my opinion be a better option.
rgds,
Dirk
-
For syntax I think you'll want:
User-agent: *
Disallow: /dirw/If the content of /dirw/ isn't worthwhile to the engines then it should be fine to disallow. It's important to note though that Google asks for CSS and Javascript to not be disallowed. Run the site through their Page Speed tool to see how this setup currently impacts that interaction. Cheers!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Spotify XML Sitemap
All, Working on an SEO work up for a Spotify site. Looks like they are using a sitemap that links to additional pages. A problem, none of the links are actually linked within the sitemap. This feels like a strong error. https://lubricitylabs.com/sitemap.xml Thoughts?
Intermediate & Advanced SEO | | dmaher0 -
Should you bother disallowing low quality links with brand/non-commercial anchor text?
Hi Guys, Doing a link audit and have come across lots of low quality web directories pointing to the website. Most of the anchor text of these directories are the websites URL and not comercial/keyword focused anchor text. So if thats the case should we even bother doing a link removal request via google webmaster tools for these links, as the anchor text is non-commercial? Cheers.
Intermediate & Advanced SEO | | spyaccounts140 -
Href lang in image or video XML sitemaps
Does anyone know if it is possible/recommended/not recommended to use href lang in image or video XML sitemaps? This had not crossed my mind until recently, but a client asked me this question and I couldn't find any information on this topic.
Intermediate & Advanced SEO | | ChrisKing0 -
XML Sitemap & Bad Code
I've been creating sitemaps with XML Sitemap Generator, and have been downloading them to edit on my pc. The sitemaps work fine when viewing in a browser, but when I download and open in Dreamweaver, the urls don't work when I cut and paste them in the Firefox URL bar. I notice the codes are different. For example, an "&" is produced like this..."&". Extra characters are inserted, producing the error. I was wondering if this is normal, because as I said, the map works fine when viewing online.
Intermediate & Advanced SEO | | alrockn0 -
XML Sitemaps - how to create the perfect XML Sitemap
Hello, We have a site that is not updated very often - currently we have a script running to create/update the XML sitemap every time a page is added/edited or deleted. I have a few questions about best practices for creating XML sitemaps. 1. If the site is not updated for months on end - is it a bad idea to force the script to update i.e. changing the dates once a month? Will google noticed nothing has changed just the date i.e. all the content on the site is exactly the same. Will they start penalising you for updating an XML sitemap when there is nothing new about the website?
Intermediate & Advanced SEO | | JohnW-UK
2. Is it worth automating the XML file to link into Bing/Google to update via webmaster tools - as I say even if the site is never updated?
3. Is the use of "priorities" necessary?
4. The changefreq - does that mean Google/Bing expects to see a new file ever month?
5. The ordering of the pages - the script seems pretty random and put the pages in a random order - should we make it order the pages with the most important ones first? Should the home page always be first?
6. Below is a sample of how our XML sitemap appears - is there anything that we should change? i.e. all marked up properly? This XML file does not appear to have any style information associated with it. The document tree is shown below.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url><loc>http://www.domain.com</loc>
<lastmod>2013-11-06</lastmod>
<changefreq>monthly</changefreq></url>
<url><loc>http://www.domain.com/contact/</loc>
<lastmod>2013-11-06</lastmod>
<changefreq>monthly</changefreq></url>
<url><loc>http://www.domain.com/sitemap/</loc>
<lastmod>2013-11-06</lastmod>
<changefreq>monthly</changefreq></url></urlset> Hope someone can help enlighten us to best practices0 -
XML Sitemap Indexation Rate Decrease
On September 28th, 2013 I saw my indexation rate decrease on my XML sitemap that I've submitted through GWT. I've since scraped my sitemap and removed all 404, 400 errors (which only made up ~5% of the entire sitemap). Any idea why Google randomly started indexing less of my XML sitemap on that date? I updated my sitemap 2 week before that date and had an indexation rate of ~85% - no I'm below 35%. Thoughts, idea, experiences? Thanks!
Intermediate & Advanced SEO | | RobbieWilliams0 -
Should comments and feeds be disallowed in robots.txt?
Hi My robots file is currently set up as listed below. From an SEO point of view is it good to disallow feeds, rss and comments? I feel allowing comments would be a good thing because it's new content that may rank in the search engines as the comments left on my blog often refer to questions or companies folks are searching for more information on. And the comments are added regularly. What's your take? I'm also concerned about the /page being blocked. Not sure how that benefits my blog from an SEO point of view as well. Look forward to your feedback. Thanks. Eddy User-agent: Googlebot Crawl-delay: 10 Allow: /* User-agent: * Crawl-delay: 10 Disallow: /wp- Disallow: /feed/ Disallow: /trackback/ Disallow: /rss/ Disallow: /comments/feed/ Disallow: /page/ Disallow: /date/ Disallow: /comments/ # Allow Everything Allow: /*
Intermediate & Advanced SEO | | workathomecareers0 -
Should i remove sitemap from the mainsite at a webshop (footer link) and only submit .XML in Webmaster tools?
Case: Webshop with over 2000 products. I want to make a logical sitemap for Google to follow. What is best practice at this field? Should i remove the on-page sitemap there is in html with links (is shown as a footer link called "sitemap") and only have the domain.com/sitemap.xml ? Links for great articles about making sitemaps are appreciated to. The system is Magento, if that changes anything.
Intermediate & Advanced SEO | | Mickelp0