Robots.txt and canonical tag
-
The SEOmoz post at http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts says:
If you have a robots.txt disallow in place for a page, the canonical tag will never be seen.
Does this mean that if a page is disallowed by robots.txt, spiders DO NOT read its HTML code at all?
-
Thanks Ryan for explaining things very clearly.
-
What we know is there have been many cases where a page that is blocked in robots.txt has appeared in search results. The explanation provided is that robots.txt blocks crawlers during normal site visits, but not necessarily on visits where they are following links from other sites.
-
If spiders follow links to an article on my site, will they read its contents then? If the canonical tag is on the article page itself, will the canonical tag be seen?
-
Daylan offered a great answer, but I would like to add one exception. When crawlers from the major search engines visit your site they will honor your robots.txt file, but sometimes they will follow links from other sites to an article on your site, and during that particular visit they will not see the robots.txt file and will index your page.
This is one of the reasons why your robots.txt file should be used as minimally as possible, and when it is used you should have a backup process in place such as the canonical or noindex tag on a page.
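As a rough illustration of what those backup tags look like to a crawler that does fetch the page, here is a minimal Python sketch using only the standard library. The class name HeadTagFinder, the helper read_head_tags, and the sample HTML are hypothetical and chosen for illustration; real search engine crawlers are far more involved.

```python
from html.parser import HTMLParser


class HeadTagFinder(HTMLParser):
    """Collects the rel="canonical" href and the meta robots content, if present."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.meta_robots = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.meta_robots = attrs.get("content")


def read_head_tags(html):
    """Return (canonical_url, meta_robots) found in the given HTML, or None for each."""
    finder = HeadTagFinder()
    finder.feed(html)
    return finder.canonical, finder.meta_robots


# Hypothetical article page carrying both backup signals.
SAMPLE_PAGE = """
<html><head>
<link rel="canonical" href="http://www.example.com/article">
<meta name="robots" content="noindex, follow">
</head><body>Article text</body></html>
"""

print(read_head_tags(SAMPLE_PAGE))
```

These tags live in the page's HTML, so they can only take effect on visits where the crawler actually downloads the page.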
-
Thanks, Daylan, for your quick response. I just wanted a second opinion confirming that the canonical tag will never be seen if a page is disallowed.
-
That's correct in most cases.
It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
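The check a well-behaved robot performs before fetching a page can be sketched with Python's standard urllib.robotparser module. This is only an approximation of the idea, not how any particular search engine implements it; the helper is_allowed and the user agent name "ExampleBot" are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above: every robot is barred from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# A hypothetical, narrower rule set for comparison.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a compliant robot may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "http://www.example.com/welcome.html"
print(is_allowed(BLOCK_ALL, "ExampleBot", url))      # prints False
print(is_allowed(BLOCK_PRIVATE, "ExampleBot", url))  # prints True
```

When the check returns False, a compliant robot skips the request entirely, so nothing inside the page's HTML, including a canonical tag, is ever read. A robot that ignores robots.txt simply never runs this check.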