Robots.txt and canonical tag
-
In the SEOmoz post - http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it's being said -
If you have a robots.txt disallow in place for a page, the canonical tag will never be seen.
Does it so happen that if a page is disallowed by robots.txt, spiders DO NOT read the html code ?
-
Thanks Ryan for explaining things very clearly.
-
What we know is there have been many cases where a page that is blocked in robots.txt has appeared in search results. The explanation provided is that robots.txt blocks crawlers during normal site visits, but not necessarily on visits where they are following links from other sites.
-
If spiders follow links to an article on my site, will they read the contents then ? If the canonical tag is on article page itself, will canonical tag will be seen ?
-
Daylan offered a great answer but I would like to add one exception. When crawlers from the major SEs visit your site they will honor your robots.txt file but sometimes they will follow links from other sites to an article on your site, and during that particular visit they will not see the robots.txt file and index your page.
This is one of the reasons why your robots.txt file should be used as minimally as possible, and when it is used you should have a backup process in place such as the canonical or noindex tag on a page.
-
Thanks Daylan for your quick response. I just wanted a second opinion that canonical tag will never be seen if a page is disallowed.
-
Thats correct in most cases:
It works likes this: a robot wants to vists a Web site URL, say http://www.example.com/welcome.html. Before it does so, it firsts checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
Robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
More information available here about:
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Multiple robots.txt files on server
Hi! I have previously hired a developer to put up my site and noticed afterwards that he did not know much about SEO. This lead me to starting to learn myself and applying some changes step by step. One of the things I am currently doing is inserting sitemap reference in robots.txt file (which was not there before). But just now when I wanted to upload the file via FTP to my server I found multiple ones - in different sizes - and I dont know what to do with them? Can I remove them? I have downloaded and opened them and they seem to be 2 textfiles and 2 dupplicates. Names: robots.txt (original dupplicate)
Technical SEO | | mjukhud
robots.txt-Original (original)
robots.txt-NEW (other content)
robots.txt-Working (other content dupplicate) Would really appreciate help and expertise suggestions. Thanks!0 -
Tag archives in wordpress
I have duplicate content issue on my site, because i allow to index tags in my wordpress. And the content overlaps on them. What could be a solution to this? How do i fight it, if still want my tag pages to be indexed in Google, but i don't want to to influence my traffic negatively? Currently i have 596 tags! 🙂 Site:
Technical SEO | | pycckuu
richclubgirl.com My idea was to put canonical tag for the post i want to rank from the most popular tag pages (with biggest page authority). Would love to hear from You!1 -
Canonical URL
I previously set the canonical Url in google web masters to the non www version, when I check my on page opt, it tells me that I have a critical issue with this. Should I change it in google web masters back to the www version? if so is there the possibility of negative results? Or is there a better way to deal with this? Note, I have inbound links pointing to both types.
Technical SEO | | bronxpad0 -
Canonical warnings
[1] My site development tool (XSP) has recently added the canonical reference as an auto-generated tag, so every page of my site now has it. Why is SEOmoz warning me that I have hundreds of pages of canonicals if it's supposed to be a GOOD thing? [2] Google is still seeing the pages without the canonical tag because that's how they were indexed. Will they eventually get purged from their index, or should I be proactive about that, and if so, how? Thanks for any input.
Technical SEO | | PatioLifeStyle0 -
Canonical Tag Here?
Hello, I have a client who I have taken on (different to my other client in another question), My client has a ecommerce website and in nearly all of his products (around 30-40) he has a little information checklist like.. Made in the UK
Technical SEO | | Prestige-SEO
Prices from 9.99
Top quality
Free delivery on orders over.. This is the duplicate content, what is the best practise for this as the SEOmoz crawler is giving me a multiple of errors.0 -
Robots.txt versus sitemap
Hi everyone, Lets say we have a robots.txt that disallows specific folders on our website, but a sitemap submitted in Google Webmaster Tools that lists content in those folders. Who wins? Will the sitemap content get indexed even if it's blocked by robots.txt? I know content that is blocked by robot.txt can still get indexed and display a URL if Google discovers it via a link so I'm wondering if that would happen in this scenario too. Thanks!
Technical SEO | | anthematic0 -
Should rel canonical tags include the root domain
It does sound like a silly question but bear with me a little... I recently installed on my Joomla website a module that automatically creates rel canonical tags for pages that contain lists that can be sorted by different criteria: (price, alphabetic order, etc...) I know that a proper canonical tag should look like this: However, the module I'm using creates the following structure Will this work? I mean, will it be "understood" by the bots? To see what the module actually does, you can visit the following link http://www.quipeutlefaire.fr/fr/index.php?sort=price&sort_direction=desc&limit=10&limitstart=0&option=com_auctions&category=240 In the source code you will see that the canonical tag is Which is the original "unsorted" page. Thanks in advance for your help
Technical SEO | | QPLF0 -
Canonical Tag
Does it do anything to place the Canonical tag on the unique page itself? I thought this was only to be used on the offending pages that are the copies. Thanks
Technical SEO | | poolguy0