Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt and canonical tag
-
In the SEOmoz post - http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it's being said -
If you have a robots.txt disallow in place for a page, the canonical tag will never be seen.
Does it so happen that if a page is disallowed by robots.txt, spiders DO NOT read the html code ?
-
Thanks Ryan for explaining things very clearly.
-
What we know is there have been many cases where a page that is blocked in robots.txt has appeared in search results. The explanation provided is that robots.txt blocks crawlers during normal site visits, but not necessarily on visits where they are following links from other sites.
-
If spiders follow links to an article on my site, will they read the contents then ? If the canonical tag is on article page itself, will canonical tag will be seen ?
-
Daylan offered a great answer but I would like to add one exception. When crawlers from the major SEs visit your site they will honor your robots.txt file but sometimes they will follow links from other sites to an article on your site, and during that particular visit they will not see the robots.txt file and index your page.
This is one of the reasons why your robots.txt file should be used as minimally as possible, and when it is used you should have a backup process in place such as the canonical or noindex tag on a page.
-
Thanks Daylan for your quick response. I just wanted a second opinion that canonical tag will never be seen if a page is disallowed.
-
Thats correct in most cases:
It works likes this: a robot wants to vists a Web site URL, say http://www.example.com/welcome.html. Before it does so, it firsts checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
Robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
More information available here about:
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What is the correct Canonical tag on m.site?
We have 2 separate sites for desktop (www.example.com) and mobile (m.example.com) As per the guideline, we have added Rel=alternate tag on www.example.com to point to mobile URL(m.example.com) and Rel=canonical tag on m.example.com to point to Desktop site(www.example.com).However, i didn't find any guideline on what canonical tag we should add ifFor Desktop sitewww.example.com/PageA - has a canonical tag to www.example.com/PageBOn this page, we have a Rel=alternate tag m.example.com/pageAWhat will be the canonical we should add for the mobile version of Page Am.example.com/PageA - Canonical tag point to www.example.com/PageA -or www.example.com/PageB?Kalpesh
Technical SEO | | kguard0 -
Robots.txt in subfolders and hreflang issues
A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations: UK site - https://www.clientname.com/uk/ - robots.txt location: UK site - https://www.clientname.com/uk/robots.txt
Technical SEO | | lauralou82
US site - https://www.clientname.com/us/ - robots.txt location: UK site - https://www.clientname.com/us/robots.txt We've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US. They have the following hreflang tags across all pages: We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously). Search Console says there are no hreflang tags at all. Additionally, we have a robots.txt file on each site which has a link to the corresponding sitemap files, but when viewing the robots.txt tester on Search Console, each property shows the robots.txt file for https://www.clientname.com only, even though when you actually navigate to this URL (https://www.clientname.com/robots.txt) you’ll get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location. Any suggestions how we can remove UK listings from Google US and vice versa?0 -
Canonical tag use for ecommerce product page detail
Hi, I have a category page I want to rank. This page has 24 different products quite similar but not exactly the same.
Technical SEO | | amastone
I want to use canonical tag in any product to the parent category.
Is this a right use of the canonical?
Category page I'm talking about is : Finger bits If I understand how to use canonical tags I can improve all my category pages. thanks marco0 -
Do you need a canonical tag for search and filter pages?
Hi Moz Community, We've been implementing new canonical tags for our category pages but I have a question about pages that are found via search and our filtering options. Would we still need a canonical tag for pages that show up in search + a filter option if it only lists one page of items? Example below. www.uncommongoods.com/search.html/find/?q=dog&exclusive=1 Thanks!
Technical SEO | | znotes0 -
Isnt it better to have headlines in H1 and H2 tags instead of p tags?
I am working with a simple site http://http://lightsigns.com/Uniko_Manufacturing_Limited.html They seek more SEO traffic. However, the two big headlines that read "Wholesale Supply to the Sign and Display Industries" which is on line 241 and 242 of the source code, its in a p tag, i.e. <p <span class="webkit-html-tag">style</p <span>="padding-top: 0pt; " class="paragraph_style_1">Wholesale Supply to the and <p <span class="webkit-html-tag">style</p <span>="padding-bottom: 0pt; " class="paragraph_style_1">Sign and Display Industries Likewise, the product titles are in p tags, also. For example, on the Slide-in Light Box product page, http://lightsigns.com/Slide_In_light_box.html , I have done keyword research and no one is using the words slide in light box.Plus, it is also a p tag, ie. line 43 reads style="padding-bottom: 0pt; padding-top: 0pt; " class="paragraph_style">Slide-in Light Box If I suggest that they make an H2 tag with SEO-optimized keywords such as Display Light Box - Slide-In LIght Box, would this indeed help SEO? In summary, is it correct to say that H1 and H2 tags are stronger signals to the search bots of what the page is about?
Technical SEO | | BridgetGibbons1 -
• symbol in title tag
We have a few title tags with a circular dot symbol, which is created by the code "•" Humans see a dot, but googlebot sees • Does this negatively impact our SEO, or is googlebot aware that **• == *** to human eyes
Technical SEO | | lighttable0 -
Can I Disallow Faceted Nav URLs - Robots.txt
I have been disallowing /*? So I know that works without affecting crawling. I am wondering if I can disallow the faceted nav urls. So disallow: /category.html/? /category2.html/? /category3.html/*? To prevent the price faceted url from being cached: /category.html?price=1%2C1000
Technical SEO | | tylerfraser
and
/category.html?price=1%2C1000&product_material=88 Thanks!0 -
Google Off/On Tags
I came across this article about telling google not to crawl a portion of a webpage, but I never hear anyone in the SEO community talk about them. http://perishablepress.com/press/2009/08/23/tell-google-to-not-index-certain-parts-of-your-page/ Does anyone use these and find them to be effective? If not, how do you suggest noindexing/canonicalizing a portion of a page to avoid duplicate content that shows up on multiple pages?
Technical SEO | | Hakkasan1