Robots.txt file
-
Does it make any difference if we omit the robots.txt file? If a spider has to read all the pages anyway, why do we add a robots.txt file?
-
As Ryan said, the robots.txt file is very useful when you want to block (disallow) some pages. Indeed, if you don't want a spider to crawl your page, you must use robots.txt (a noindex tag still lets the bot crawl your page, it just won't index it). I have a small website, but I dropped a robots.txt in my folder anyway. Writing just Allow: / may be useless, but you can say: "I respect protocols".
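To illustrate the point about Allow: / being redundant, here is a minimal sketch using Python's standard urllib.robotparser module. The URL and the helper name `allowed` are made up for the example, and an empty string stands in for a robots.txt with no rules. Under those assumptions, both cases should report the page as crawlable.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URL, used only for illustration.
PAGE = "https://www.example.com/any/page.html"

ALLOW_ALL = "User-agent: *\nAllow: /\n"  # the "I respect protocols" file
EMPTY = ""                               # a robots.txt with no rules at all

if __name__ == "__main__":
    print("Allow: / permits crawl:", allowed(ALLOW_ALL, PAGE))
    print("No rules permits crawl:", allowed(EMPTY, PAGE))
```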
-
A good source to learn about the robots.txt file is here: http://www.robotstxt.org/
The robots.txt file is completely optional. I don't use the file at all on small sites.
The file offers a means to block crawlers which choose to honor the file's instructions from crawling all or part of a site. It also provides the location of a sitemap.
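As a rough sketch of those two functions (blocking part of a site and pointing to a sitemap), the snippet below parses a small robots.txt with Python's standard urllib.robotparser. The domain, the /private/ folder, and the sitemap URL are all hypothetical, and `site_maps()` assumes Python 3.8 or later. It only models a crawler that chooses to honor the file; nothing here stops one that doesn't.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one folder, advertise a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    for path in ("/private/report.html", "/blog/post.html"):
        url = "https://www.example.com" + path
        print(path, "crawlable:", rules.can_fetch("*", url))
    # site_maps() returns the listed sitemap URLs, or None if there are none.
    print("sitemaps:", rules.site_maps())
```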
On that point, sitemaps are completely unnecessary for SEO, assuming your site has proper navigation. Even if you choose to use a sitemap, you can submit its location via WMT rather than the robots.txt file.
With respect to blocking areas of your site, the primary use would be for CMS, forum, ecommerce or other sites where the software is limited and does not allow the site owner to use noindex on all pages.
As a rule, robots.txt should never be used except as a last resort. In my experience the file is overused by site owners and SEOs. The one exception where I use a robots.txt file is during a site's development, when I do not want the site to be crawled at all.
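For the development case, a block-everything file is just two lines. The sketch below checks that such a file disallows every sample URL for a crawler that honors it. The dev.example.com URLs and the helper name `is_fully_blocked` are assumptions for illustration, not anything from a real setup.

```python
from urllib.robotparser import RobotFileParser

# Two-line robots.txt that asks all compliant crawlers to stay out entirely.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def is_fully_blocked(robots_txt: str, sample_urls: list[str], agent: str = "*") -> bool:
    """Return True if none of the sample URLs may be fetched by `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(agent, url) for url in sample_urls)


# Hypothetical development-site URLs.
SAMPLE_URLS = [
    "https://dev.example.com/",
    "https://dev.example.com/products/widget.html",
    "https://dev.example.com/blog/2012/launch-notes/",
]

if __name__ == "__main__":
    print("dev site fully blocked:", is_fully_blocked(BLOCK_ALL, SAMPLE_URLS))
```

If you go this route, remember to remove or loosen the file at launch, since the same two lines would keep compliant crawlers away from the live site.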