How do I use the Robots.txt "disallow" command properly for folders I don't want indexed?
-
Today's sitemap webinar made me think about the disallow feature, seems opposite of sitemaps, but it also seems both are kind of ignored in varying ways by the engines.
I don't need help semantically, I got that part. I just can't seem to find a contemporary answer about what should be blocked using the robots.txt file.
For example, I have folders containing site comps for clients that I really don't want showing up in the SERPS. Is it better to not have these folders on the domain at all?
There are also security issues I've heard of that make sense, simply look at a site's robots file to see what they are hiding. It makes it easier to hunt for files when they know the directory the files are contained in. Do I concern myself with this?
Another example is a folder I have for my xml sitemap generator. I imagine google isn't going to try to index this or count it as content, so do I need to add folders like this to the disallow list?
-
Hi,
Usin;
User-agent: *
Disallow: /folder/subfolderis fine, however if you have information stored in your website that you certainly want crawled make sure it is in your site map and use ...
User-agent: *
allow: /folder/subfolderadding a no follow attribute to all of your pages wont be practical, if a spam crawler ignores the robots.txt it will ignore your no follow attribute. If anything new occurs with robots.txt check large website's robots.txt as they always update to new trends i.e
Hope this helps:)
-
Hi Jay,
There's actually a recent similar discussion at http://www.seomoz.org/q/what-reasons-exist-to-use-noindex-robots-txt regarding deciding what to block via robots.
For site comps for clients, you could also password-protect those to help hide them, or do a different domain that you have entirely excluded in robots. I've also seen services like Basecamp used for posting comps. It all depends on how much you want to hide the comps.
You do want your sitemap itself to be crawled, but I'm presuming this is in the root directory so that shouldn't be a problem. Folders like your sitemap generator and other purely-framework folders can certainly be disallowed. Blocking the files that list the version of your website (if you're using a CMS) can help prevent people from searching for opportunities to hack that version and finding your site.
Also, just do a site:domain.com search on your domain, see what's indexed, see what content from there you don't want indexed, and use that as a starting point.
Are you running on a content management system, or a custom site? For a CMS, here are example robots.txt files for several popular CMSs. http://www.stayonsearch.com/robots-txt-guide
-
You may also want to think about slapping a robots noindex on the individual pages as well.
-
You can type the following syntax:
after User-agent: *
Disallow: /foldername/subfoldername
also, you can name your sitemaps in the robots.txt file.
They can be defined as
Sitemap: http://www.yourdomain.com/sitemap.xml
If you have multiple sitemaps, you can have multiple sitemaps listed.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Are "appliance repair" and "appliance repair los angeles" consider the same keyword?
Hello, I know that you can't optimize two pages for 1 keyword because Google will get confused and will rather prefer my competitor. But I can't get if it will consider "appliance repair" and "appliance repair los angeles" same keywords? The homepage of my website, https://www.ifixappliancesla.com, is optimized for "appliance repair", one of the inner pages is optimized for "appliance repair los angeles". None of them shows on the first page in local SERPs for any of those quires. I am wondering if this is because Google sees it as both pages are optimized for "appliance repair"?
Technical SEO | | VELV0 -
Specific pages won't index
I have a few pages on my site that Google won't index, and I can't understand why. I've looked into possible issues with Robots, noindex, redirects, canonicals, and Search Console rules. I've got nothing. Example: I want this page to index https://tour.franchisebusinessreview.com/services/franchisee-satisfaction-surveys/ When I Google the full URL, I get results including the non-subdomain homepage, and various pages on the subdomain, including a child page of the page I want, but not the page itself. Any ideas? Thanks for the help!
Technical SEO | | ericstites0 -
Do you get penalized in search results when you use a heading tag, but it's not technically a heading (used for emphasis)?
Do you get penalized in search results when you use a heading tag, but it's not technically a heading? My clients are using heading tags for text they want to emphasize and make stand out. Does this affect search rankings for SEO?
Technical SEO | | jthompson05130 -
Best way to create robots.txt for my website
How I can create robots.txt file for my website guitarcontrol.com ? It is having login and Guitar lessons.
Technical SEO | | zoe.wilson170 -
Using Wix and the keyword grader tool isn't working.
After reading other posts about Wix and SEO I think I need to change the web design provider to something I have more control of SEO options. Does anyone have any suggestions of something I can use?
Technical SEO | | benjacksoncook0 -
Yoast's Magento Guide "Nofollowing unnecessary link" is that really a good idea?
I have been following Yoast's Magento guide here: https://yoast.com/articles/magento-seo/ Under section 3.2 it says: Nofollowing unnecessary links Another easy step to increase your Magento SEO is to stop linking to your login, checkout, wishlist, and all other non-content pages. The same goes for your RSS feeds, layered navigation, add to wishlist, add to compare etc. I always thought that nofollowing internal links is a bad idea as it just throwing link juice out the window. Why would Yoast recommend to do this? To me they are suggesting link sculpting via nofollowing but that has not worked since 2009!
Technical SEO | | PaddyDisplays0 -
Lots of Pages Dropped Out of Google's Index?
Until yesterday, my website had about 1200 pages indexed in Google. I did lots of changes: removed low quality content, rewrote passable content to make it better, wrote high quality content, got lots of likes and shares on social networks, etc. Now this morning I see that out of 1252 pages submitted, only 691 are indexed. Is that a temporary situation related to the recent updates? Anyone seeing this? What should I interpret about this?
Technical SEO | | sbrault740 -
Robots.txt file
How do i get Google to stop indexing my old pages and start indexing my new pages even months down the line? Do i need to install a Robots.txt file on each page?
Technical SEO | | gimes0