Do I need robots.txt and meta robots?
-
If I can manage to tell crawlers what I do and don't want them to crawl for my whole site via my robots.txt file, do I still need meta robots instructions?
-
Older information, but mostly still relevant:
-
Although robots.txt and meta robots appear to do similar things, they serve different functions.
Block with robots.txt - This tells the engines not to crawl the given URL, but they may still keep the page in the index and display it in results.
Block with meta noindex - This tells engines they can visit the page, but they are not allowed to display the URL in results. (This is a suggestion only - Google may still choose to show the URL.)
Source: http://www.seomoz.org/learn-seo/robotstxt
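To make the robots.txt half of that distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `example.com` URLs, the `/private/` path, and the `can_crawl` helper are made up for illustration. It only answers "may a compliant crawler fetch this URL?" and says nothing about whether the URL stays in the index. Note that this parser does simple prefix matching, which may differ from how a given search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The first URL falls under the Disallow rule, the second does not.
    print(can_crawl(ROBOTS_TXT, "https://example.com/private/report.html"))
    print(can_crawl(ROBOTS_TXT, "https://example.com/blog/post.html"))
```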
The disadvantage of robots.txt is that it blocks Google from crawling the page, meaning no link juice can flow through the page. Also, if Google discovers the URL through other means (such as external links), it may show the URL in search results anyway, usually without a meta description.
The advantage of robots.txt is it can improve crawl efficiency - useful if you find Google crawling a bunch of unnecessary pages and eating up your crawl allowance.
Most of the time, I only use robots.txt to solve problems that I can't solve at the page level. I usually prefer to keep pages out of the index using a meta NOINDEX, FOLLOW tag.
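For the page-level approach, a small sketch of reading a meta robots tag with Python's standard `html.parser` may help. The sample HTML and the names `MetaRobotsParser` and `meta_robots_directives` are hypothetical; the point is only that the directive lives in the page itself, so a crawler has to be able to fetch the page to see it.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated and case-insensitive.
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def meta_robots_directives(html: str) -> set:
    """Return the set of meta robots directives found in an HTML string."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page carrying the tag described above.
SAMPLE_PAGE = """
<html><head>
<title>Thin page</title>
<meta name="robots" content="NOINDEX, FOLLOW">
</head><body>...</body></html>
"""

if __name__ == "__main__":
    found = meta_robots_directives(SAMPLE_PAGE)
    print("noindex" in found, "follow" in found)
```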
-
If you want the stub listing removed as well, this is quite straightforward once you have it blocked in robots.txt. Instructions here: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663419
Just checking though: if the content you are trying to remove is something private that should be hidden (as opposed to just low-value stuff that you don't want cluttering the SERPs), then this isn't the right way to go about it. If that is the case, reply back.
-
Hello Mat,
As far as I know, if I block a URL using robots.txt, the URL alone will still show up in the SERPs for that page. But I want to remove the URL from the SERPs as well. How do I do that?
-
In short, no. You only need to include the instruction in one or the other. Most people find that the robots.txt file is the preferred solution because it only takes a few lines to specify which parts of a well-structured site should and should not be crawled.
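As a sketch of the "few lines" point, the snippet below generates a robots.txt from a short list of directories and then checks it with Python's standard `urllib.robotparser`. The directory names, the `example.com` URLs, and the `build_robots_txt` helper are assumptions for illustration, not a recommended layout.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_dirs, user_agent="*"):
    """Render one robots.txt group: a User-agent line plus one Disallow per directory."""
    lines = [f"User-agent: {user_agent}"]
    lines.extend(f"Disallow: {path}" for path in disallowed_dirs)
    return "\n".join(lines) + "\n"


# Hypothetical site layout: three directories that should not be crawled.
robots_txt = build_robots_txt(["/admin/", "/cart/", "/search/"])

# Parse the generated text so the rules can be checked before deploying.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt)
    print(parser.can_fetch("*", "https://example.com/cart/checkout"))
    print(parser.can_fetch("*", "https://example.com/products/widget"))
```

This works well only when the unwanted pages share a path prefix; pages scattered across the site are usually easier to handle one by one.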
-
What do you mean by meta robots instructions? Are you referring to the meta tags that go on each individual page? In that case, no, you don't necessarily need them. Robots assume a page should be crawled unless told otherwise. I'd still do it for pages that you don't want indexed and/or followed because a lot of times, robots, especially Google, seem to ignore these directives.
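The "crawled unless told otherwise" default can be illustrated with Python's standard `urllib.robotparser`. This is only a sketch of how that one parser behaves: `examplebot`, the `example.com` URL, and the `allowed` helper are hypothetical, and real crawlers may differ in details.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, user_agent: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An empty robots.txt: no rules at all.
EMPTY = ""

# Rules that target only a hypothetical crawler named "examplebot".
OTHER_BOT_ONLY = "User-agent: examplebot\nDisallow: /\n"

URL = "https://example.com/page.html"

if __name__ == "__main__":
    # With no rules, everything is allowed.
    print(allowed(EMPTY, URL, "Googlebot"))
    # Rules written for another crawler do not restrict Googlebot.
    print(allowed(OTHER_BOT_ONLY, URL, "Googlebot"))
    # The named crawler itself is blocked.
    print(allowed(OTHER_BOT_ONLY, URL, "examplebot"))
```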