Blocking other engines in robots.txt

Technical SEO


Romancing last edited by

If your primary business target is not in China, is there any benefit to blocking Chinese search robots in robots.txt?
RyanKent last edited by

I don't see any benefit to blocking search engines with robots.txt, with the exception of Bing or Google as necessary.

Robots.txt is strictly a suggestion to those crawlers who care enough to respect your wishes.

The only benefit it can offer is that IF a crawler chooses to respect your wishes, your site will see a bit less traffic during the crawl. In reality, any specific crawler from one of the many random companies is going to visit so infrequently that it won't make any noticeable difference.
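To make the "strictly a suggestion" point concrete, here is a minimal sketch in Python using the standard `urllib.robotparser` module. It contrasts a crawler that checks robots.txt before fetching with one that never looks at it. The robots.txt contents, the `ExampleBot` name, and the `example.com` URLs are assumptions made up for illustration; they are not from the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Baiduspider to stay out, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""

def polite_crawl_plan(user_agent, urls, robots_txt=ROBOTS_TXT):
    """URLs a crawler that honors robots.txt would still request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]

def rude_crawl_plan(urls):
    """URLs a crawler that ignores robots.txt would request: all of them."""
    return list(urls)

if __name__ == "__main__":
    urls = ["https://example.com/", "https://example.com/products/widget"]
    print("Baiduspider (polite):", polite_crawl_plan("Baiduspider", urls))
    print("ExampleBot (polite):", polite_crawl_plan("ExampleBot", urls))
    print("Any bot that ignores the file:", rude_crawl_plan(urls))
```

The rule only changes anything for a crawler that runs a check like `polite_crawl_plan` in the first place, which is why the saving is limited to the crawl traffic of bots that cooperate.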
Related Questions

Should I add my html sitemap to Robots?

I have already added the .xml sitemap to robots.txt. Should I also add the HTML version?
Technical SEO | | Trazo

Blocking Test Pages En Masse on Sub-domain

Hello, We have thousands of test pages on a sub-domain of our site. Unfortunately, at some point these pages were visible to search engines and got indexed. Subsequently, we made a change to the robots.txt file for the test sub-domain. Gradually, over a period of a few weeks, the impressions and clicks reported by Google Webmaster Tools fell off for the test sub-domain. We are not able to implement the noindex tag in the head section of the pages, given the limitations of our CMS. Would blocking Googlebot via the firewall en masse for all the test pages have any negative consequences for the main domain that houses the real live content for our sites (which we would, of course, like to remain in the Google index)? Many thanks
Technical SEO | | CeeC-Blogger

Accidentally blocked Googlebot for 14 days

Today, after I noticed a huge drop in organic traffic to the inner pages of my site, I looked into the code and realized that a bug in the last commit had caused the server to show captcha pages to all Googlebot requests since Apr 24. My site has more than 4,000,000 pages in the index. Before the last code change, Googlebot was exempt from the captcha, so each inner page was crawled and indexed with no problem. The bug broke the whitelisting mechanism and treated requests from Google's IP addresses the same as those from regular users. As a result, the captcha page was crawled when Googlebot visited thousands of my site's inner pages, which made Google think all my inner pages are identical to each other. Google removed all the inner pages from the SERPs starting May 5th, before which many of those inner pages had good rankings. I initially thought this was a manual or algorithmic penalty, but:
1. I did not receive a warning message in GWT.
2. The ranking for the main URL is good.
I tried "Fetch as Google" in GWT and realized that all Googlebot had seen in the past 14 days was the same captcha page for all my inner pages. Now I have fixed the bug and updated the production site. I just wanted to ask:
1. How long will it take for Google to remove the "duplicated content" flag on my inner pages and show them in the SERPs again? In my experience, Googlebot revisits URLs quite often. But once a URL is flagged as "contains similar content", it could be difficult to recover. Is that correct?
2. Besides waiting for Google to update its index, what else can I do right now?
Thanks in advance for your answers.
Technical SEO | | Bull135

Hello everyone, please give me ideas about off-page techniques for ranking a keyword in the top 5 on the first page of all search engines, in a way that lasts a long time. What if I do not use on-page optimization?

Keyword ranking by off-page techniques: please provide an appropriate answer.
Technical SEO | | debal

Confirming Robots.txt code deep Directories

Just want to make sure I understand exactly what I am doing. If I place this in my robots.txt:
Disallow: /root/this/that
By doing this, I want to make sure that I am ONLY blocking the directory /that/ and anything beneath it. I want to make sure that /root/this/ still stays in the index; it's just the /that/ directory I want gone. Am I correct in understanding this?
Technical SEO | | cbielich
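One way to check a rule like this is to run it through a parser. Below is a minimal sketch with Python's standard `urllib.robotparser`, which treats a `Disallow` path as a plain prefix. The `User-agent: *` group, the `ExampleBot` name, and the sample paths are assumptions added for illustration; other crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The rule from the question, placed in a catch-all group (the group is an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /root/this/that
"""

def blocked_paths(paths, robots_txt=ROBOTS_TXT, user_agent="ExampleBot"):
    """Return the subset of paths a compliant crawler would skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]

if __name__ == "__main__":
    sample = [
        "/root/this/",
        "/root/this/page.html",
        "/root/this/that/",
        "/root/this/that/deep/file.html",
        "/root/this/that-other/",  # also matches: the rule is a plain prefix
    ]
    for path in blocked_paths(sample):
        print("blocked:", path)
```

Under prefix matching, `/root/this/` stays crawlable, but anything whose path merely begins with `/root/this/that` is caught too. Writing the rule with a trailing slash, `Disallow: /root/this/that/`, would limit it to the directory itself.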

Robots txt

We have a development site that we want Google and other bots to stay out of, but we want Roger to have access. Currently our robots.txt looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /development/
What would I need to add or change to let him through? Thank you.
Technical SEO | | LadyApollo
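A possible approach, sketched under assumptions: give Moz's crawler its own group, since a compliant crawler follows the group that names it rather than the catch-all `*` group. The sketch below assumes the crawler identifies itself as `rogerbot` and uses Python's standard `urllib.robotparser` to check the result. The modified file and the sample paths are illustrative, not a confirmed fix for this site.

```python
from urllib.robotparser import RobotFileParser

# Assumed variant of the file from the question: a dedicated group for
# rogerbot with an empty Disallow (nothing blocked), ahead of the catch-all group.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow:

User-agent: *
Disallow: /cgi-bin/
Disallow: /development/
"""

def can_crawl(user_agent, path, robots_txt=ROBOTS_TXT):
    """Check whether a compliant crawler with this user agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    for agent in ("rogerbot", "Googlebot"):
        print(agent, "/development/index.html:", can_crawl(agent, "/development/index.html"))
```

Keep in mind that this only steers crawlers that honor robots.txt; anything that has to stay private on a development site needs real access control.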

Robots.txt question

Hello, What does the following command mean?
User-agent: *
Allow: /
Does it mean that we are blocking all spiders? Is Allow supported in robots.txt? Thanks
Technical SEO | | seoug_2005
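As a quick check, here is a small sketch that feeds this file, and its opposite, to Python's standard `urllib.robotparser`, which recognizes the `Allow` directive. The `ExampleBot` name and the sample path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = ["User-agent: *", "Allow: /"]      # the file from the question
BLOCK_ALL = ["User-agent: *", "Disallow: /"]   # the opposite, for comparison

def crawler_may_fetch(lines, path, user_agent="ExampleBot"):
    """Return True if a compliant crawler may fetch path under these rules."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    print("Allow: /    ->", crawler_may_fetch(ALLOW_ALL, "/any/page.html"))
    print("Disallow: / ->", crawler_may_fetch(BLOCK_ALL, "/any/page.html"))
```

In this parser, `Allow: /` lets every spider in; it is `Disallow: /` that blocks them all.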

Robots.txt file question? Never seen this command before

Hey Everyone! Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (only slightly relevant ones). The command line is as follows:
Disallow: /*?*
I'm guessing this might have something to do with blocking PHP string searches on the site. It might also have something to do with blocking sub-domains, but the "?" mark puzzles me 😞 Any help would be greatly appreciated! Thanks, Rob
Technical SEO | | RobMay
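To see what a pattern like this matches, here is a rough sketch of a wildcard matcher in Python. It assumes the common wildcard extension to robots.txt, where `*` stands for any run of characters and a trailing `$` anchors the end of the URL. The helper names and sample URLs are hypothetical. Python's standard `urllib.robotparser` does plain prefix matching and does not expand wildcards, so the sketch uses `re` instead.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex.

    Assumes '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; everything else is literal.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

def rule_matches(rule, path_and_query):
    """Return True if the rule applies to this path (including any query string)."""
    return rule_to_regex(rule).match(path_and_query) is not None

if __name__ == "__main__":
    rule = "/*?*"
    for url in ("/search.php?q=shoes", "/products/shoes", "/page?"):
        print(url, "->", "blocked" if rule_matches(rule, url) else "allowed")
```

Read this way, `Disallow: /*?*` covers any URL on the host that contains a `?`, that is, anything with a query string. It has nothing to do with sub-domains, because robots.txt rules match paths only and each host serves its own file.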
