Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Your browser does not seem to support JavaScript. As a result, your viewing experience will be diminished, and you have been placed in read-only mode.

Please download a browser that supports JavaScript, or enable it if it's disabled (i.e. NoScript).

Blocking other engines in robots.txt

Technical SEO

875

Romancing last edited by

If your primary target of business is not in China is their any benefit to blocking Chinese search robots in robots.txt?
1 Reply Last reply
Reply Quote 0
RyanKent last edited by

I don't see any benefit to blocking search engines with robots.txt with the exception of Bing or Google as necessary.

Robots.txt is strictly a suggestion to those crawlers who care enough to respect your wishes.

The only benefit it can offer is IF a crawler chooses to respect your wishes, then your site will have a bit less traffic volume during the crawl. The reality is any specific crawler from one of the many random companies is going to visit so infrequently it wont make any noticeable difference.
1 Reply Last reply
Reply Quote 2

Got a burning SEO question?

Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.

Start my free trial

Browse Questions

View

From

Sorted by

With category

Explore more categories

Related Questions

Blocking Google from telemetry requests

At Magnet.me we track the items people are viewing in order to optimize our recommendations. As such we fire POST requests back to our backends every few seconds when enough user initiated actions have happened (think about scrolling for example). In order to eliminate bots from distorting statistics we ignore their values serverside. Based on some internal logging, we see that Googlebot is also performing these POST requests in its javascript crawling. In a 7 day period, that amounts to around 800k POST requests. As we are ignoring that data anyhow, and it is quite a number, we considered reducing this for bots. Though, we had several questions about this:
1. Do these requests count towards crawl budgets?
2. If they do, and we'd want to prevent this from happening: what would be the preferred option? Either preventing the request in the frontend code, or blocking the request using a robots.txt line? The latter question is given by the fact that a in-app block for the request could lead to different behaviour for users and bots, and may be Google could penalize that as cloaking. The latter is slightly less convenient from a development perspective, as all logic is spread throughout the application. I'm aware one should not cloak, or makes pages appear differently to search engine crawlers. However these requests do not change anything in the pages behaviour, and purely send some anonymous data so we can improve future recommendations.
Technical SEO | | rogier_slag

0
Our original content is being outranked on search engines by smaller sites republishing our content.

We a media site, www.hope1032.com.au that publishes daily content on the WordPress platform using the Yoast SEO plugin. We allow smaller media sites to republish some of our content with canonical field using our URL. We have discovered some of our content is now ranking below Or not visible on some search engines when searching for the article heading. Any thoughts as to why? Have we got an SEO proble? An interesting point is the small amount of content we have republished is not ranking against the original author on search engines.
Technical SEO | | Hope-Media

0
Robots.txt in subfolders and hreflang issues

A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations: UK site - https://www.clientname.com/uk/ - robots.txt location: UK site - https://www.clientname.com/uk/robots.txt
US site - https://www.clientname.com/us/ - robots.txt location: UK site - https://www.clientname.com/us/robots.txt We've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US. They have the following hreflang tags across all pages: We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously). Search Console says there are no hreflang tags at all. Additionally, we have a robots.txt file on each site which has a link to the corresponding sitemap files, but when viewing the robots.txt tester on Search Console, each property shows the robots.txt file for https://www.clientname.com only, even though when you actually navigate to this URL (https://www.clientname.com/robots.txt) you’ll get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location. Any suggestions how we can remove UK listings from Google US and vice versa?
Technical SEO | | lauralou82

0
Robots.txt Syntax for Dynamic URLs

I want to Disallow certain dynamic pages in robots.txt and am unsure of the proper syntax. The pages I want to disallow all include the string ?Page= Which is the proper syntax?
Disallow: ?Page=
Disallow: ?Page=*
Disallow: ?Page=
Or something else?
Technical SEO | | btreloar

0
Blocking subdomains without blocking sites...

So let's say I am working for bloggingplatform.com, and people can create free sites through my tools and those sites show up as myblog.bloggingplatform.com. However that site can also be accessed from myblog.com. Is there a way, separate from editing the myblog.com site code or files, for me to tell google to stop indexing myblog.bloggingplatform.com while still letting them index myblog.com without inserting any code into the page load? This is a simplification of a problem I am running across. Basically, Google is associating subdomains to my domain that it shouldn't even index, and it is adversely affecting my main domain. Other than contacting the offending sub-domain holders (which we do), I am looking for a way to stop Google from indexing those domains at all (they are used for technical purposes, and not for users to find the sites). Thoughts?
Technical SEO | | SL_SEM

1
Robots.txt & Mobile Site

Background - Our mobile site is on the same domain as our main site. We use a folder approach for our mobile site abc.com/m/home.html We are re-directing traffic to our mobile site vie device detection and re-direction exists for a handful of pages of our site ie most of our pages do not redirect the user to a mobile equivalent page. Issue – Our mobile pages are being indexed in desktop Google searches Input Required – How should we modify our robots.txt so that the desktop google index does not index our mobile pages/urls User-agent: Googlebot-Mobile Disallow: /m User-agent: `YahooSeeker/M1A1-R2D2` Disallow: /m User-agent: `MSNBOT_Mobile` Disallow: /m Many thanks
Technical SEO | | CeeC-Blogger

0
How can I prevent sh404SEF Anti-flood control from blocking SEOMoz?

I'm using sh404SEF on my Joomla 1.5 website. Last week, I activated the security functions of the tool, which includes an anti-flood control feature. This morning when I looked at my new crawl statistics in SEOMoz, I noticed a significant drop in the number of webpages crawled, and I'm attributing that to the security configurations that I made earlier in the week. I'm looking for a way to prevent this from happening so the next crawl is accurate. I was thinking of using sh404SEFs "UserAgent white list" feature. Does SEOMoz have a UserAgent string that I could try adding to my white list? Is this what you guys recommend as a solution to this problem?
Technical SEO | | JBradySD

0
How can I exclude display ads from robots.txt?

Google has stated that you can do this to get spiders to content only, and faster. Our IT guy is saying it's impossible.
Do you know how to exlude display ads from robots.txt? Any help would be much appreciated.
Technical SEO | | GregBeddor

0