Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Baidu Spider appearing on robots.txt
-
Hi, I'm not too sure what to do about this or what to think of it.
This magically appeared in my companies robots.txt file (literally magically appeared/text is below)
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /I know that Baidu is the Google of China, but I'm not sure why this would appear in our robots.txt all of a sudden. Should I be worried about a hack? Also, would I want to disallow Baidu from crawling my companies website?
Thanks for your help,
-Reed -
Thanks for your help Travis, that was a really solid answer.
-
There's a possibility someone in your company saw suspicious traffic from an actor spoofing the Baidu user agent. It can get so aggressive that it will eventually bog down your response time through sheer number of requests. But the problem is that same actor, or someone else with malicious intent can simply spoof another user agent or IP.
But the main problem is, the site is straight e-commerce. It could get international business, so why take such a ham fist approach? Even if blocking Baidu gave the desired result, the dev/admin would still have to block individual IP blocks as they come in. It would make more sense to invest in server resources so it can handle the load, or look into DDos Mitigation.
So yeah, it's strange. Though it's more likely a lack of understanding than anything malicious.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt blocked internal resources Wordpress
Hi all, We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one: User-agent: *
Intermediate & Advanced SEO | | Mat_C
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php However, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!2 -
Block session id URLs with robots.txt
Hi, I would like to block all URLs with the parameter '?filter=' from being crawled by including them in the robots.txt. Which directive should I use: User-agent: *
Intermediate & Advanced SEO | | Mat_C
Disallow: ?filter= or User-agent: *
Disallow: /?filter= In other words, is the forward slash in the beginning of the disallow directive necessary? Thanks!1 -
Homepage appearing instead of subpage
Hi, I have my homepage which has links saying "bike tours and bike tours in France" because in the past I was only doing bike tours in France. I now do tours all over Europe and I have a page about "bike tours in Franc" only. The issue I have is that my page about "bike tours in France" never appears in the search ranking it is always my homepage that does for tjhe keyword "bike tours in France". My guess is that it is due to the links that my homepage has ? How could I make sure my France page appears instead of my homepage ? Thank you,
Intermediate & Advanced SEO | | seoanalytics0 -
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
Intermediate & Advanced SEO | | Gabriele_Layoutweb0 -
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | | jmorehouse0 -
How to handle a blog subdomain on the main sitemap and robots file?
Hi, I have some confusion about how our blog subdomain is handled in our sitemap. We have our main website, example.com, and our blog, blog.example.com. Should we list the blog subdomain URL in our main sitemap? In other words, is listing a subdomain allowed in the root sitemap? What does the final structure look like in terms of the sitemap and robots file? Specifically: **example.com/sitemap.xml ** would I include a link to our blog subdomain (blog.example.com)? example.com/robots.xml would I include a link to BOTH our main sitemap and blog sitemap? blog.example.com/sitemap.xml would I include a link to our main website URL (even though it's not a subdomain)? blog.example.com/robots.xml does a subdomain need its own robots file? I'm a technical SEO and understand the mechanics of much of on-page SEO.... but for some reason I never found an answer to this specific question and I am wondering how the pros do it. I appreciate your help with this.
Intermediate & Advanced SEO | | seo.owl0 -
How to NOT appear in Google results in other countries?
I have ecommerce sites the only serve US and Canada. Is there a way to prevent a site from appearing in the Google results in foreign countries? The reason I ask is that we also have a lot of informational pages that folks in other countries are visiting, then leaving right after reading. This is making our overall Bounce Rate very high (64%). When we segment the GA data to look at just our US visitors, then the Bounce Rate drops a lot. (to 48%) Thanks!
Intermediate & Advanced SEO | | GregB1230 -
Robots.txt is blocking Wordpress Pages from Googlebot?
I have a robots.txt file on my server, which I did not develop, it was done by the web designer at the company before me. Then there is a word press plugin that generates a robots.txt file. How Do I unblock all the wordpress pages from googlebot?
Intermediate & Advanced SEO | | ENSO0