Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Block Baidu crawler?
-
Hello!
One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content or do any business with China, since our market is the UK.
Is it a good idea to block the Baidu crawler in the robots.txt or could it have any adverse effects on SEO of our site?
What do you suggest?
-
I'm also trying to get this done, not sure if it's doable on Volusion (don't use them).
Yandex actually crawls more than Baidu for me, and neither benefits me at all (sucks when you pay for the bandwidth).
-
Thanks for that. I have just looked that up; I didn't realise that this was such a common problem.
-
Hi
Further to Ally's answer, in my experience Baidu tends to ignore robots.txt, so just do it on the server side.
S
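For anyone wondering what blocking "on the server side" might look like, here is a minimal sketch. It assumes the site is served by a Python WSGI application, which is an assumption on my part (the thread itself only mentions Apache's httpd.conf). The names BLOCKED_AGENTS, is_blocked and block_crawlers are hypothetical, made up for illustration. It matches on the User-Agent header only, which a crawler can spoof, so treat it as a starting point rather than a guaranteed block.

```python
# Hypothetical sketch: refuse requests from selected crawlers at the application layer.
# BLOCKED_AGENTS, is_blocked and block_crawlers are illustrative names, not a library API.

# Lowercase substrings to look for in the User-Agent header.
BLOCKED_AGENTS = ("baiduspider",)


def is_blocked(user_agent):
    """Return True if the User-Agent string contains a blocked crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENTS)


def block_crawlers(app):
    """Wrap a WSGI app so that blocked crawlers receive a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application unchanged.
        return app(environ, start_response)
    return middleware
```

Because the match is a substring test, "baiduspider" also covers Baiduspider-image and Baiduspider-video. Other crawlers could be added to BLOCKED_AGENTS in the same way.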
-
Thanks Ally for your answer, will now block Baidu
-
Hi Stefan,
You can block the Baidu crawler in the robots.txt.
There should be no adverse effect on your site, as this is not an area you are targeting and it has no long-term benefit to your business. Blocking the crawler will mean that your server has less load to deal with from the unnecessary traffic you have been receiving.
You can block the spiders in the following ways:
- Robots.txt (below is code for Baidu)
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
- Blocking Spiders via the Apache Configuration File httpd.conf
See the article below for more details on this method:
http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
You may also want to check out:
I hope this helps,
Ally
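The robots.txt rules above can be sanity-checked before uploading. The sketch below uses Python's standard urllib.robotparser module. The ROBOTS_TXT string simply repeats the rules from this answer, and the test path /any/page is a made-up example. Note that this only confirms how a compliant parser reads the file; it says nothing about whether a given crawler actually honours it.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is needed.
parser.parse(ROBOTS_TXT.splitlines())

# "/any/page" is a hypothetical path used only for this check.
print(parser.can_fetch("Baiduspider", "/any/page"))  # False: Baidu is disallowed
print(parser.can_fetch("Googlebot", "/any/page"))    # True: other crawlers are unaffected
```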