Google Crawler Error / restricting crawling
-
Hi
On a Magento instance we manage there is an advanced search. As part of the ongoing enhancement of the instance we altered the advanced search options so that there are fewer, more relevant ones.
The issue is that Google has crawled and catalogued the advanced search with the now-removed options in the query string. Google keeps crawling these out-of-date advanced searches, and the stale searches now return a 500 error.
Currently Google is attempting to crawl these pages twice a day.
I have implemented the following to stop this:
1. Requested that the URL be removed via Webmaster Tools, selecting the directory option and using the URI:
http://www.domian.com/catalogsearch/advanced/result/
2. Added Disallow rules to robots.txt:
Disallow: /catalogsearch/advanced/result/*
Disallow: /catalogsearch/advanced/result/
3. Added rel="nofollow" to the links on the site that point to the advanced search.
Below is a list of the links it is crawling or attempting to crawl: 12 links, each crawled twice a day and each resulting in a 500 status.
Can anything else be done?
-
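Seems like you've done everything right. You could also add a meta robots "NOINDEX, FOLLOW" tag to those pages. Bear in mind that Googlebot can only read that tag on pages it is allowed to crawl, so it will not be seen while the robots.txt Disallow is in place.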
I'd also double-check the "linked from" referrers in Webmaster Tools, just to make sure you haven't missed any live followed links pointing to those pages.
When did you submit the removal request, and what is its status (approved, denied, pending)? Also, are those pages in Google's index?
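If you want to confirm that the Disallow rules from the question actually cover the stale search URLs, here is a minimal sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line, the `is_blocked` helper and the example query string are assumptions for illustration, since the question shows only the two Disallow lines. Python's parser does plain prefix matching and, as far as I know, does not implement Google's wildcard extensions, so treat the result as an approximation of how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Rules from the question. The "User-agent: *" line is an assumption:
# the question shows only the two Disallow lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch/advanced/result/*
Disallow: /catalogsearch/advanced/result/
"""


def is_blocked(url, user_agent="Googlebot"):
    """Return True if the rules above disallow `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs for illustration; the query string is made up.
    for url in (
        "http://www.domian.com/catalogsearch/advanced/result/?name=test",
        "http://www.domian.com/",
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

In this parser it is the second, wildcard-free rule that matches, because the rules are treated as simple path prefixes.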