Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to prevent Google from crawling our product filter?
-
Hi All,
We have a crawler problem on one of our sites www.sneakerskoopjeonline.nl.
On this site, visitors can specify criteria to filter available products. These filters are passed as http/get arguments. The number of possible filter urls is virtually limitless.
In order to prevent duplicate content, or an insane amount of pages in the search indices, our software automatically adds noindex, nofollow and noarchive directives to these filter result pages. However, we’re unable to explain to crawlers (Google in particular) to ignore these urls.
We’ve already changed the on page filter html to javascript, hoping this would cause the crawler to ignore it. However, it seems that Googlebot executes the javascript and crawls the generated urls anyway.
What can we do to prevent Google from crawling all the filter options?
Thanks in advance for the help.
Kind regards,
Gerwin
-
The following is added to our robots.txt .. now lets wait and see the results
User-agent: * Disallow: /admin/
Disallow: /?
Allow /?product_date=&product_date2=*
Disallow /?product_date=&product_date2=&To check the working of the robots.txt i found a handy website;
-
The url looks like this;
http://www.sneakerskoopjeonline.nl/herensneakers?product_brand=
So just adding;
User-agent: *
Disallow: /*?product_brandShould do the trick?
Most important is that herensneakers itself should be indexed, followed and crawled -
I would use your robots.txt file to prevent them from crawling the specific strings / pages. Go into your Google Webmaster Tools and you can see all the information Google has on your site and any issues, you can also specify robots.txt information in there. That would be the best route as Google is obedient with what is on the robots.txt file. If you want more information about robots.txt, go here.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Would you rate-control Googlebot? How much crawling is too much crawling?
One of our sites is very large - over 500M pages. Google has indexed 1/8th of the site - and they tend to crawl between 800k and 1M pages per day. A few times a year, Google will significantly increase their crawl rate - overnight hitting 2M pages per day or more. This creates big problems for us, because at 1M pages per day Google is consuming 70% of our API capacity, and the API overall is at 90% capacity. At 2M pages per day, 20% of our page requests are 500 errors. I've lobbied for an investment / overhaul of the API configuration to allow for more Google bandwidth without compromising user experience. My tech team counters that it's a wasted investment - as Google will crawl to our capacity whatever that capacity is. Questions to Enterprise SEOs: *Is there any validity to the tech team's claim? I thought Google's crawl rate was based on a combination of PageRank and the frequency of page updates. This indicates there is some upper limit - which we perhaps haven't reached - but which would stabilize once reached. *We've asked Google to rate-limit our crawl rate in the past. Is that harmful? I've always looked at a robust crawl rate as a good problem to have. Is 1.5M Googlebot API calls a day desirable, or something any reasonable Enterprise SEO would seek to throttle back? *What about setting a longer refresh rate in the sitemaps? Would that reduce the daily crawl demand? We could set increase it to a month, but at 500M pages Google could still have a ball at the 2M pages/day rate. Thanks
Intermediate & Advanced SEO | | lzhao0 -
How does google recognize original content?
Well, we wrote our own product descriptions for 99% of the products we have. They are all descriptive, has at least 4 bullet points to show best features of the product without reading the all description. So instead using a manufacturer description, we spent $$$$ and worked with a copywriter and still doing the same thing whenever we add a new product to the website. However since we are using a product datafeed and send it to amazon and google, they use our product descriptions too. I always wait couple of days until google crawl our product pages before i send recently added products to amazon or google. I believe if google crawls our product page first, we will be the owner of the content? Am i right? If not i believe amazon is taking advantage of my original content. I am asking it because we are a relatively new ecommerce store (online since feb 1st) while we didn't have a lot of organic traffic in the past, i see that our organic traffic dropped like 50% in April, seems like it was effected latest google update. Since we never bought a link or did black hat link building. Actually we didn't do any link building activity until last month. So google thought that we have a shallow or duplicated content and dropped our rankings? I see that our organic traffic is improving very very slowly since then but basically it is like between 5%-10% of our current daily traffic. What do you guys think? You think all our original content effort is going to trash?
Intermediate & Advanced SEO | | serkie1 -
Google is displaying wrong address
I have a client whose Google Places listing is not showing correctly. We have control of the page, and have the address verified by postcard. Yet when we view the listing it shows a totally different address that is miles away and on a totally different street. We have relogged into manage the business listing and all of the info is correct. We dragged the marker and submitted it to them that they had things wrong and left a note with the right address. Why would this happen and how can we fix it? Right now they rank highly but with a blatantly wrong address.
Intermediate & Advanced SEO | | Atomicx0 -
Shopify Product Variants vs Separate Product Pages
Let's say I have 10 different models of hats, and each hat has 5 colors. I have two routes I could take: a) Make 50 separate product pages Pros: -Better usability for customer because they can shop for just masks of a specific color. We can sort our collections to only show our red hats. -Help SEO with specific kw title pages (red boston bruins hat vs boston bruins hat). Cons: -Duplicate Content: Hat model in one color will have almost identical description as the same hat in a different color (from a usability and consistency standpoint, we'd want to leave descriptions the same for identical products, switching out only the color) b) Have 10 products listed, each with 5 color variants Pros: -More elegant and organized -NO duplicate Content Cons: -Losing out on color specific search terms -Customer might look at our 'red hats' collection, but shopify will only show the 'default' image of the hat, which could be another color. That's not ideal for usability/conversions. Not sure which route to take. I'm sure other vendors must have faced this issue before. What are your thoughts?
Intermediate & Advanced SEO | | birchlore0 -
Why am I not ranking in Google, but I am in Yahoo and Bing?
The website in question is: www.stbarthexclusives.com Our keywords are currently ranking for both Bing and Yahoo, but we're not appearing anywhere on Google. The website is being crawled successfully, but we still don't have any results. I hoping somebody can point me in the general right direction to fix/correct this problem. Additionally, there's a decent amount of "rel=canonical tags" on the website. If that helps your evaluation. Any advice would be greatly appreciated
Intermediate & Advanced SEO | | Endora0 -
How long is the google sandbox these days?
Hello, I'm putting up a new site for the first time in a while. How long is the Google Sandbox these days, and what has changed about it. Before it was 6 months to 1 year long. Thanks!
Intermediate & Advanced SEO | | BobGW0