Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to prevent Google from crawling our product filter?
-
Hi All,
We have a crawler problem on one of our sites www.sneakerskoopjeonline.nl.
On this site, visitors can specify criteria to filter available products. These filters are passed as http/get arguments. The number of possible filter urls is virtually limitless.
In order to prevent duplicate content, or an insane amount of pages in the search indices, our software automatically adds noindex, nofollow and noarchive directives to these filter result pages. However, we’re unable to explain to crawlers (Google in particular) to ignore these urls.
We’ve already changed the on page filter html to javascript, hoping this would cause the crawler to ignore it. However, it seems that Googlebot executes the javascript and crawls the generated urls anyway.
What can we do to prevent Google from crawling all the filter options?
Thanks in advance for the help.
Kind regards,
Gerwin
-
The following is added to our robots.txt .. now lets wait and see the results
User-agent: * Disallow: /admin/
Disallow: /?
Allow /?product_date=&product_date2=*
Disallow /?product_date=&product_date2=&To check the working of the robots.txt i found a handy website;
-
The url looks like this;
http://www.sneakerskoopjeonline.nl/herensneakers?product_brand=
So just adding;
User-agent: *
Disallow: /*?product_brandShould do the trick?
Most important is that herensneakers itself should be indexed, followed and crawled -
I would use your robots.txt file to prevent them from crawling the specific strings / pages. Go into your Google Webmaster Tools and you can see all the information Google has on your site and any issues, you can also specify robots.txt information in there. That would be the best route as Google is obedient with what is on the robots.txt file. If you want more information about robots.txt, go here.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can you index a Google doc?
We have updated and added completely new content to our state pages. Our old state content is sitting in a our Google drive. Can I make these public to get them indexed and provide a link back to our state pages? In theory it sounds like a great link building strategy... TIA!
Intermediate & Advanced SEO | | LindsayE1 -
Does google ignore ? in url?
Hi Guys, Have a site which ends ?v=6cc98ba2045f for all its URLs. Example: https://domain.com/products/cashmere/robes/?v=6cc98ba2045f Just wondering does Google ignore what is after the ?. Also any ideas what that is? Cheers.
Intermediate & Advanced SEO | | CarolynSC0 -
Is possible to submit a XML sitemap to Google without using Google Search Console?
We have a client that will not grant us access to their Google Search Console (don't ask us why). Is there anyway possible to submit a XML sitemap to Google without using GSC? Thanks
Intermediate & Advanced SEO | | RosemaryB0 -
Google crawling different content--ever ok?
Here are a couple of scenarios I'm encountering where Google will crawl different content than my users on initial visit to the site--and which I think should be ok. Of course, it is normally NOT ok, I'm here to find out if Google is flexible enough to allow these situations: 1. My mobile friendly site has users select a city, and then it displays the location options div which includes an explanation for why they may want to have the program use their gps location. The user must choose the gps, the entire city, or he can enter a zip code, or choose a suburb of the city, which then goes to the link chosen. OTOH it is programmed so that if it is a Google bot it doesn't get just a meaningless 'choose further' page, but rather the crawler sees the page of results for the entire city (as you would expect from the url), So basically the program defaults for the entire city results for google bot, but for for the user it first gives him the initial ability to choose gps. 2. A user comes to mysite.com/gps-loc/city/results The site, seeing the literal words 'gps-loc' in the url goes out and fetches the gps for his location and returns results dependent on his location. If Googlebot comes to that url then there is no way the program will return the same results because the program wouldn't be able to get the same long latitude as that user. So, what do you think? Are these scenarios a concern for getting penalized by Google? Thanks, Ted
Intermediate & Advanced SEO | | friendoffood0 -
Google Cache Is Blank for Text-only
Hi, I'm doing some SEO for www.suprafootwear.com, and for some reason when I go to text-only in google cache, nothing shows up. http://webcache.googleusercontent.com/search?q=cache:suprafootwear.com&es_sm=91&strip=1 That seems to be the case for all of the different pages on the site, but the content is still appearing on the serp. I have never seen this before, and I'm not sure what's happening. Any help would be greatly appreciated. Thanks!
Intermediate & Advanced SEO | | bigwavew0 -
Multiple Authors Google + Authorship
Hello, I took a look through past questions but can't seem to find a definitive answer on setting up Google + Authorship credit (for multiple authors) using a Wordpress blog. Has anyone had experience setting this up? Or could you recommend solid reading/research? I took a look at a couple of Wordpress plug in's but just found them very confusing (so did our IT contact who will ultimately be setting up code for this.) Any direction or advice is appreciated.
Intermediate & Advanced SEO | | SEOSponge0 -
Why Put an H1 Tag On A Product?
Why would you put an H1 tag on a product name? I came across this in another forum and thought I'd float it here.
Intermediate & Advanced SEO | | AWCthreads0 -
Check Google ban on domainname
Hello all, If I wanted to know if a domainname has a google ban on it would the following be a good idea to test it. Place an article on the domain page with unique content and then link to the page so its gets indexed and then link to the article from a well indexed page. If it doesn't get indexed there might be a ban on the page, if it does get indexed there is no ban on the page... Or are there other points I should keep in mind while doing this. All help is very welcome. Cheers, Arnout
Intermediate & Advanced SEO | | hellemans0