Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Blocking URL's with specific parameters from Googlebot
-
Hi,
I've discovered that Googlebot's are voting on products listed on our website and as a result are creating negative ratings by placing votes from 1 to 5 for every product. The voting function is handled using Javascript, as shown below, and the script prevents multiple votes so most products end up with a vote of 1, which translates to "poor".
How do I go about using robots.txt to block a URL with specific parameters only? I'm worried that I might end up blocking the whole product listing, which would result in de-listing from Google and the loss of many highly ranked pages.
DON'T want to block:
http://www.mysite.com/product.php?productid=1234
WANT to block:
http://www.mysite.com/product.php?mode=vote&productid=1234&vote=2
Javacript button code:
onclick="javascript: document.voteform.submit();"
Thanks in advance for any advice given.
Regards,
Asim -
Good to hear, I am glad you perservered
-
Tried them all now and all come back with "Success"... May be I'll post in the WMT Forum and see if anyone can shed light on this problem. Thanks for your help Alan, it's much appreciated.
-
Yes correct, did you try the other formats?
-
Tried "Fetch as Googlebot" in Diagnostics and it came back as "Success" so I guess the robots.txt directive is not working. I'm assuming it should have reported a failure message when attempting to fetch a URL containing "?mode=vote".
-
Wrong place, go to diagnostics, then look for fetch as googlebot
-
I added "Disallow: /mode=vote" to the robots.txt file and also manually entered it on Crawler Access page, then clicked "Test" and no errors were reported. The WMT page states that robots.txt was last downloaded 16 hours ago so I'll wait until it picks the file up again and then check for any errors. Hopefully that will do trick

-
Try this in robots.txt, I did not think that Google allows wild cards but i just read that they do.
Disallow: /*mode=vote*orDisallow: /*mode=voteorDisallow: /*modeThen try in Google WMT to read with googlebot to see if it works.The first in the list seems right to me, but I have seen others do it the other ways. -
Thanks for the reply. The site was developed using PHP, mySQL and Javascript. I was hoping there was a way to do it without getting programmers involved...
-
dont think you are going to do it in robots.txt, rather do a 301 from mode=vote to non mode vote.
If you dont know how to put this into practise, tell me what your site is built with, if it is ASP.NET, i will show you how to impliment, if not someone else should be able to help.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
If I'm using a compressed sitemap (sitemap.xml.gz) that's the URL that gets submitted to webmaster tools, correct?
I just want to verify that if a compressed sitemap file is being used, then the URL that gets submitted to Google, Bing, etc and the URL that's used in the robots.txt indicates that it's a compressed file. For example, "sitemap.xml.gz" -- thanks!
Technical SEO | | jgresalfi0 -
SERP result (URL) doesn't change after a 301
A couple of months ago there was a result in Google for our branded search term which wasn't the 'official' URL, actually the result shown in the SERP was www.mycompany-ip.nl. We've applied a 301 redirect of this URL to the 'official' URL which is a subdomain: department.mycompany.nl. From Google the redirect is obviously working, but up until now, I don't see Google replacing the incorrect URL by the correct URL. I am wondering what to do to make the result correct. André
Technical SEO | | ConclusionDigital0 -
Problems with WooCommerce Product Attribute Filter URL's
I am running a WordPress/WooCommerce site for a client, and Moz is picking up some issues with URL's generated from WooCommerce product attribute filters. For example: ..co.uk/womens-prescription-glasses/?filter_gender=mens&filter_style=full-rim&filter_shape=oval How do I get Google to ignore these filters?
Technical SEO | | SushiUK
I am running Yoast Premium, but not sure if this can solve the issue? Product categories are canonicalised to the root category URL. Any suggestions very gratefully appreciated. Thanks Bob0 -
Blocked URL parameters can still be crawled and indexed by google?
Hy guys, I have two questions and one might be a dumb question but there it goes. I just want to be sure that I understand: IF I tell webmaster tools to ignore an URL Parameter, will google still index and rank my url? IS it ok if I don't append in the url structure the brand filter?, will I still rank for that brand? Thanks, PS: ok 3 questions :)...
Technical SEO | | catalinmoraru0 -
Is there a way for me to automatically download a website's sitemap.xml every month?
From now on we want to store all our sitemap.xml over the next years. Its a nice archive to have that allows us to analyse how many pages we have on our website and which ones were removed/redirected. Any suggestions? Thanks
Technical SEO | | DeptAgency0 -
How Does Google's "index" find the location of pages in the "page directory" to return?
This is my understanding of how Google's search works, and I am unsure about one thing in specific: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better.
Technical SEO | | reidsteven750 -
Google insists robots.txt is blocking... but it isn't.
I recently launched a new website. During development, I'd enabled the option in WordPress to prevent search engines from indexing the site. When the site went public (over 24 hours ago), I cleared that option. At that point, I added a specific robots.txt file that only disallowed a couple directories of files. You can view the robots.txt at http://photogeardeals.com/robots.txt Google (via Webmaster tools) is insisting that my robots.txt file contains a "Disallow: /" on line 2 and that it's preventing Google from indexing the site and preventing me from submitting a sitemap. These errors are showing both in the sitemap section of Webmaster tools as well as the Blocked URLs section. Bing's webmaster tools are able to read the site and sitemap just fine. Any idea why Google insists I'm disallowing everything even after telling it to re-fetch?
Technical SEO | | ahockley0 -
Does 'framing' a website create duplicate content?
Something I have not come across before, but hope others here are able offer advice based on experience: A client has independently created a series of mini-sites, aimed at targeting specific locations. The tactic has worked very well and they have achieved a large amount of well targeted traffic as a result. Each mini-site is different but then in the nav, if you want to view prices or go to the booking page, that then links to what at first appears to be their main site. However, you then notice that the URL is actually situated on the mini-site. What they have done is 'framed' the main site so that it appears exactly the same even when navigating through this exact replica site. Checking the code, there is almost nothing there - in fact there is actually no content at all. Below the head, there is a piece of code: <frameset rows="*" framespacing=0 frameborder=0> <frame src="[http://www.example.com](view-source:http://www.yellowskips.com/)" frameborder=0 marginwidth=0 marginheight=0> <noframes>Your browser does not support frames. Click [here](http://www.example.com) to view.noframes> frameset> Given that main site content does not appear to show in the source code, do we have an issue with duplicate content? This issue is that these 'referrals' are showing in Analytics, despite the fact that the code does not appear in the source, which is slightly confusing for me. They have done this without consultation and I'm very concerned that this could potentially be creating duplicate content of their ENTIRE main site on dozens of mini-sites. I should also add that there are no links to the mini-sites from the main site, so if you guys advise that this is creating duplicate content, I would not be worried about creating a link-wheel if I advise them to link directly to the main site rather than the framed pages. Thanks!
Technical SEO | | RiceMedia0