Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I block robots from URLs containing query strings?
-
I'm about to block off all URLs that have a query string using robots.txt. They're mostly URLs with coremetrics tags and other referrer info. I figured that search engines don't need to see these as they're always better off with the original URL.
Might there be any downside to this that I need to consider?
Appreciate your help / experiences on this one.
Thanks
Jenni
-
Thanks for your suggestions. I've already got canonical tags on every page, but they're not all being adhered to and lots of URLs with query strings are still getting organic traffic.
Passing referrer info behind scenes isn't an option with Coremetrics I don't think. Is it?
Interested to know more about number 1 though. How would you do that in WMT other than blocking with robots.txt?
Thanks
-
Instead of blocking them with robots.txt (which isn't very effective), try using the canonical tag instead.
For instance, a URL like this:
http://wwww.testdomain.com/page.html?utm_source=Google&utm_medium=Banner&utm_campaign=CampaignYou could add this canonical tag in the head:
With this solution you don't have to worry about losing quality links OR having your query tracking show up in any of the major search engines.
Cheers- Kyle
-
The downside to this would be if someone linked to the page with the query string, the search engines wouldn't crawl the page and flow link juice properly to the rest of your site.
Other options:
-
Use Google and Bing WMT to ignore those parameter query strings.
-
Make sure the canoncial tag is on those pages, pointing back to the version without the query string
-
Try to pass referrer info behind the scenes if possible
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should I canonicalize URLs with no query params even though query params are always automatically appended?
There's a section of my client's website that presents quarterly government financial data. Users can filter results to see different years and quarters of financial info. If a user navigates to those pages, the URLs automatically append the latest query parameters. It's not a redirect...when I asked a developer what the mechanism was for this happening, he said "magic." He honestly didn't know how to describe it. So my question is, is it ok to canonicalize the URL without any query parameters, knowing that the user will always be served a page that does have query parameters? I need to figure out how to manage all of the various versions of these URLs.
Technical SEO | | LeahH0 -
Google Search console says 'sitemap is blocked by robots?
Google Search console is telling me "Sitemap contains URLs which are blocked by robots.txt." I don't understand why my sitemap is being blocked? My robots.txt look like this: User-Agent: *
Technical SEO | | Extima-Christian
Disallow: Sitemap: http://www.website.com/sitemap_index.xml It's a WordPress site, with Yoast SEO installed. Is anyone else having this issue with Google Search console? Does anyone know how I can fix this issue?1 -
Is there a limit to how many URLs you can put in a robots.txt file?
We have a site that has way too many urls caused by our crawlable faceted navigation. We are trying to purge 90% of our urls from the indexes. We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags. Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file? Could this cause any issues for us? Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.
Technical SEO | | kcb81780 -
Should I block Map pages with robots.txt?
Hello, I have a website that was started in 1999. On the website I have map pages for each of the offices listed on my site, for which there are about 120. Each of the 120 maps is in a whole separate html page. There is no content in the page other than the map. I know all of the offices love having the map pages so I don't want to remove the pages. So, my question is would these pages with no real content be hurting the rankings of the other pages on our site? Therefore, should I block the pages with my robots.txt? Would I also have to remove these pages (in webmaster tools?) from Google for blocking by robots.txt to really work? I appreciate your feedback, thanks!
Technical SEO | | imaginex0 -
Special characters in URL
Will registered trademark symbol within a URL be bad? I know some special characters are unsafe (#, >, etc.) but can not find anything that mentions registered trademark. Thanks!
Technical SEO | | bonnierSEO0 -
OK to block /js/ folder using robots.txt?
I know Matt Cutts suggestions we allow bots to crawl css and javascript folders (http://www.youtube.com/watch?v=PNEipHjsEPU) But what if you have lots and lots of JS and you dont want to waste precious crawl resources? Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc. And the legacy versions show up in Google Webmaster Tools as 404s. For example: http://www.discoverafrica.com/js/global_functions.js?v=1.1
Technical SEO | | AndreVanKets
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1 Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether? Isn't that what robots.txt was made for? Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks. We're just trying to power our content and UX elegantly with javascript. What do you guys say: Obey Matt? Or run the javascript gauntlet?0 -
URL query strings and canonical tag
Hi, I have recently been getting my comparison website redesigned and developed onto wordpress and the site is now 90% complete. Part of the redesign has meant that there are now dynamic urls in the format: http://www.mywebsite.com/10-pounds-productss/?display=cost&value=10 I have other pages similar to this but with different content for the different price ranges and these are linked to from the menus: http://www.mywebsite.com/20-pounds-products/?display=cost&value=20 Now my questions are: 1. I am using Joost's All-in-one SEO plugin and this adds a canonical tag to the page that is pointing to http://www.mywebsite.com/10-pounds-products/ which is the permalink. Is this OK as it is or should i change this to http://www.mywebsite.com/10-pounds-products/?display=cost&value=10 2. Which URL will get indexed, what gets shown as the display URL in the SERPs and what page will users land on? I'm a bit confused so apologies if these seem like silly questions. Thanks
Technical SEO | | bizarro10000