Does It Really Matter to Restrict Dynamic URLs by Robots.txt?
-
Today, I was checking Google webmaster tools and found that, there are 117 dynamic URLs are restrict by Robots.txt. I have added following syntax in my Robots.txt You can get more idea by following excel sheet.
#Dynamic URLs
Disallow: /?osCsidDisallow: /?q=
Disallow: /?dir=Disallow: /?p=
Disallow: /*?limit=
Disallow: /*review-form
I have concern for following kind of pages.
Shorting by specification:
http://www.vistastores.com/table-lamps?dir=asc&order=name
Iterms per page:
http://www.vistastores.com/table-lamps?dir=asc&limit=60&order=name
Numbering page of products:
http://www.vistastores.com/table-lamps?p=2
Will it create resistance in organic performance of my category pages?
-
I am quite late to add my reply on this question. Because, I was busy to fix issue regarding dynamic URLs.
I have made following changes on my website.
- I have re-write all dynamic URLs and make it static one exclude session ID and internal search option. Because, I have restricted both version via Robots.txt.
- I have set canonical to near duplicate pages which Dr.Pete described in Duplicate content in post panda world.
I want to give one live example to know more about it.
Base URL: http://www.vistastores.com/patio-umbrellas
Dynamic URLs: It was dynamic but, I have re-write to make it static one. But canonical tag to base URL is available on each near duplicate pages which are as follow.
http://www.vistastores.com/patio-umbrellas/shopby/limit-100
http://www.vistastores.com/patio-umbrellas/shopby/lift-method-search-manual-lift
http://www.vistastores.com/patio-umbrellas/shopby/manufacturer-fiberbuilt-umbrellas-llc
http://www.vistastores.com/patio-umbrellas/shopby/price-2,100
http://www.vistastores.com/patio-umbrellas/shopby/canopy-fabric-search-sunbrella
http://www.vistastores.com/patio-umbrellas/shopby/canopy-shape-search-hexagonal
http://www.vistastores.com/patio-umbrellas/shopby/canopy-size-search-7-ft-to-8-ft
http://www.vistastores.com/patio-umbrellas/shopby/color-search-blue
http://www.vistastores.com/patio-umbrellas/shopby/finish-search-black
http://www.vistastores.com/patio-umbrellas/shopby/p-2
http://www.vistastores.com/patio-umbrellas/shopby/dir-desc/order-positionNow, I am looking forward towards Google crawling and How Google treat all canonical pages. I am quite excited to see changes in organic ranking with distribution of page rank in website. Thanks for your insightful reply.
-
Robots.txt isn't the best solution for dynamic URLs. Depending on the type of URL, there are a number of other solutions available.
1. As blurbpoint mentions, Google Webmaster Tools allows you to specify URL handling. They actually do a decent job of this automatically, but also allow you the option to change the settings yourself.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
2. Identical pages with different parameters can create duplicate content, which is often best handled with canonical tags.
3. Parameters that result in pagination may require slightly nuanced solutions. I won't get into them all here but Adam Audette gives a good overview of pagination solutions here: http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284
Hope this helps. Best of luck with your SEO!
-
Hi,
Instead of blocking those URLs, You can use "URL parameter" setting in Google webmaster tool. You will get parameters like "?dir" & "?p" in it, select appropriate option from that like what actually happens when this parameter come into picture.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
301 vs Canonical - With A Side of Partial URL Rewrite and Google URL Parameters-OH MY
Hi Everyone, I am in the middle of an SEO contract with a site that is partially HTML pages and the rest are PHP and part of an ecommerce system for digital delivery of college classes. I am working with a web developer that has worked with this site for many years. In the php pages, there are also 6 different parameters that are currently filtered by Google URL parameters in the old Google Search Console. When I came on board, part of the site was https and the remainder was not. Our first project was to move completely to https and it went well. 301 redirects were already in place from a few legacy sites they owned so the developer expanded the 301 redirects to move everything to https. Among those legacy sites is an old site that we don't want visible, but it is extensively linked to the new site and some of our top keywords are branded keywords that originated with that site. Developer says old site can go away, but people searching for it are still prevalent in search. Biggest part of this project is now to rewrite the dynamic urls of the product pages and the entry pages to the class pages. We attempted to use 301 redirects to redirect to the new url and prevent the draining of link juice. In the end, according to the developer, it just isn't going to be possible without losing all the existing link juice. So its lose all the link juice at once (a scary thought) or try canonicals. I am told canonicals would work - and we can switch to that. My questions are the following: 1. Does anyone know of a way that might make the 301's work with the URL rewrite? 2. With canonicals and Google parameters, are we safe to delete the parameters after we have ensures everything has a canonical url (parameter pages included)? 3. If we continue forward with 301's and lose all the existing links, since this only half of the pages in the site (if you don't count the parameter pages) and there are only a few links per page if that, how much of an impact would it have on the site and how can I avoid that impact? 4. Canonicals seem to be recommended heavily these days, would the canonical urls be a better way to go than sticking with 301's. Thank you all in advance for helping! I sincerely appreciate any insight you might have. Sue (aka Trudy)
Intermediate & Advanced SEO | | TStorm1 -
Large robots.txt file
We're looking at potentially creating a robots.txt with 1450 lines in it. This will remove 100k+ pages from the crawl that are all old pages (I know, the ideal would be to delete/noindex but not viable unfortunately) Now the issue i'm thinking is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?
Intermediate & Advanced SEO | | ThomasHarvey0 -
Demoting a URL (not-WMT Related)
I have a pharmaceutical brand that treats two diseases, but wants to primarily promote one. We want searches for "brand dosing" to go to Side A, but currently "brand dosing" goes to Side B. BUT, I want "brand dosing Side B" to still show up in organic search, so a noindex on Side B, or canonicalization of Side A, won't work. Essentially, I want any searches that are not specific to a disease treatment to go to Side A, and then specific Side B related searches, go to Side B. Because this is a client paying me to optimize their site, I obviously want to optimize their whole site, so only optimizing Side A, or unoptimizing Side B, aren't solutions I want to employ. I don't think a solution exists, but I figured my fellow Mozers would know best. Thanks in advance,
Intermediate & Advanced SEO | | GTO_Pharma_SEO0 -
If I own a .com url and also have the same url with .net, .info, .org, will I want to point them to the .com IP address?
I have a domain, for example, mydomain.com and I purchased mydomain.net, mydomain.info, and mydomain.org. Should I point the host @ to the IP where the .com is hosted in wpengine? I am not doing anything with the .org, .info, .net domains. I simply purchased them to prevent competitors from buying the domains.
Intermediate & Advanced SEO | | djlittman0 -
Url structure of a blog
We are trying to work out what the best structure for our blog is as we want each page to rank as highly as possible, we were looking at a flat structure similar to http://www.hongkiat.com/blog/ where every posts is after the blog/ but not in category's although the viewers can look in different category's from the top buttons on the page- photoshop - icons etc or we where going to go for the structured way- blog/photoshop/blog-post.html the only problem is that we will end up 4 deep at least with this and at least 80 characters in the url. any help would be appreciated. Thanks Shaun
Intermediate & Advanced SEO | | BobAnderson0 -
Should all pages on a site be included in either your sitemap or robots.txt?
I don't have any specific scenario here but just curious as I come across sites fairly often that have, for example, 20,000 pages but only 1,000 in their sitemap. If they only think 1,000 of their URL's are ones that they want included in their sitemap and indexed, should the others be excluded using robots.txt or a page level exclusion? Is there a point to having pages that are included in neither and leaving it up to Google to decide?
Intermediate & Advanced SEO | | RossFruin1 -
Google showing high volume of URLs blocked by robots.txt in in index-should we be concerned?
if we search site:domain.com vs www.domain.com, We see: 130,000 vs 15,000 results. When reviewing the site:domain.com results, we're finding that the majority of the URLs showing are blocked by robots.txt. They are subdomains that we use as production environments (and contain similar content as the rest of our site). And, we also find the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back We were hit by Panda some time back--is this an issue we should address? Should we unblock the subdomains and add noindex, follow?
Intermediate & Advanced SEO | | nicole.healthline0 -
URL blocked
Hi there, I have recently noticed that we have a link from an authoritative website, however when I looked at the code, it looked like this: <a <span="">href</a><a <span="">="http://www.mydomain.com/" title="blocked::http://www.mydomain.com/">keyword</a> You will notice that in the code there is 'blocked::' What is this? has it the same effect as a nofollow tag? Thanks for any help
Intermediate & Advanced SEO | | Paul780