Robots.txt: syntax for a URL to disallow
-
Has anyone ever experienced "collateral damage" when disallowing URLs?
Some old URLs are still present on our website, and while we are "cleaning" them off the site (which takes time), I would like to prevent them from being indexed by using the robots.txt file.
The old URL syntax is "/brand//13", while the new one is "/brand/samsung/13" (note that the old URL has two slashes after the word "brand").
Do I risk removing the good new URLs from the SERPs if I add the line "Disallow: /brand//" to the robots.txt file?
I don't think so, but thanks in advance to anyone who can help me clear this up.
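One way to check the proposed rule before deploying it is to run it through a robots.txt parser. The sketch below uses Python's standard-library `urllib.robotparser`, which treats a `Disallow` value as a plain path prefix. It is not Googlebot's own parser (Google additionally supports `*` and `$` wildcards), and `www.example.com` is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# The rule proposed in the question, applied to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /brand//
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder domain; only the path is compared against the Disallow prefix.
for url in (
    "https://www.example.com/brand//13",         # old URL: starts with /brand//
    "https://www.example.com/brand/samsung/13",  # new URL: starts with /brand/s
    "https://www.example.com/brand/",            # the /brand/ directory itself
):
    print(url, "->", "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
```

Under prefix matching, only paths that literally begin with `/brand//` are blocked, so `/brand/samsung/13` and `/brand/` itself remain crawlable. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL that is linked from elsewhere can still show up in the index.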
-
You could inadvertently block /brand/ altogether. Just because you use a // doesn't mean Google follows the same rules when crawling.
-
"I wouldn't risk telling a spider to ignore /brand// because it might have adverse results."
Which adverse results could be expected?
-
"(because of the 404 error pages being constantly found in our pages)"
Think of it this way:
Which is better? Re-routing traffic when it's congested, or putting up a roadblock that backs up even more traffic?
Yes, it's more work to do the 301 redirects, but if you have "pages being constantly found", you should instruct spiders to take a different path.
Now, if you are talking about an error such as:
/brand//samsung/13 SHOULD go to
/brand/samsung/13
Then you could EASILY solve this with .htaccess redirects. I wouldn't risk telling a spider to ignore /brand// because it might have adverse results.
-
Hi guys,
Thank you for your answers.
I understand (and agree with) your SEO point of view (301 redirection), but I should have mentioned that these old URLs have been leading to a 404 error page for a long time now, so we are no longer considering their SEO strength.
My goal right now is to find a quick and simple way to tell search engines not to consider this type of old URL (because of the 404 error pages being constantly found in our pages): doing the 301 redirection to the right page would be a bit more complex at the moment.
So: do you think there is a risk that the second slash won't be taken into account in the robots.txt "Disallow" line I want to add? (In other words, will search engines stop crawling URLs like "/brand/samsung/13" if I add the line "Disallow: /brand//"?)
-
I'll second what Highland and Alex Chan are telling you. If you are using Apache (Linux), you can redirect your old site links with 301 redirects in .htaccess, which is a very powerful tool. Otherwise, if you are using an IIS server, web.config is what you want to use.
A really good resource for .htaccess is CSS-Tricks: http://css-tricks.com/snippets/htaccess/301-redirects/
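The .htaccess or web.config rules themselves depend on the server setup, but the logic behind redirecting `/brand//samsung/13` to `/brand/samsung/13` is simple: collapse repeated slashes and answer with a 301 pointing at the result. A minimal sketch in Python as a WSGI app, assuming a Python-served site; `redirect_double_slashes` and its placeholder 200 response are hypothetical stand-ins, not part of any server's configuration.

```python
import re

# Two or more consecutive slashes anywhere in the path.
_REPEATED_SLASHES = re.compile(r"/{2,}")


def collapse_slashes(path: str) -> str:
    """Turn e.g. '/brand//samsung/13' into '/brand/samsung/13'."""
    return _REPEATED_SLASHES.sub("/", path)


def redirect_double_slashes(environ, start_response):
    """Hypothetical WSGI app: 301-redirect paths containing '//', else serve a placeholder page."""
    path = environ.get("PATH_INFO", "/")
    clean = collapse_slashes(path)
    if clean != path:
        # Preserve any query string on the redirect target.
        query = environ.get("QUERY_STRING", "")
        location = clean + ("?" + query if query else "")
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"normal page"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, redirect_double_slashes).serve_forever()` serves it on port 8000. Note that the question's actual old pattern, `/brand//13`, collapses to `/brand/13` rather than to the new URL, because the brand name is missing; that case needs a lookup (or a 404) instead.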
-
Yup, like Highland mentioned, using your robots.txt for this isn't a good idea. The robots.txt file isn't guaranteed to work anyway. The only surefire way to get it working is to move all the URLs from the old structure to the new one, then 301 all the old URLs to the new URLs. The 301 minimizes the loss to your SEO.
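Because the old pattern `/brand//13` drops the brand name, a 301 for it can't be derived from the URL alone; each old URL has to be mapped to its new address. A minimal sketch, assuming the trailing number identifies the item and that the site can supply a lookup from that number to a brand slug. The `ID_TO_BRAND` table and `new_url_for` function are hypothetical.

```python
import re
from typing import Mapping, Optional

# Old structure: /brand//<id>  (brand segment missing, hence the double slash)
OLD_PATTERN = re.compile(r"^/brand//(\d+)/?$")

# Hypothetical lookup; in practice this would come from the site's database.
ID_TO_BRAND: Mapping[str, str] = {"13": "samsung"}


def new_url_for(old_path: str, id_to_brand: Mapping[str, str] = ID_TO_BRAND) -> Optional[str]:
    """Return the 301 target for an old-style path, or None if it can't be mapped."""
    match = OLD_PATTERN.match(old_path)
    if match is None:
        return None  # not an old-style URL
    item_id = match.group(1)
    brand = id_to_brand.get(item_id)
    if brand is None:
        return None  # old-style URL with no known brand: candidate for a 404
    return f"/brand/{brand}/{item_id}"


for path in ("/brand//13", "/brand//99", "/brand/samsung/13"):
    print(path, "->", new_url_for(path))
```

Returning `None` for unmapped old URLs leaves room for the 404 fallback; only URLs with a known destination get redirected.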
-
You really don't need robots.txt for that. I would either 301 the old URL (preferred) or have the old URL return a 404. Both will cause the old URL to be removed from the index. A robots.txt Disallow simply leaves it up but tells the robots not to crawl it.
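For the 404 option, what matters is that crawlers can actually fetch the old URLs and see the 404 status; a robots.txt Disallow would stop them from fetching the page at all, so they would never see it. A minimal sketch as WSGI middleware, assuming a Python-served site; `not_found_for_old_urls` and `site` are hypothetical names, with `site` standing in for the real application.

```python
import re

# Old-style paths from the question start with /brand// (brand segment missing).
OLD_URL = re.compile(r"^/brand//")


def not_found_for_old_urls(app):
    """Wrap a WSGI app so old-style /brand// URLs get a 404 instead of reaching it."""
    def wrapped(environ, start_response):
        if OLD_URL.match(environ.get("PATH_INFO", "")):
            start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Not Found"]
        return app(environ, start_response)
    return wrapped


def site(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content"]


application = not_found_for_old_urls(site)
```

New-style URLs pass straight through to the wrapped application, so only the old pattern is affected.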
Related Questions
-
Uppercase in URLs = Dupe Content
Hi Mozzers, My developers recently changed a bunch of the pages I am working on to all lower case (something I know ideally should have been done in the first place). The URLs have sat for about a week as lower case without the old upper-case URLs being 301 redirected to these pages. In Google Webmaster Tools, I'm seeing Google recognize them as duplicate meta tags, title tags, etc. See image: http://screencast.com/t/KloiZMKOYfa We're 301 redirecting the old URLs to the new ones ASAP, but is there anything else I should do? Any chance Google is going to noindex these pages because it sees them as dupes until I fix them? Sometimes I can see both pages in the SERPs if I use personalized results, and it scares me: http://screencast.com/t/4BL6iOhz4py3 Thanks!
Intermediate & Advanced SEO | Travis-W
-
Google Maps Integration Dynamic url
We are integrating Google Maps into a search feature on a website. Would you use the standard dynamically generated long URL that appears after a search, or find a way of shortening it, taking into account that there are hundreds of results? The question is asked for SEO purposes.
Intermediate & Advanced SEO | jazavide
-
URL language on Global Sites
Has anyone looked into a page not ranking as well because the URL is in English when the subdomain is geared toward a different country and a different language? I can definitely see this taking away from the user experience, but I didn't know if there was any concrete evidence or case studies that would show whether it is a big deal for rankability. I know this is a backwards question to begin with, because UX always takes priority over rankability, but there may not be a way to fix it unless I can prove it is a big deal.
Intermediate & Advanced SEO | Ryan_Henry
-
Panda Updates - robots.txt or noindex?
Hi, I have a site that I believe has been impacted by the recent Panda updates. Assuming that Google has crawled and indexed several thousand pages that are essentially the same, and the site has now passed the threshold to be picked out by the Panda update, what is the best way to proceed? Is it enough to block the pages from being crawled in the future using robots.txt, or would I need to remove the pages from the index using the meta noindex tag? Of course, if I block the URLs with robots.txt, then Googlebot won't be able to access the pages in order to see the noindex tag. Does anyone have previous experience of doing something similar? Thanks very much.
Intermediate & Advanced SEO | ianmcintosh
-
Does Google index url with hashtags?
We are setting up some jQuery tabs on a page that will produce the same URL with hashtags. For example: index.php#aboutus, index.php#ourguarantee, etc. We don't want that content to be crawled, as we'd like to prevent duplicate content. Does Google normally crawl such URLs, or does it just ignore them? Thanks in advance.
Intermediate & Advanced SEO | seoppc2012
-
How important is it to clarify URL parameters?
We have a long list of URL parameters in our Google Webmasters account. Currently, the majority are set to 'let googlebot decide.' How important is it to specify exactly what googlebot should do? Would you leave these to 'let googlebot decide' or would you specify how googlebot should treat each parameter?
Intermediate & Advanced SEO | nicole.healthline
-
Spammy? Long URLs
Hi All: Is it true that URLs such as the following one are viewed as "spammy" (besides being too long) and that such URLs will negatively affect keyword rankings and PageRank: http://www.repairsuniverse.com/ipod-parts-ipod-touch-replacement-repair-parts-ipod-touch-1st-gen-replacement-repair-parts.html My thinking is that the page will perform better once it is 301 redirected to a shorter page name, such as: http://www.repairsuniverse.com/ipod-touch-1G-replacement-parts.html It also appears that these long URLs are more likely to break, creating unnecessary 404s. Thanks for your insight on this issue!
Intermediate & Advanced SEO | holdtheonion
-
Changing URLS - wondering about implications
We are in the process of changing our URLs from dynamic to more SEO friendly. The website is ciee.org and I'm specifically talking about ciee.org/study. While we work with the business to get approval for ciee.org/study-abroad, we are going with ciee.org/study/abroad. Can anyone foresee any difficulties or negative implications that could come if we change from study/abroad to study-abroad all within 6 months? Thank you in advance!!
Intermediate & Advanced SEO | CIEEwebTeam