Robots.txt: Syntax URL to disallow
-
Did someone ever experience some "collateral damages" when it's about "disallowing" some URLs?
Some old URLs are still present on our website and while we are "cleaning" them off the site (which takes time), I would like to to avoid their indexation through the robots.txt file.
The old URLs syntax is "/brand//13" while the new ones are "/brand/samsung/13." (note that there is 2 slash on the URL after the word "brand")
Do I risk to erase from the SERPs the new good URLs if I add to the robots.txt file the line "Disallow: /brand//" ?
I don't think so, but thank you to everyone who will be able to help me to clear this out
-
You could inadvertently block /brand/ altogether. Just because you use a // doesn't mean Google follows the same rules when crawling.
-
"I wouldn't risk telling a spider to ignore /brand// because it might have adverse results."
Which adverse results could be expected?
-
(because of the 404 error pages being constantly found in our pages)
Think of it this way:
Which is better? Re-routing traffic when it's congested or putting up a road block to back up even more traffic?Yes, it's more work to do the 301 redirects but if you have "pages being constantly found" you should give instructions to spiders to take the different path.
Now, if you are talking about an error such as:
/brand//samsung/13 SHOULD go to
/brand/samsung/13
Then you could EASILY solve this with HTACCESS redirects. I wouldn't risk telling a spider to ignore /brand// because it might have adverse results. -
Hi guys,
Thank you for your answers
I understand (and agree) with your SEO point of view (301 redirection) but I should have mentioned that these old URLs are leading to a 404 error page for a long time now, we are not considering anymore their SEO strength anymore...
My goal right now is to find a quick and simple way to tell search engines to not consider this type of old URLs (because of the 404 error pages being constantly found in our pages) : doing the 301 redirection to the right page would be a bit more complex at the moment.
So: do you think there is a risk that the second slash won't be "considered" in the robots.txt about the "disallow" line I want to add ? (= do search engines will stop to crawl URLs like "/brand/samsung/13" if I add the line "Disallow: /brand//" ?)
-
I'll further what Highland and Alex Chan are telling you. If you are using Apache (Linux) then you can redirect your old site links using a 301 redirect and .htaccess which is a very powerful tool. Otherwise, if you are using a IIS server, web.config is what you want to use.
A really good resource for .htassess is CSS-Tricks: http://css-tricks.com/snippets/htaccess/301-redirects/
-
Yup like Highland mentioned, using your robots.txt for this isn't a good idea. The robots.txt file isn't guaranteed to work anyway. The only sure fire way to get it working is to move all the URLs from the old structure to the new one, then 301 all the old URLs into the new URLs. The 301 minimizes loss to your SEO.
-
You really don't need a robots for that. I would either 301 the old URL (preferred) or have the old URL return a 404. Both will cause the old URL to be removed from the index. A robots nofollow simply leaves it up but tells the robots not to crawl it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Same URL-Structure & the same number of URLs indexed on two different websites - can it lead to a Google penalty?
Hey guys. I've got a question about the url structure on two different websites with a similar topic (bith are job search websites). Although we are going to publish different content (texts) on these two websites and they will differ visually, the url structure (except for the domain name) remains exactly the same, as does the number of indexed landingpages on both pages. For example, www.yyy.com/jobs/mobile-developer & www.zzz.com/jobs/mobile-developer. In your opinion, can this lead to a Google penalty? Thanks in advance!
Intermediate & Advanced SEO | | vde130 -
Url structure of a blog
We are trying to work out what the best structure for our blog is as we want each page to rank as highly as possible, we were looking at a flat structure similar to http://www.hongkiat.com/blog/ where every posts is after the blog/ but not in category's although the viewers can look in different category's from the top buttons on the page- photoshop - icons etc or we where going to go for the structured way- blog/photoshop/blog-post.html the only problem is that we will end up 4 deep at least with this and at least 80 characters in the url. any help would be appreciated. Thanks Shaun
Intermediate & Advanced SEO | | BobAnderson0 -
301 Redirection and apostrophes in URLs
Hi I am experiencing trouble getting any redirects with apostrophes in the URLs to 301 redirect in order to eliminate 404 errors. I have tried replacing the instance of the apostrophe in the source URL field to %27 and variations of this but to no avail. The site is a wordpress site (the old URLS are legacies from the old Business Catalyst site) and I am using the redirection plug in. I have gone into some detail with a helpful soul here http://wordpress.org/support/topic/how-to-deal-with-apostrophes-in-source-url but unfortunately to no result. If anyone has any idea how to solve this puzzle I would be grateful for the help. Example: http://www.tesselaars.com/blog/Inside_Flowers/post/Online_Marketing_for_Florists_Part_1%E2%80%93_A_Website_You_Won%27t_Regret/
Intermediate & Advanced SEO | | Seamoose0 -
Robot.txt error
I currently have this under my robot txt file: User-agent: *
Intermediate & Advanced SEO | | Rubix
Disallow: /authenticated/
Disallow: /css/
Disallow: /images/
Disallow: /js/
Disallow: /PayPal/
Disallow: /Reporting/
Disallow: /RegistrationComplete.aspx WebMatrix 2.0 On webmaster > Health Check > Blocked URL I copy and paste above code then click on Test, everything looks ok but then logout and log back in then I see below code under Blocked URL: User-agent: * Disallow: / WebMatrix 2.0 Currently, Google doesn't index my domain and i don't understand why this happening. Any ideas? Thanks Seda0 -
Where to put a page ID in a URL?
Hello, My company is going to change URLs to example.com/category or example.com/product. When we will change the URLs to product or category pages somehow we have to check whether the requested page is from category table in DB or from products table (this gives much speed to page load time). So we have to choose how to make the different product and category pages.
Intermediate & Advanced SEO | | komeksimas
Programmers said that we need to insert id to URL. So the question is: Which is the better way to place an id to an URL? example.com/product-name?id=111 example.com/product-name/111 example.com/product_name-111 Or maybe we should use some other punctuation mark to separate id from product name? p.s. I have read Dynamic URLs vs. static URLs by Google and it still didn't answered which is the best for all of the pages. Somehow others solve this problem by typing only the names to the URL, but could anyone tell what that technology should be?0 -
Google News URL Structure
Hi there folks I am looking for some guidance on Google News URLs. We are restructuring the site. A main traffic driver will be the traffic we get from Google News. Most large publishers use: www.site.com/news/12345/this-is-the-title/ Others use www.example.com/news/celebrity/12345/this-is-the-title/ etc. www.example.com/news/celebrity-news/12345/this-is-the-title/ www.example.com/celebrity-news/12345/this-is-the-title/ (Celebrity is a channel on Google News so should we try and follow that format?) www.example.com/news/celebrity-news/this-is-the-title/12345/ www.example.com/news/celebrity-news/this-is-the-title-12345/ (unique ID no at the end and part of the title URL) www.example.com/news/celebrity-news/celebrity-name/this-is-the-title-12345/ Others include the date. So as you can see there are so many combinations and there doesnt seem to be any unity across news sites for this format. Have you any advice on how to structure these URLs? Particularly if we want to been seen as an authority on the following topics: fashion, hair, beauty, and celebrity news - in particular "celebrity name" So should the celebrity news section be www.example.com/news/celebrity-news/celebrity-name/this-is-the-title-12345/ or what? This is for a completely new site build. Thanks Barry
Intermediate & Advanced SEO | | Deepti_C0 -
Exact keyword URL or not?
Hi all, I have a quick question about the proper use of permalinks. Let's say that I have a website about sports and I want to create an internal page dedicated to shoes. I know that the keyword "shoe" has 15.000 monthly visits, while the keyword "shoes" has 1.000 monthly visits. How do I have to name the internal page? http://www.example.com/shoe or http://www.example.com/shoes (with a final 's')? I would think that by naming the URL http://www.example.com/shoes, the search engine would consider that page for the keywords "shoe" and "shoes", but I am not sure about it. Should I create a URL that only focuses on one specific keyword ("shoe", in this example) or a URL that may encompass more than one keyword ("shoe" and "shoes")? I hope this is clear. Thank you for your time and help. All best, Sal
Intermediate & Advanced SEO | | salvyy0 -
Google Maps results doesn't show my site url but rather the maps url, why is this?
For several of my clients landing pages that show up in the Maps results the website url has been overwritten by the maps url (maps.google.com). Even though on my places page I have the correct website set up. Does anyone have any idea why they would be doing this and how I can correct it? Thanks kinldy in advance, Aaron. maps-url.png
Intermediate & Advanced SEO | | afranklin0