Robots Disallow Backslash - Is it the right command?
-
I am a bit skeptical about this. Due to dynamic URLs and some other linking issues, Google has crawled URLs containing a backslash and a double-quote character.
ex - www.xyz.com/\/index.php?option=com_product
www.xyz.com/\"/index.php?option=com_product
Now, %5c is the encoded version of \ (backslash) and %22 is the encoded version of " (double quote).
I need to know about this command:
User-agent: *
Disallow: \
As I am disallowing all backslash URLs through this, will it only remove the backslash URLs, which are duplicates, or the entire site?
-
Thanks, you seem lucky for me. After almost 2 months I have got the code for making all these encoded URLs redirect correctly. Now, if one types
http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10
then they are redirected through a 301 to the correct URL
http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
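The actual redirect code is not shown in this thread. As a rough sketch only, the mapping from a stray URL to its 301 target could look like the following in Python. The function name `redirect_target` and the rule "drop any path segment made up only of backslashes and double quotes" are assumptions for illustration, not the code used on the site.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit, urlunsplit


def redirect_target(url: str) -> Optional[str]:
    """Return the clean URL to 301-redirect to, or None if the URL is already clean.

    Assumption: a stray segment is one that, once percent-decoded, contains
    nothing but backslashes and double quotes (e.g. %5C%22).
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Keep empty segments (they represent the slashes themselves) and any
    # segment that still has content after removing \ and " characters.
    kept = [s for s in segments if s == "" or unquote(s).strip('\\"') != ""]
    clean_path = "/".join(kept)
    if clean_path == parts.path:
        return None  # nothing to fix, so no redirect
    return urlunsplit(parts._replace(path=clean_path))


if __name__ == "__main__":
    bad = ("http://www.mycarhelpline.com/%5C%22/index.php"
           "?option=com_latestnews&view=list&Itemid=10")
    print(redirect_target(bad))
```

The query string is left untouched, so the target keeps the same `option`, `view` and `Itemid` parameters as the URL that was requested.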
-
Hello Gagan,
I think the best way to handle this would be using the rel canonical tag or rewriting the URLs to get rid of the parameters and replace them with something more user-friendly.
The rel canonical tag would be the easier of those two. I notice the version without the encoding (e.g. http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10 ) has a rel canonical tag that correctly references itself as the canonical version. However, the encoded URLs (e.g. http://www.mycarhelpline.com/%5C%22/index.php?option=com_latestnews&view=list&Itemid=10, which is actually http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10) do NOT have a rel canonical tag.
If the version with the backslash had a rel canonical tag stating that the following URL is canonical it would solve your issue, I think.
Canonical URL:
http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
-
Sure. Here are some of the URLs as they were crawled.
Sample incorrect URLs, crawled and reported as duplicates in Google Webmaster Tools and in Moz too:
http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10
http://www.mycarhelpline.com/\"/index.php?option=com_newcar&view=category&Itemid=2
Correct URLs:
http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
http://www.mycarhelpline.com/index.php?option=com_newcar&view=search&Itemid=2
What we found online
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces.
%22 represents " (double quote) and %5c represents \ (backslash)
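The encoding described above can be checked quickly with Python's standard `urllib.parse` module. This is only an illustration of percent-encoding; the variable names are made up for the example. Note that Python emits uppercase hex digits (`%5C`), which is equivalent to the lowercase `%5c` form used in this thread.

```python
from urllib.parse import quote, unquote

# Percent-encode single characters; safe="" means nothing is exempted.
backslash_encoded = quote("\\", safe="")   # %5C
quote_encoded = quote('"', safe="")        # %22
asterisk_encoded = quote("*", safe="")     # %2A, shown for comparison

# Decoding the stray path segment gives back the raw characters.
decoded_path = unquote("/%5C%22/index.php")  # /\"/index.php

if __name__ == "__main__":
    print(backslash_encoded, quote_encoded, asterisk_encoded)
    print(decoded_path)
```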
We intend to remove these duplicates, which were created with %22 and %5c in them.
Many thanks
-
I am not entirely sure I understood your question as intended, but I will do my best to answer.
I would not put this in my robots.txt file because it could possibly be misunderstood as a forward slash, in which case your entire domain would be blocked:
Disallow: \
We can possibly provide you with some alternative suggestions on how to keep Google from crawling those pages if you could share some real examples.
It may be best to rewrite/redirect those URLs instead, since they don't seem to be the canonical version you intend to present to the user.
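To see why a rule that ends up being read as a forward slash is risky, here is a simplified Python sketch of how Disallow rules are matched as path prefixes. It is an assumption-laden toy: it ignores wildcards, Allow rules and user-agent groups, compares strings literally, and does not model how any particular crawler interprets a bare `\`. The rule `/%5C` is a hypothetical example and does not come from this thread.

```python
from typing import Iterable


def is_disallowed(path: str, disallow_rules: Iterable[str]) -> bool:
    """Simplified check: a path is blocked if it starts with any non-empty rule."""
    return any(rule != "" and path.startswith(rule) for rule in disallow_rules)


if __name__ == "__main__":
    normal = "/index.php"
    stray = "/%5C%22/index.php"

    # A rule of "/" is a prefix of every path, so the whole site is blocked.
    print(is_disallowed(normal, ["/"]))      # True
    print(is_disallowed(stray, ["/"]))       # True

    # A root-anchored, encoded rule only matches the stray URLs.
    print(is_disallowed(normal, ["/%5C"]))   # False
    print(is_disallowed(stray, ["/%5C"]))    # True
```

Even where a narrower rule like this matches only the stray URLs, blocking crawling does not consolidate the duplicates, which is why a redirect or canonical tag is suggested instead.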