Robots.txt Question
-
For our company website, faithology.com, we are trying to block any URLs that contain a question mark (?) to keep Google from treating some pages as duplicates.
Our robots.txt is as follows:
User-Agent: *
Disallow: /*?

User-agent: rogerbot
Disallow: /community/

Is the above correct? We want crawlers to skip any URL with a "?" in it, but we don't want to hurt our SEO. Thanks for your help!
-
You can use wildcards, in theory, but I haven't tested "?" and that could be a little risky. I'd just make sure it doesn't over-match.
Honestly, though, robots.txt isn't as reliable as I'd like. It can be good for preventing content from being indexed, but once that content has been crawled, it's not great for removing it from the index. You might be better off with META NOINDEX or the rel=canonical tag.
It depends a lot on which parameters you're trying to control, what value these pages have, whether they have links, etc. A wholesale block of everything with "?" seems really dangerous to me.
If you want to give a few example URLs, maybe we could give you more specific advice.
-
If I were you, I would want to be 100% sure I got it right. This tool has never let me down, and with the way you have rogerbot set up, it may be blocked.
Why not use a free tool from a very reputable company to make your robots.txt perfect?
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
http://www.searchenginepromotionhelp.com/m/robots-text-tester/
Then, lastly, to make sure everything is perfect, I recommend one of my favorite tools. It is free for up to 500 pages, as many times as you want; beyond that it costs, I believe, $70 a year.
http://www.screamingfrog.co.uk/seo-spider/
It's one of the best tools on the planet.
While you're on the Internet Marketing Ninjas website, look at their other tools; they have loads of excellent ones that are recommended here.
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
Sincerely,
Thomas
-
Yes, you can.
Robots.txt Wildcard Matching
Google and Microsoft's Bing allow the use of wildcards in robots.txt files.
To block access to all URLs that include a question mark (?), you could use the following entry:
User-agent: *
Disallow: /*?
You can use the $ character to specify matching the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$
More background on wildcards is available from Google and Yahoo! Search.
More
http://tools.seobook.com/robots-txt/
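If you want to check for over-matching before you publish the file, here is a rough Python sketch of how the * and $ wildcards described above behave. It is a simplified model, not Google's or Bing's actual matcher, and the function names and sample paths are made up for illustration.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using * and a trailing $ into a regex.

    Simplified model (an assumption, not a crawler's real code):
    * matches any run of characters, a trailing $ anchors the end of the URL,
    and everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for every *.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the path (including any query string) matches the pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


# Hypothetical paths, only to show which ones each rule would catch.
sample_paths = [
    "/articles/",
    "/articles?page=2",
    "/search?q=faith",
    "/old/page.asp",
    "/old/page.asp?id=1",
]

for rule in ("/*?", "/*.asp$"):
    for path in sample_paths:
        status = "blocked" if is_blocked(rule, path) else "allowed"
        print(f"Disallow: {rule:8} {path:22} {status}")
```

Running your own real URLs through something like this (or through one of the tester tools linked in this thread) shows quickly whether a rule such as /*? catches pages you still want crawled.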
Hope I was of help,
Tom