Block subdomain directory in robots.txt
-
Instead of blocking an entire subdomain (fr.sitegeek.com) with robots.txt, we would like to block a single directory (fr.sitegeek.com/blog).
'fr.sitegeek.com/blog' and 'www.sitegeek.com/blog' contain the same articles; only the labels are changed for the 'fr' version, and we assume this duplicate content causes problems for SEO. We want the 'www.sitegeek.com/blog' articles crawled and indexed, but not 'fr.sitegeek.com/blog'. So, how can we block a single subdomain directory (fr.sitegeek.com/blog) with robots.txt?
This is only for the blog directory of the 'fr' version; all other directories and pages of the 'fr' version should still be crawled and indexed.
Thanks,
Rajiv -
Hi Rajiv,
If you post the same content on both the FR & EN versions:
- if both are written in English (or mainly written in English), the best option would be a canonical pointing to the EN version. Example: https://fr.sitegeek.com/category/shared-hosting - most of the content is in English, so in this case I would point a canonical to the EN version
- if the FR version is in French, you can use the hreflang tag - there are online tools to generate these tags, to check for common mistakes, and to double-check the final result
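To make the two options concrete, here is a sketch of what the markup could look like in the head of a blog article page. The article URLs are illustrative, not taken from the actual site:

```html
<!-- Option 1: canonical from the FR copy to the EN original
     (when the FR page is mostly English) -->
<link rel="canonical" href="https://www.sitegeek.com/blog/example-article" />

<!-- Option 2: hreflang annotations (when the FR page is fully translated).
     Each page must list ALL language versions, including itself,
     and every version must link back to the others. -->
<link rel="alternate" hreflang="en" href="https://www.sitegeek.com/blog/example-article" />
<link rel="alternate" hreflang="fr" href="https://fr.sitegeek.com/blog/example-article" />
```

Use one option or the other per page, not both at once: a canonical to the EN version tells Google to ignore the FR copy, which defeats the hreflang annotations.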
Just some remarks:
- partially translated pages offer little value to users, so it's best to fully translate them or only link to the EN version
- I have a strong impression that the EN version was machine-translated into the FR version (e.g. French sites never use 'Maison' to link to the homepage - they use 'Accueil'). Be aware that Google is perfectly capable of detecting auto-translated pages and considers them bad practice (see Matt Cutts' video on this - starts at 1:50). So you might want to invest in a proper translation, or in proofreading by a native French speaker.
rgds
Dirk
-
-
Thanks Dirk,
We will fix the issue as you suggested.
Could you explain more on duplicate content if we post articles on both 'FR' and 'EN' versions?
Thanks,
Rajiv
-
Just to add to this: if your subdomain has more than /blog on it and you only want to block /blog, change Dirk's robots.txt to:

User-agent: Googlebot
Disallow: /blog

or, to block more than just Google:

User-agent: *
Disallow: /blog -
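One caveat worth noting: Disallow values are path prefixes, so `Disallow: /blog` also blocks URLs such as /blog-archive or /blogroll if they exist. To restrict the rule to the directory itself, add a trailing slash (a sketch, assuming the fr. subdomain serves its own robots.txt):

```
# robots.txt served at fr.sitegeek.com/robots.txt
User-agent: *
Disallow: /blog/
```

Note that with the trailing slash, the bare URL /blog (without the slash) would still be allowed, so pick the form that matches your URL structure.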
The easiest way would be to put a robots.txt in the root of your subdomain and block access for search engines:

User-agent: Googlebot
Disallow: /

If your subdomain and the main domain share the same root, this option is not possible. In that case, rather than working with robots.txt, I would add a canonical on each page pointing to the main domain, or block all pages via a noindex header (if this is technically possible).
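The header-based blocking mentioned above usually means the X-Robots-Tag HTTP response header. A minimal sketch, assuming an Apache vhost for the fr. subdomain with mod_headers enabled (the path pattern is illustrative):

```apacheconf
# In the fr.sitegeek.com vhost config:
# send a noindex header on every response under /blog/
<LocationMatch "^/blog/">
    Header set X-Robots-Tag "noindex"
</LocationMatch>
```

Unlike robots.txt, this approach lets Googlebot crawl the pages but tells it not to index them. For that reason, don't combine it with a Disallow rule for the same URLs: if crawling is blocked, Google never sees the noindex header.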
You could also check these similar questions: http://moz.com/community/q/block-an-entire-subdomain-with-robots-txt and http://moz.com/community/q/blocking-subdomain-from-google-crawl-and-index - but the answers given are the same as the options above.
Apart from the technical question: given that only the labels are translated, these pages make little sense for human users. It would probably make more sense to link to the normal (English) version of the blog and put "(en anglais)" next to the link.
rgds,
Dirk