Duplicate site (disaster recovery) being crawled and creating two indexed search results
-
I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally the .gtm disaster recover domian is an exact match to the toptable.co.uk domain.
Unfortunately, Google has crawled the uk-www.gtm.opentable site, and it's showing up in search results. In most cases the gtm urls don't get redirected to toptable they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as .gtm.ot is part of the .opentable.com domain which has significant authority. So we need a way of stopping Google from crawling gtm.
There seem to be two potential fixes. Which is best for this case?
- use the robots.txt to block Google from crawling the .gtm site
2) canonicalize the the gtm urls to toptable.co.uk
In general Google seems to recommend a canonical change but in this special case it seems robot.txt change could be best.
Thanks in advance to the SEOmoz community!
-
It's a little tricky. While Andrea is right about Robots.txt - it's not great for removal once pages/domains are indexed, you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution - it's a delicate procedure.
Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to do that, so you'll need to keep the paths open.
-
First, if the pages are already indexed then a robots.txt won't make them go away. A meta tag no index on the pages is the better solution. This allows search engines to "read" you page, see the no index tag and then work to remove the pages from index. A robots.txt doesn't necessarily accomplish the same result.
-
If you can do a 1-to-1 page canonicalization (each page on .co.uk is canonicaled to the equivalent page on the .com) then I would do that.
Otherwise, I would noindex the backup site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unknown index.html links coming to my site.
I'm getting a lot of domain/index.html urls on my site which I didn't create initially. We recently transfered to a new site so those links could come from the old site. Does any know how to get a comprehensive list of all the urls that lead to 404?
Intermediate & Advanced SEO | | greenshinenewenergy0 -
Duplicate content created by website Calendar - A Penalty?
A colleague of mine asked me a question about duplicate content coming from their event calendar. I don't think this will affect them negatively, but I would love some feedback and thoughts. ThanksOne of my clients, LifeTech Academy, is using my RavenTools software. Raventools has reported a HUGE amount of duplicate content (4.4K instances).The duplicate content all revolves around their calendar and repeating events (http://lifetechacademy.org/events/)The question is this - will this impact their SEO efforts in a negative way?
Intermediate & Advanced SEO | | Bill_K0 -
Since two years i lost place in google search
Hi, First excuse my but english. I am from France. Since two years i have lost place for the most important of my keyword in google search. The loss is gradual. It's been more than two years as I do not do link building. I do not know what to do. I just read your article on link building. (http://moz.com/blog/beginners-guide-to-link-building)
Intermediate & Advanced SEO | | Pascaltall
I would like if possible for someone to take a look at my website to see what is wrong and what i have to do do. For reasons of discretion I do not want to link the site name here, but you can see the domain name by following this link. http://riador.com/lien.html
Thank you to all.0 -
Why does the site I am working on have so few visits from organic search results?
Hello! I am not very experienced with SEO, but I am trying to help out on a site that has been around since 2010 and has well over a thousand pages of high-quality, original content, with more being added all the time. Only around 65 of the site's daily visits come from organic search results; this seems very low. There has already been significant SEO work done on the site. Is there something about the site that strikes anyone as obviously getting in the way of organic traffic? The URL is ellenjovin.com. I would appreciate any thoughts you may have. Thank you very much!
Intermediate & Advanced SEO | | nyc-seo0 -
List of Search Engines subscribing to the ajax crawling scheme?
Hi, Does anyone have a list of (major) Search Engines that subscribe to the Ajax Crawling Scheme? (https://developers.google.com/webmasters/ajax-crawling/) Specifically interested in major international Search Engines such as Bing/Yahoo, Baidu & Yandex - if anyone knows, please let me know! Thanks in advance
Intermediate & Advanced SEO | | FashionLux0 -
Recovering from index problem (Take two)
Hi all. This is my second pass at the problem. Thank you for your responses before, I think I'm narrowing it down! Below is my original message. Afterwards, I've added some update info. For a while, we've been working on http://thewilddeckcompany.co.uk/. Everything was going swimmingly, and we had a top 5 ranking for the term 'bird hides' for this page - http://thewilddeckcompany.co.uk/products/bird-hides. Then disaster struck! The client added a link with a faulty parameter in the Joomla back end that caused a bunch of duplicate content issues. Before this happened, all the site's 19 pages were indexed. Now it's just a handful, including the faulty URL (thewilddeckcompany.co.uk/index.php?id=13) This shows the issue pretty clearly. https://www.google.co.uk/search?q=site%3Athewilddeckcompany.co.uk&oq=site%3Athewilddeckcompany.co.uk&aqs=chrome..69i57j69i58.2178j0&sourceid=chrome&ie=UTF-8 I've removed the link, redirected the bad URL, updated the site map and got some new links pointing at the site to resolve the problem. Yet almost two month later, the bad URL is still showing in the SERPs and the indexing problem is still there. UPDATE OK, since then I've blocked the faulty parameter in the robots.txt file. Now that page has disappeared, but the right one - http://thewilddeckcompany.co.uk/products/bird-hides - has not been indexed. It's been like this for several week. Any ideas would be much appreciated!
Intermediate & Advanced SEO | | Blink-SEO0 -
Webmaster Tools: Total Indexed VS Ever Crawled
Ok, In WMT's under health > index status I have both total indexed and ever crawled ticked - It also looks like the data is broken up weekly. As an example say you have the following: Total Indexed: 1000 Ever Crawled: 5000 What is this say? It found 5000 pages but only indexed 1000 (20%). Thanks
Intermediate & Advanced SEO | | Bondara0 -
Why does my home page show up in search results instead of my target page for a specific keyword?
I am using Wordpress and am targeting a specific keyword..and am using Yoast SEO if that question comes up.. and I am at 100% as far as what they recommend for on page optimization. The target html page is a "POST" and not a "Page" using Wordpress definitions. Also, I am using this Pinterest style theme here http://pinclone.net/demo/ - which makes the post a sort of "pop-up" - but I started with a different theme and the results below were always the case..so I don't know if that is a factor or not. (I promise .. this is not a clever spammy attempt to promote their theme - in fact parts of it don't even work for me yet so I would not recommend it just yet...) I DO show up on the first page for my keyword.. however.. instead of Google showing the page www.mywebsite.com/this-is-my-targeted-keyword-page.htm Google shows www.mywebsite.com in the results instead. The problem being - if the traffic goes only to my home page.. they will be less likely to stay if they dont find what they want immediately and have to search for it.. Any suggestions would be appreciated!
Intermediate & Advanced SEO | | chunkyvittles0