Duplicate site (disaster recovery) being crawled and creating two indexed search results
-
I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally the .gtm disaster recover domian is an exact match to the toptable.co.uk domain.
Unfortunately, Google has crawled the uk-www.gtm.opentable site, and it's showing up in search results. In most cases the gtm urls don't get redirected to toptable they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as .gtm.ot is part of the .opentable.com domain which has significant authority. So we need a way of stopping Google from crawling gtm.
There seem to be two potential fixes. Which is best for this case?
- use the robots.txt to block Google from crawling the .gtm site
2) canonicalize the the gtm urls to toptable.co.uk
In general Google seems to recommend a canonical change but in this special case it seems robot.txt change could be best.
Thanks in advance to the SEOmoz community!
-
It's a little tricky. While Andrea is right about Robots.txt - it's not great for removal once pages/domains are indexed, you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution - it's a delicate procedure.
Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to do that, so you'll need to keep the paths open.
-
First, if the pages are already indexed then a robots.txt won't make them go away. A meta tag no index on the pages is the better solution. This allows search engines to "read" you page, see the no index tag and then work to remove the pages from index. A robots.txt doesn't necessarily accomplish the same result.
-
If you can do a 1-to-1 page canonicalization (each page on .co.uk is canonicaled to the equivalent page on the .com) then I would do that.
Otherwise, I would noindex the backup site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google crawling 200 page site thousands of times/day. Why?
Hello all, I'm looking at something a bit wonky for one of the websites I manage. It's similar enough to other websites I manage (built on a template) that I'm surprised to see this issue occurring. The xml sitemap submitted shows Google there are 229 pages on the site. Starting in the beginning of December Google really ramped up their intensity in crawling the site. At its high point Google crawled 13,359 pages in a single day. I mentioned I manage other similar sites - this is a very unusual spike. There are no resources like infinite scroll that auto generates content and would cause Google some grief. So follow up questions to my "why?" is "how is this affecting my SEO efforts?" and "what do I do about it?". I've never encountered this before, but I think limiting my crawl budget would be treating the symptom instead of finding the cure. Any advice is appreciated. Thanks! *edited for grammar.
Intermediate & Advanced SEO | | brettmandoes0 -
Recovering from index problem (Take two)
Hi all. This is my second pass at the problem. Thank you for your responses before, I think I'm narrowing it down! Below is my original message. Afterwards, I've added some update info. For a while, we've been working on http://thewilddeckcompany.co.uk/. Everything was going swimmingly, and we had a top 5 ranking for the term 'bird hides' for this page - http://thewilddeckcompany.co.uk/products/bird-hides. Then disaster struck! The client added a link with a faulty parameter in the Joomla back end that caused a bunch of duplicate content issues. Before this happened, all the site's 19 pages were indexed. Now it's just a handful, including the faulty URL (thewilddeckcompany.co.uk/index.php?id=13) This shows the issue pretty clearly. https://www.google.co.uk/search?q=site%3Athewilddeckcompany.co.uk&oq=site%3Athewilddeckcompany.co.uk&aqs=chrome..69i57j69i58.2178j0&sourceid=chrome&ie=UTF-8 I've removed the link, redirected the bad URL, updated the site map and got some new links pointing at the site to resolve the problem. Yet almost two month later, the bad URL is still showing in the SERPs and the indexing problem is still there. UPDATE OK, since then I've blocked the faulty parameter in the robots.txt file. Now that page has disappeared, but the right one - http://thewilddeckcompany.co.uk/products/bird-hides - has not been indexed. It's been like this for several week. Any ideas would be much appreciated!
Intermediate & Advanced SEO | | Blink-SEO0 -
Indexing falling/search queries the same - concerned
Hello, I posted abou this a few days ago but didn't really get anywhere and now have new information after looking into it more. This is my site - http://www.whosjack.org My page indexing has been falling steadily daily currently from thousands of pages indexed to just a couple of hundred. My search queries don't seem to be currently affected, I have done crawl tests to see if the site can be crawled and put the site:whosjack.org into Google and had 12,000 results come back when goole has said it has indexed 133 and falling. However all pages indexed on the site:whosjack.org search seem to be stories with just two words in the title? I am sure I am missing out on traffic here but can't work out what the issue is and how to fix it. I have no alerts on my dashboard and when I submit sitemaps to webmaster tools I get 15,115 URLs submitted 12,088 URLs indexedwhich cant be bad?Any help/suggestions really appreciated.
Intermediate & Advanced SEO | | luwhosjack0 -
Temporary Duplicate Sites - Do anything?
Hi Mozzers - We are about to move one of our sites to Joomla. This is one of our main sites and it receives about 40 million visits a month, so the dev team is a little concerned about how the new site will handle the load. Dev's solution, since we control about 2/3 of that traffic through our own internal email and cross promotions, is to launch the new site and not take down the old site. They would leave the old site on its current URL and make the new site something like new.sub.site.com. Traffic we control would continue to the old site, traffic that we detect as new would be re-directed to the new site. Over time (the think about 3-4 months) they would shift the traffic all to the new site, then eventually change the URL of the new site to be the URL of the old site and be done. So this seems to be at the outset a duplicate content (whole site) issue to start with. I think the best course of action is try to preserve all SEO value on the old URL since the new URL will eventually go away and become the old URL. I could consider on the new site no-crawl/no-index tags temporarily while both sites exist, but would that be risky since that site will eventually need to take those tags off and become the only site? Rel=canonical temporarily from the new site to the old site also seems like it might not be the best answer. Any thoughts?
Intermediate & Advanced SEO | | Kenn_Gold0 -
Does Mobile optimised site improve ranking and how to index it faster?
Hi i have several question with regards to mobile optimised site: Does having a mobile optimised site improve ranking in SERP? How can we push/index mobile optimised sites to users searching on mobile sites faster? e.g. returning m.abc.com or abc.com/m to users seraching on mobile earlier.
Intermediate & Advanced SEO | | FWSBIO0 -
Same day indexing... some tips for a non blog site
Hi There's a question today about paying for Yahoo directory - I wondered if these kinds of links would help with getting indexed. We're adding about 10-20 pages of new unique content each day, but it takes google about 4 days to list it and then it only lists a few pages of that 10-20 pages, we have other older sites and blogs that are within the hour. What tips can you give me for getting at least a same day index, and a thorough one at that, so if we publish 20 pages on Thursday by the end of play Friday they're indexed? Would the Yahoo Directory help?
Intermediate & Advanced SEO | | xoffie0 -
How to handle a server outage if I have two sites
I operate a web application. It consists of two sites, www.mysite.com and app.mysite.com. As you might imagine, www is used for marketing purposes, and it's our main organic search entry point. The app.mysite.com domain is where our application portal is for customers, and it is also where our login and registration pages are located. Currently, www.mysite.com is experiencing a catastrophic outage and is returning 504 errors, but app.mysite.com is on a totally separate system with a lot redundancy, and is doing just fine. If we get traffic from referrals or search, we want that traffic to be able to login and register, so we've replaced the 504 error with a 302 redirect to app.mysite.com until the situation is resolved. This provides the best possible experience for users (nothing's worse than a 504). How will this affect SEO? Is there something other than a 302 that I should be doing with the broken www.mysite.com domain?
Intermediate & Advanced SEO | | Ehren0 -
Is linking to search results bad for SEO?
If we have pages on our site that link to search results is that a bad thing? Should we set the links to "nofollow"?
Intermediate & Advanced SEO | | nicole.healthline0