Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best way to remove full demo (staging server) website from Google index
-
I've recently taken over an in-house role at a property auction company, they have a main site on the top-level domain (TLD) and 400+ agency sub domains!
I recently found that the web development team have a demo domain per site, which is found on a subdomain of the original domain - mirroring the site. The problem is that they have all been found and indexed by Google:
Obviously this is a problem as it is duplicate content and so on, so my question is... what is the best way to remove the demo domain / sub domains from Google's index?
We are taking action to add a noindex tag into the header (of all pages) on the individual domains but this isn't going to get it removed any time soon! Or is it?
I was also going to add a robots.txt file into the root of each domain, just as a precaution! Within this file I had intended to disallow all.
The final course of action (which I'm holding off in the hope someone comes up with a better solution) is to add each demo domain / sub domain into Google Webmaster and remove the URLs individually.
Or would it be better to go down the canonical route?
-
Why couldn't I just put a password on the staging site, and let Google sort out the rest? Just playing devil's advocate.
-
If you've enough time to verify each subdomain in WMT and also removing 400+ domains one by one, then you can go for solution 2. You can't remove subdomain from verified WMT account of main domain, that's why you need to verify each domain.
Adding canonical is a better option, it wouldn't remove all of the demo domains from Google's index rapidly, you have to wait for few months, but you'll be on the safe side.
-
Out of curiosity, why wouldn't you recommend solution 2?
You mentioned that you faced a similar kind of situation in the past, how did that work out? Which of the 3 solutions (or all) did you opt for?
-
Good advice but an IP restriction for the demo sites won't be possible on this occasion as our router throws out a range of different IP addresses and we occasionally need the sites to be viewed externally! Any other suggestions to help?
-
I'd also recommend putting in an IP restriction for any of the demo sites.
So that if anyone visits the demo sites from a non-whitelisted IP address, then you can display an error message, or simply redirect them over to the live site.
That will likely have the search results quickly removed from the search engine.
Hope this helps!
-- Jeff
-
Solution 1:
Add robots.txt on all demo domains and block them, or add noindex in their header.
Solution 2: Verify each domain in webmaster tools and remove it entirely from the link removal section ( I wouldn't recommend this).
Solution 3:
If your both domains like agency1.domain.com and demo.agency1.domain.com have same coding and are clone then you should just add canonical url to the agency1.domain.com and canonical will be http://agency.domain.com/ it will work if it will be automatically shown in the demo domain. if it doesn't show up in the demo domain automatically then add the same canonical to the demo domain.
It will take some time to deindexed from serps, but it will surely work. I've faced the same kind of situation in past.
-
Noindex is your best option, really. It might take weeks, but I don't think any other method is going to be faster. Plus, technically speaking, "noindex" is the proper method for what you want to do - canonical tags or a robots.txt may do the job, but they aren't exactly the right way.
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Trying to get Google to stop indexing an old site!
Howdy, I have a small dilemma. We built a new site for a client, but the old site is still ranking/indexed and we can't seem to get rid of it. We setup a 301 from the old site to the new one, as we have done many times before, but even though the old site is no longer live and the hosting package has been cancelled, the old site is still indexed. (The new site is at a completely different host.) We never had access to the old site, so we weren't able to request URL removal through GSC. Any guidance on how to get rid of the old site would be very appreciated. BTW, it's been about 60 days since we took these steps. Thanks, Kirk
Intermediate & Advanced SEO | | kbates0 -
Google Is Indexing my 301 Redirects to Other sites
Long story but now i have a few links from my site 301 redirecting to youtube videos or eCommerce stores. They carry a considerable amount of traffic that i benefit from so i can't take them down, and that traffic is people from other websites, so basically i have backlinks from places that i don't own, to my redirect urls (Ex. http://example.com/redirect) My problem is that google is indexing them and doesn't let them go, i have tried blocking that url from robots.txt but google is still indexing it uncrawled, i have also tried allowing google to crawl it and adding noindex from robots.txt, i have tried removing it from GWT but it pops back again after a few days. Any ideas? Thanks!
Intermediate & Advanced SEO | | cuarto7150 -
Removing UpperCase URLs from Indexing
This search - site:www.qjamba.com/online-savings/automotix gives me this result from Google: Automotix online coupons and shopping - Qjamba
Intermediate & Advanced SEO | | friendoffood
https://www.qjamba.com/online-savings/automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products. and Google tells me there is another one, which is 'very simliar'. When I click to see it I get: Automotix online coupons and shopping - Qjamba
https://www.qjamba.com/online-savings/Automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products. This is because I recently changed my program to redirect all urls with uppercase in them to lower case, as it appears that all lowercase is strongly recommended. I assume that having 2 indexed urls for the same content dilutes link juice. Can I safely remove all of my UpperCase indexed pages from Google without it affecting the indexing of the lower case urls? And if, so what is the best way -- there are thousands.0 -
Google Not Indexing XML Sitemap Images
Hi Mozzers, We are having an issue with our XML sitemap images not being indexed. The site has over 39,000 pages and 17,500 images submitted in GWT. If you take a look at the attached screenshot, 'GWT Images - Not Indexed', you can see that the majority of the pages are being indexed - but none of the images are. The first thing you should know about the images is that they are hosted on a content delivery network (CDN), rather than on the site itself. However, Google advice suggests hosting on a CDN is fine - see second screenshot, 'Google CDN Advice'. That advice says to either (i) ensure the hosting site is verified in GWT or (ii) submit in robots.txt. As we can't verify the hosting site in GWT, we had opted to submit via robots.txt. There are 3 sitemap indexes: 1) http://www.greenplantswap.co.uk/sitemap_index.xml, 2) http://www.greenplantswap.co.uk/sitemap/plant_genera/listings.xml and 3) http://www.greenplantswap.co.uk/sitemap/plant_genera/plants.xml. Each sitemap index is split up into often hundreds or thousands of smaller XML sitemaps. This is necessary due to the size of the site and how we have decided to pull URLs in. Essentially, if we did it another way, it may have involved some of the sitemaps being massive and thus taking upwards of a minute to load. To give you an idea of what is being submitted to Google in one of the sitemaps, please see view-source:http://www.greenplantswap.co.uk/sitemap/plant_genera/4/listings.xml?page=1. Originally, the images were SSL, so we decided to reverted to non-SSL URLs as that was an easy change. But over a week later, that seems to have had no impact. The image URLs are ugly... but should this prevent them from being indexed? The strange thing is that a very small number of images have been indexed - see http://goo.gl/P8GMn. I don't know if this is an anomaly or whether it suggests no issue with how the images have been set up - thus, there may be another issue. Sorry for the long message but I would be extremely grateful for any insight into this. I have tried to offer as much information as I can, however please do let me know if this is not enough. Thank you for taking the time to read and help. Regards, Mark Oz6HzKO rYD3ICZ
Intermediate & Advanced SEO | | edlondon0 -
Malicious site pointed A-Record to my IP, Google Indexed
Hello All, I launched my site on May 1 and as it turns out, another domain was pointing it's A-Record to my IP. This site is coming up as malicious, but worst of all, it's ranking on keywords for my business objectives with my content and metadata, therefore I'm losing traffic. I've had the domain host remove the incorrect A-Record and I've submitted numerous malware reports to Google, and attempted to request removal of this site from the index. I've resubmitted my sitemap, but it seems as though this offending domain is still being indexed more thoroughly than my legitimate domain. Can anyone offer any advice? Anything would be greatly appreciated! Best regards, Doug
Intermediate & Advanced SEO | | FranGen0 -
Google Indexing Feedburner Links???
I just noticed that for lots of the articles on my website, there are two results in Google's index. For instance: http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html and http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+thewebhostinghero+(TheWebHostingHero.com) Now my Feedburner feed is set to "noindex" and it's always been that way. The canonical tag on the webpage is set to: rel='canonical' href='http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html' /> The robots tag is set to: name="robots" content="index,follow,noodp" /> I found out that there are scrapper sites that are linking to my content using the Feedburner link. So should the robots tag be set to "noindex" when the requested URL is different from the canonical URL? If so, is there an easy way to do this in Wordpress?
Intermediate & Advanced SEO | | sbrault740 -
Site Indexed by Google but not Bing or Yahoo
Hi, I have a site that is indexed (and ranking very well) in Google, but when I do a "site:www.domain.com" search in Bing and Yahoo it is not showing up. The team that purchased the domain a while back has no idea if it was indexed by Bing or Yahoo at the time of purchase. Just wondering if there is anything that might be preventing it from being indexed? Also, Im going to submit an index request, are there any other things I can do to get it picked up?
Intermediate & Advanced SEO | | dbfrench0 -
Best practice for removing indexed internal search pages from Google?
Hi Mozzers I know that it’s best practice to block Google from indexing internal search pages, but what’s best practice when “the damage is done”? I have a project where a substantial part of our visitors and income lands on an internal search page, because Google has indexed them (about 3 %). I would like to block Google from indexing the search pages via the meta noindex,follow tag because: Google Guidelines: “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.” http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35769 Bad user experience The search pages are (probably) stealing rankings from our real landing pages Webmaster Notification: “Googlebot found an extremely high number of URLs on your site” with links to our internal search results I want to use the meta tag to keep the link juice flowing. Do you recommend using the robots.txt instead? If yes, why? Should we just go dark on the internal search pages, or how shall we proceed with blocking them? I’m looking forward to your answer! Edit: Google have currently indexed several million of our internal search pages.
Intermediate & Advanced SEO | | HrThomsen0