How is Google finding our preview subdomains?
-
I've noticed that Google is able to find, crawl and index preview subdomains we set up for new client sites (e.g. clientpreview.example.com). I now know to use the robots meta tag (<meta name="robots">) and robots.txt to block search engines from crawling these subdomains. My question, though, is how is Google finding these subdomains? We don't link to these preview domains from anywhere else, so I can't figure out how Google is even getting there.
Does anybody have any insight on this?
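Since robots.txt came up, here is a minimal sketch, assuming the preview host serves its own file at clientpreview.example.com/robots.txt, of a blanket rule for a preview subdomain and a quick check with Python's standard-library `urllib.robotparser`. Keep in mind that robots.txt only asks well-behaved crawlers not to fetch pages. It does not hide the hostname, and a URL blocked this way can still appear in results if Google discovers it elsewhere.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served only on the preview host, e.g.
# https://clientpreview.example.com/robots.txt. It asks every compliant
# crawler to stay out of the whole host.
PREVIEW_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


for path in ("/", "/about/", "/images/logo.png"):
    url = "https://clientpreview.example.com" + path
    print(url, "blocked:", is_blocked(PREVIEW_ROBOTS_TXT, "Googlebot", url))
```

The `is_blocked` helper name is hypothetical; a crawler that ignores robots.txt is not affected by this file at all.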
-
Thanks for your response, Irving. We put some of our preview sites on subdomains of our main domain, but then remove them after the site goes live, so there shouldn't be any duplicate content issues. The main question is just how Google is finding these subdomains.
-
Thanks for the insight guys.
-
I don't specifically use the Google Toolbar, but others in the office may (although I don't think so). It sounds like Chrome could be a potential source as well?
-
I think that this is a good idea. But you gotta be careful.
Our competitor (who ranked #1 while we ranked #2) had their site redesigned, and the design company put the noindex tag on every page. They forgot to take it off when the new design went live. It took them quite a while to figure it out, and we enjoyed all of their sales for about a month.
We are #1 now and they are #2. Must have been a bad design job.
-
If the subdomains are added to WMT, Google will know about them. If you are designing sites for clients and putting them on your site as subdomains, it behooves you to make 100% sure that their dev sites are not being seen by Google. It's duplicate content, and your subdomain is the original source of this content. It looks unprofessional, too.
a) verify any subdomain you are creating for a client in WMT
b) block it in robots.txt and noindex nofollow all pages globally
c) for the ones that are already indexed, go into Google WMT, open that subdomain's account and request removal of the site from Google's index. This removes only that subdomain from the index; don't worry, it won't remove your main site.
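Point b) relies on every page actually carrying the noindex directive, either as a `<meta name="robots">` tag or as an `X-Robots-Tag` HTTP header. The sketch below, using only Python's `html.parser`, pulls those directives out of a page so a preview build can be spot-checked. The class and function names are hypothetical, and it deliberately ignores crawler-prefixed header forms such as `X-Robots-Tag: googlebot: noindex`. One caveat worth knowing: Google can only read a noindex directive on pages it is allowed to crawl, so a page disallowed in robots.txt never has its noindex seen.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str, x_robots_tag: str = "") -> set:
    """Merge meta-tag directives with a plain X-Robots-Tag header value.

    Simplification: header values prefixed with a crawler name
    (e.g. "googlebot: noindex") are not handled here.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    header = {t.strip().lower() for t in x_robots_tag.split(",") if t.strip()}
    return parser.directives | header


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```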
-
I would also consider adding a noindex tag if you want the urls removed.
-
I agree with Mat. You never know, but yes, Chrome could be another major source. It also depends on what you chose for your privacy settings when you set up Chrome (send anonymous usage data to Google: yes/no?) and so on.
-
We usually put them behind an .htaccess login now. We've had situations where the development site has been outranking the live site. A great demo of the power of on-site optimisation, but still a bit annoying for the client.
People always used to blame the Google Toolbar for this. Likewise, using Chrome could potentially add something to the "to crawl" list; I wonder what the respective privacy policies say about that. I've also seen staging sites pick up links: when an external link on the staging site has been clicked, it has alerted someone else, appeared as a linkback/trackback, etc.
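An ".htaccess login" usually means HTTP Basic authentication (Apache's `AuthType Basic`, `AuthUserFile` and `Require valid-user` directives). A crawler that doesn't have the password only ever receives a 401 response, so there is no content to index. As a rough illustration of the check the server performs on each request, here is a hypothetical Python helper for a single user/password pair; real servers compare against hashed entries in an htpasswd file rather than a plaintext password.

```python
import base64
import hmac
from typing import Optional


def basic_auth_ok(authorization: Optional[str], user: str, password: str) -> bool:
    """Check an 'Authorization: Basic <base64(user:password)>' header value."""
    if not authorization or not authorization.startswith("Basic "):
        return False  # the server would answer 401 with a WWW-Authenticate challenge
    try:
        decoded = base64.b64decode(authorization[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    supplied_user, sep, supplied_password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons of both fields, evaluated before combining.
    user_ok = hmac.compare_digest(supplied_user.encode(), user.encode())
    password_ok = hmac.compare_digest(supplied_password.encode(), password.encode())
    return user_ok and password_ok


# Hypothetical credentials for a client preview.
token = base64.b64encode(b"client:s3cret").decode("ascii")
print(basic_auth_ok("Basic " + token, "client", "s3cret"))  # True
print(basic_auth_ok("Basic " + token, "client", "guess"))   # False
```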
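The staging-link leak works through the `Referer` header: when a visitor clicks an external link on a staging page, the browser normally tells the destination site where the click came from (modern browsers send at least the origin, e.g. https://clientpreview.example.com/, by default), so the staging hostname lands in that site's logs, analytics or trackback handling. The sketch below, with a hypothetical function name and marker words, shows how trivially a third party could pull preview hostnames out of such referrers. Sending a `Referrer-Policy: no-referrer` response header from the staging site is one way to suppress this.

```python
from urllib.parse import urlsplit

# Hypothetical words that mark a hostname as a preview/staging host.
PREVIEW_MARKERS = ("preview", "staging", "dev")


def preview_hosts_in_referrers(referrers, markers=PREVIEW_MARKERS):
    """Return hostnames from Referer values whose first label contains a marker."""
    found = set()
    for ref in referrers:
        host = urlsplit(ref).hostname or ""  # "-" (no referrer) gives no hostname
        first_label = host.split(".", 1)[0]
        if any(marker in first_label for marker in markers):
            found.add(host)
    return found


# Referer values as they might appear in another site's access log.
log_referrers = [
    "https://clientpreview.example.com/about/",  # full URL
    "https://clientpreview.example.com/",        # origin only
    "https://www.google.com/",
    "-",
]
print(preview_hosts_in_referrers(log_referrers))  # {'clientpreview.example.com'}
```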
-
The discovery can come from multiple sources. Do you or the client have the Google Toolbar installed?
Related Questions
-
Can Google Crawl This Page?
I'm going to have to post the page in question, which I'd rather not do, but I have permission from the client to do so. Question: a recruitment client of mine had their website built on a proprietary platform by a so-called recruitment specialist agency. Unfortunately, the site is not performing well in the organic listings. I believe the culprit is this page and others like it: http://www.prospect-health.com/Jobs/?st=0&o3=973&s=1&o4=1215&sortdir=desc&displayinstance=Advanced Search_Site1&pagesize=50000&page=1&o1=255&sortby=CreationDate&o2=260&ij=0 Basically, as soon as you deviate from the top-level pages, you land on pages that have database-query URLs like this one. My take is that Google cannot crawl these pages and is therefore having trouble picking up all of the job listings. I have taken some measures to combat this, and we do have an XML sitemap in place, but the pages that Google finds via the XML feed seem not to perform because there is no obvious flow of 'link juice' to them. A number of the latest jobs are listed on top-level pages like this one: http://www.prospect-health.com/optometry-jobs and when they are picked up they perform OK in the SERPs, which is the biggest clue to the problem outlined above. The agency in question has an SEO department that disputes the problem, and their proposed solution is to create more content and build more links (genius!). Just looking for some clarification from you guys, if you don't mind?
Technical SEO | | shr1090 -
Homepage disappeared from Google Serp
I redirected my domain using this code in .htaccess:
RewriteCond %{HTTP_HOST} ^xxxx\.com
RewriteRule (.*) http://www.xxxx.com/$1 [R=301,L]
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.xxxx.com/$1 [R=301,L]
</IfModule>
A day after I did it, I got an error in GWMT ("Google can't find your site's robots.txt") and my homepage disappeared from the results pages. When I try to open Google's cache of the homepage, I get a 404 error. I generated a new robots.txt and uploaded it; now the error doesn't show, but my homepage is still not in the SERPs. It's been 3 days. What should I do? Thanks in advance.
Technical SEO | | digitalkiddie
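Rules like these are easier to sanity-check if you model them before deploying. Below is a minimal Python sketch of what the two redirects appear to be meant to do, assuming the intended patterns are the common ones: a bare-domain-to-www 301, then a 301 from any `.../index.html`, `.../index.htm` or `.../index.php` request to its directory. It keeps the same order as the .htaccess, so a bare-domain index request takes two hops. `redirect_target` is a hypothetical name and `xxxx.com` is the poster's placeholder. A check worth running is that `/robots.txt` on the canonical host is served directly rather than redirected.

```python
import re
from typing import Optional

CANONICAL_HOST = "www.xxxx.com"  # the poster's placeholder domain
BARE_HOST = "xxxx.com"
# Any /dir/.../index.htm(l) or index.php request, capturing the directory part.
INDEX_PATH = re.compile(r"^/((?:[^/]+/)*)index\.(?:html?|php)$")


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the 301 Location for a request, or None if no rule fires.

    `path` excludes the query string; query handling is not modelled here.
    """
    # Rule 1: bare domain -> www, keeping the requested path.
    if host.lower() == BARE_HOST:
        return f"http://{CANONICAL_HOST}{path}"
    # Rule 2: explicit index file -> its directory on the www host.
    match = INDEX_PATH.match(path)
    if match:
        return f"http://{CANONICAL_HOST}/{match.group(1)}"
    return None


for host, path in [("xxxx.com", "/robots.txt"), ("www.xxxx.com", "/robots.txt"),
                   ("www.xxxx.com", "/blog/index.php")]:
    print(host + path, "->", redirect_target(host, path))
```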
Subdomains & CDNs
I've set up a CDN to speed up my domain. I've set up a CNAME to map the subdomain cdn.example.com to the URL where the CDN hosts my static content (images, CSS and JS files, and PDFs). www.example.com and cdn.example.com now resolve to two different IP addresses. Internal links to my PDF files (white papers and articles) used to point to www.example.com/downloads, but now they point to cdn.example.com/downloads. The same PDF files can be accessed on both the www and the cdn subdomains, so external links to the www version will continue to work.
Question 1: Should I set up 301 redirects in .htaccess such as:
Redirect permanent /downloads/filename.pdf http://cdn.example.com/downloads/filename.pdf
Question 2: Do I need to do anything else in my .htaccess file (or anywhere else) to ensure that any SEO benefit provided by the PDF files remains associated with my domain?
Question 3: Am I better off keeping my PDF files on the www side and off the CDN?
Thanks, Akira
Technical SEO | | ahirai0 -
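For Question 1, one redirect per file works, but the rule can also be expressed once for the whole folder (Apache's `RedirectMatch 301 ^/downloads/(.+\.pdf)$ http://cdn.example.com/downloads/$1` is one way). For Question 2, Google documents a `rel="canonical"` HTTP `Link` header for non-HTML files such as PDFs, which is an alternative if both copies stay reachable. The Python sketch below only models the two ideas; the function names are hypothetical, and which host should be canonical is the site owner's decision.

```python
from typing import Optional

CDN_BASE = "http://cdn.example.com"  # host from the question


def cdn_redirect(path: str) -> Optional[str]:
    """301 target for a www request, or None to keep serving it from www."""
    if path.startswith("/downloads/") and path.lower().endswith(".pdf"):
        return CDN_BASE + path
    return None


def canonical_link_header(canonical_url: str) -> str:
    """Value for an HTTP 'Link' header naming the preferred copy of a file."""
    return f'<{canonical_url}>; rel="canonical"'


print(cdn_redirect("/downloads/filename.pdf"))
# http://cdn.example.com/downloads/filename.pdf
print(canonical_link_header("http://www.example.com/downloads/filename.pdf"))
# <http://www.example.com/downloads/filename.pdf>; rel="canonical"
```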
Tags showing up in Google
Yesterday a user pointed out to me that tag pages were being indexed in Google search results and that this was not a good idea. I went into my Yoast settings and checked "nofollow, index" under Taxonomies, but when I checked the source code for nofollow, I found nothing. So instead, I went into robots.txt and disallowed /tag/. Is that OK, or is that a bad idea? The site is The Tech Block, for anyone interested in looking.
Technical SEO | | ttb0 -
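A `Disallow` rule is a plain path-prefix match, so `Disallow: /tag/` blocks tag archive URLs but not a path like `/tagged/`, whereas `Disallow: /tag` would block both. The sketch below checks that with Python's `urllib.robotparser`, using a hypothetical host. Note that robots.txt stops crawling, not indexing: tag pages that are already indexed may linger, and Google cannot see a noindex tag on pages it is blocked from fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a WordPress blog that keeps tag archives out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path: str, agent: str = "Googlebot") -> bool:
    """True if the rules above let `agent` fetch `path` (prefix matching)."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/tag/gadgets/", "/tag/", "/tagged/", "/2012/05/some-review/"):
    print(path, "crawlable:", crawlable(path))
```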
How often Google Places updates?
Hi, this is my first question on SEOMoz :). I have just finished doing citations and optimisations for a local business. I now want to know how often Google Places updates its database to look for citations etc. to improve rankings. Many thanks
Technical SEO | | TheReachers0 -
Google preview magnifying glass icon
Why are my search results missing the Google Preview magnifying glass icon next to them? My site is an adult toys e-commerce/portfolio site, but it doesn't have any sexually explicit content, especially on the homepage. I had Google Preview working a few months ago, but I changed the skin a couple of months ago and Preview has not been available for my site since then. Google's preview testing tool shows a perfect image for my site. URL: http://www.funstuff.co.il Thanks, Asaf
Technical SEO | | AsafY0 -
Continued Lack of Google Indexing
I run a baseball site (http://www.mopupduty.com) that is in a very good link neighbourhood: ESPN, The Score, USA Today, MSG Network, The Toronto Star, Baseball Prospectus, etc. New content has not been getting indexed by Google ever since the last update. The site has no duplicate content; it is 100% original. I can't think of any spammy links; we get organic links day after day. In the past, Google has indexed the site within minutes. It currently has expanded sitelinks in Google search. Bing and Yahoo index the site within minutes. Are there any quick fixes I can make to increase my chances of getting indexed by Google, or should I just keep pumping out content and hope to see a change in the near future?
Technical SEO | | mkoster1 -
Redirect Flash Site for Google Only - Is this against TOS?
A photographer client has a Flash website, purchased from a (well-respected) template company. The main site is at the root domain, and the HTML version is at www.example.com/?load=html. If I visit the site in a browser without Flash installed, I am redirected automatically to the HTML version. I'm concerned because the site has some great links and the HTML version is well optimised, but it doesn't appear anywhere in Google for the chosen keywords (it ranks perfectly for brand-related searches). Google is indexing the Flash version of the site, but I would rather it didn't (there's no real content, just JavaScript to load the SWF, and all of the pages load under one URL). How can I block the Flash version from Google but still make the incoming links count towards the HTML version of the site? If I redirect Google to the HTML version, is this cloaking, and is it frowned upon? Thanks for any advice you can offer.
Technical SEO | | cmaddison0