How is Google finding our preview subdomains?
-
I've noticed that Google is able to find, crawl and index preview subdomains we set up for new client sites (e.g. clientpreview.example.com). I know now to use the robots meta tag and robots.txt to block search engines from crawling these subdomains. My question, though, is how is Google finding these subdomains in the first place? We don't link to these preview domains from anywhere, so I can't figure out how Google is even getting there.
Does anybody have any insight on this?
-
Thanks for your response, Irving. We put some of our preview sites on subdomains of our main domain, but then remove them after the site goes live, so there shouldn't be any duplicate content issues. The main question is just how Google is finding these subdomains.
-
Thanks for the insight guys.
-
I don't specifically use the Google Toolbar, but others in the office may (although I don't think so). It sounds like Chrome could be a potential source as well?
-
I think that this is a good idea. But you gotta be careful.
Our competitor (who ranked #1 and we ranked at #2) had their site redesigned and the design company included the noindex on every page. They forgot to take it off when the new design went live. It took them quite a while to figure it out and we enjoyed all of their sales for about a month.
We are #1 now and they are #2. Must have been a bad design job.
-
If the subdomains are added to WMT, Google will know about them. If you are designing sites for clients and putting them on subdomains of your own site, it behooves you to make 100% sure their dev sites are not being seen by Google: it's duplicate content, and your subdomain becomes the original source of that content. It looks unprofessional, too.
a) verify any subdomain you are creating for a client in WMT
b) block it in robots.txt and noindex nofollow all pages globally
c) for the ones that are already indexed, go into Google WMT, open that subdomain's account, and request removal of the site from Google's index. This will remove only that subdomain from the index; don't worry, it won't remove your main site.
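To make (b) concrete, a minimal sketch, assuming the preview lives at something like clientpreview.example.com and you control its web root and page templates:

```
# /robots.txt at the root of the preview subdomain
User-agent: *
Disallow: /
```

plus <meta name="robots" content="noindex, nofollow"> in the <head> of every page. One caveat: once robots.txt blocks crawling, Googlebot can no longer fetch the pages to see that meta tag, so for subdomains that are already in the index the removal request in (c) is the reliable way to get them out.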
-
I would also consider adding a noindex tag if you want the URLs removed.
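If editing every page template is a pain, the same noindex can be served as an HTTP header instead. A sketch, assuming the staging site runs on Apache with mod_headers enabled:

```
# In the staging vhost config or its .htaccess (Apache + mod_headers assumed)
Header set X-Robots-Tag "noindex, nofollow"
```

Set at the host level, it also covers PDFs and other non-HTML files, and because it lives only in the staging host's config it can't leak onto the live site.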
-
I agree with Mat. You never know, but yes, Chrome could be another major source. It also depends on the privacy settings you chose when you set up Chrome ("Send anonymous usage data to Google", yes/no?) and so on.
-
We usually put them behind an .htaccess login now. We've had situations where the development site has been outranking the live site. It's a great demo of the power of on-site optimisation, but still a bit annoying for the client.
People always used to blame the Google Toolbar for this. Likewise, using Chrome could potentially add something to the "to crawl" list; I wonder what the respective privacy policies say about that. I've also seen staging sites pick up links: when an external link on the staging site has been clicked, it has alerted someone else, appeared as a linkback/trackback, etc.
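For anyone wanting to replicate the .htaccess login mentioned above, a minimal sketch; the file paths and realm name are just examples, and Apache is assumed:

```
# .htaccess in the staging site's document root
AuthType Basic
AuthName "Staging - authorised users only"
AuthUserFile /home/example/.htpasswd
Require valid-user
```

The password file is created with something like: htpasswd -c /home/example/.htpasswd someuser (keep it outside the document root). A nice side effect is that Googlebot gets a 401 for every URL, so nothing on staging can be crawled, indexed, or outrank the live site.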
-
Discovery can happen through multiple channels. Do you or the client have the Google Toolbar installed?
Related Questions
-
Blocking Google from telemetry requests
At Magnet.me we track the items people are viewing in order to optimize our recommendations. As such, we fire POST requests back to our backends every few seconds, once enough user-initiated actions have happened (think of scrolling, for example). To keep bots from distorting statistics, we ignore their values server-side. Based on some internal logging, we see that Googlebot is also performing these POST requests in its JavaScript crawling; over a 7-day period, that amounts to around 800k POST requests. As we are ignoring that data anyhow, and it is quite a number, we considered reducing this for bots. We had several questions about it:
1. Do these requests count towards crawl budgets?
2. If they do, and we want to prevent this, what would be the preferred option: preventing the request in the frontend code, or blocking it with a robots.txt line? We ask because an in-app block could lead to different behaviour for users and bots, and Google might penalize that as cloaking; it is also slightly less convenient from a development perspective, as the logic would be spread throughout the application. I'm aware one should not cloak, or make pages appear different to search engine crawlers. However, these requests do not change the pages' behaviour at all; they purely send some anonymous data so we can improve future recommendations.
-
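If the robots.txt route is chosen, a single Disallow rule over the telemetry endpoint would do it. A sketch; the /api/events path here is purely an assumption, so substitute the real endpoint:

```
# robots.txt at the site root - /api/events is a placeholder path
User-agent: *
Disallow: /api/events
```

robots.txt matching is by URL path regardless of HTTP method, and since it only changes what Googlebot may fetch (not what users are served), it avoids the cloaking concern that a bot check in frontend code raises.
-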
Blocking subdomains with Robots.txt file
We noticed that Google is indexing our pre-production site ibweb.prod.interstatebatteries.com in addition to our main site interstatebatteries.com. Can you help shed some light on the proper way to noindex our pre-prod site without impacting our live site?
-
Google ignoring the Title Tag?
Has anybody seen this too? We have a webpage with a slightly different title tag and H1. If you search for, let's say, "Renovatie", you see the title tag "De kostprijs van je renovatie". However, when you search with the term "Wat kost een renovatie", we see the H1 in the SERP, which is "Wat kost een renovatie". So is it normal that when you search a term that's exactly the same as the H1 tag, Google ignores the title tag? N.
-
Removed Subdomain Sites Still in Google Index
Hey guys, I've got kind of a strange situation going on and I can't seem to find it addressed anywhere. I have a site that at one point had several development sites set up at subdomains. Those sites have since launched on their own domains, but the subdomain sites are still showing up in the Google index. However, if you look at the cached version of pages on these non-existent subdomains, the little blurb that says "This is Google's cached version of www.correcturl.com" lists the NEW URL, not the dev one. Clearly Google recognizes that the content resides at the new location, so how come the old pages are still in the index? Attempting to visit one of them gives a "Server Not Found" error, so they are definitely gone. This is happening to a couple of sites, one that was launched over a year ago, so "wait and see" doesn't appear to be the solution. Any suggestions would be a huge help. Thanks!!
-
Google showing wrong title
Hi, can anyone assist a newbie please? My keyword 'Security Systems' is giving me position 1 on page 1, but the title Google is using is not the page title; I am assuming it has made it up for some reason. Please see below. The actual title tag says:
Security systems | wireless | battery powered | Police Approved | CSS
Google is showing:
Compound Security Systems: Wireless Security Systems | Battery ... www.compoundsecurity.co.uk/ - Manufacturers & suppliers of The Mosquito Device & Professional industry compliant and Police recommended battery powered wireless security systems. Contact us - Mosquito Anti-Loitering Devices - Security Equipment - Installers
If anyone can tell me how to correct this, I would very much appreciate it. Regards, Si
-
Why has Google stopped indexing my content?
Mystery of the day! Back on December 28th, there was a 404 on the sitemap for my website. This lasted two days before I noticed and fixed it. Since then, Google has not indexed my new content; however, the majority of content from before that date still shows up in the index. The website is http://www.indieshuffle.com/. Clues:
a) Google reports no current issues in Webmaster Tools
b) Two reconsideration requests have returned "no manual action taken"
c) When new posts are detected as "submitted" in the sitemap, they take 2-3 days to "index"
d) Once "indexed," they cannot be found in search results unless I include url:indieshuffle.com
e) The sitelinks that used to pop up under a basic search for "Indie Shuffle" are now gone
f) I am using Yoast's SEO tool for WordPress (and have been for years)
g) Before December 28th, I was doing 90k impressions / 4.5k clicks; after December 28th, I'm doing 8k impressions / 1.3k clicks
Ultimately, I'm at a loss for a possible explanation. Running an SEOmoz audit comes up with warnings about rel=canonical and a few broken links (which I've fixed in reaction to the report). I know these things often correct themselves, but two months have passed now, and it continues to get progressively worse. Thanks, Jason
-
Where does Google pull the date stamp?
We're a news media site with content that has been live for a few years. All of a sudden, Google is showing our content (even though no one has touched the files) with a date stamp of '3 days ago', even for content that is years old. I checked the date it was last cached, and it doesn't even match: the URLs were last cached on January 16, but the date stamp says '3 days ago'. From where does Google pull the date stamp? Any ideas?