How Did Google's Crawler Find and Cache Orphan Pages and a Directory?
-
I have a website, www.test.com.
I made some changes to the live website and uploaded them to a recently created "demo" directory for client approval.
My demo link is now www.test.com/demo/
I am not doing any link building or any other activity that would pass a referral link to www.test.com/demo/
So how did Google's crawler find it and cache some pages, or even the entire directory?
Thanks
-
Try putting the URL into Google and see if you find any pages linking to it.
I knew a company that created a test site that was a copy of a live site (made with a specific hosted CMS). They didn't exclude the test site in robots.txt because "we all know we won't link to it so it'll be ok". The site got indexed, and it was because a person at the company was having problems with the implementation of the test site, went to the help forum (which that person didn't think would be indexed) and posted the URL of the test site.
I found this out just by putting the URL of the test site into Google, where I saw the post in the help forum. You might try the same to see if there is a rogue link somewhere.
-
Is Google crawling our emails?
Is that possible?
-
Yup, correct.
I was certain I'd replied to this.
Anyway, have you ever noticed how the ads in Gmail are always relevant to the content of your emails? Google is totally reading them.
-
The "conspiracy hat" side of things was him commenting that Google is sometimes accused of processing everything in Gmail and could possibly have pulled your link to the demo directory from that.
-
Hi Barry,
Yes, we used Gmail for reporting.
Does that make any sense?
-
[conspiracy hat on]
Did either you or your client use Gmail when you sent him the demo link?
Regardless, Dan's advice to noindex the directory and block it from spiders is the way to go for future development work.
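For anyone who wants to check the noindex part on their own demo pages, here is a minimal Python sketch (not from this thread) that tests whether a page's HTML carries a robots meta tag containing noindex. The sample markup and the names `RobotsMetaParser` and `has_noindex` are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Hypothetical demo page markup, assumed for this example only.
    demo_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(demo_page))
```

Keep in mind that a crawler can only see a noindex tag on pages it is allowed to fetch, so a page that is also disallowed in robots.txt may never have its noindex read.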
-
Hi JoelHit,
No, there is not a single referral link to the "demo" directory from anywhere on the website or from third-party websites.
I am aware of how Google's crawling and indexing systems work.
Thanks.
-
Hi Thetjo,
I know about that.
My question is: how did Google crawl it without any referral link?
Thanks.
-
Hi Dan,
No, I have not excluded the "demo" directory in robots.txt for any search engine.
I am not using WordPress; it's a simple static HTML website (no CMS of any kind).
-
Did this actually happen or are we talking about a hypothetical situation here? It could be that there is a link to the demo directory you've overlooked? Has the /demo folder perhaps been used in the past and there were still old links to it?
As a meta-solution to this problem: prevent crawlers and nosy people from accessing the content by adding a .htpasswd login to the area used for client approval.
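An .htpasswd login is an Apache feature built on HTTP Basic authentication. The rough Python sketch below is not Apache configuration; it only illustrates the underlying check, where any request to /demo/ without valid credentials gets a 401 instead of the page. The credentials and the function names `is_authorized` and `status_for` are hypothetical.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
DEMO_USER = "client"
DEMO_PASSWORD = "change-me"


def is_authorized(auth_header, user=DEMO_USER, password=DEMO_PASSWORD):
    """Check an HTTP Authorization header against a single Basic-auth credential."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and invalid UTF-8 (both are ValueError subclasses).
        return False
    supplied_user, separator, supplied_password = decoded.partition(":")
    if not separator:
        return False
    # compare_digest avoids leaking information through comparison timing.
    user_ok = hmac.compare_digest(supplied_user.encode(), user.encode())
    password_ok = hmac.compare_digest(supplied_password.encode(), password.encode())
    return user_ok and password_ok


def status_for(path, auth_header):
    """Return the status a server would send: 401 for unauthenticated /demo/ requests."""
    if path.startswith("/demo/") and not is_authorized(auth_header):
        return 401
    return 200


if __name__ == "__main__":
    # A crawler sends no credentials, so it never sees the demo content.
    print(status_for("/demo/index.html", None))
```

Since a crawler never sends credentials, it only ever receives the 401 response, which is why this approach works even if a link to the demo area leaks.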
-
Did you block the /demo/ directory in your robots.txt file? This is step number one to try to ensure it doesn't get crawled. Also, are you using WordPress? If so, WordPress automatically pings search engines when you add a post, and if you use the common sitemap plugin, it submits the sitemap to Google automatically when it creates it, so that's another way Google could have found it.
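As a quick way to sanity-check a robots.txt rule before relying on it, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content below is an assumed example for www.test.com, not the site's real file, and `is_crawl_allowed` is a helper name invented for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the /demo/ rule is what is being illustrated.
ROBOTS_TXT = """\
User-agent: *
Disallow: /demo/
"""


def is_crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.test.com/demo/index.html",
        "http://www.test.com/index.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, url))
```

Note that robots.txt only asks well-behaved crawlers not to fetch those URLs; it does not hide the directory from people or from crawlers that ignore it.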