How did Google's crawler find and cache orphan pages and a directory?
-
I have a website, www.test.com.
I made some changes to the live website and uploaded them to a recently created "demo" directory for client approval.
So my demo link is www.test.com/demo/
I am not doing any link building or any other activity that would pass a referral link to www.test.com/demo/
So how did Google's crawler find it and cache some pages, or the entire directory?
Thanks
-
Try putting the URL into Google and see if you find any pages linking to it.
I knew a company that created a test site that was a copy of a live site (made with a specific hosted CMS). They didn't exclude the test site in robots.txt because "we all know we won't link to it so it'll be ok". The site got indexed anyway, because a person at the company was having problems with the implementation of the test site, went to the help forum (which that person didn't think would be indexed) and posted the URL of the test site.
I found the above by just putting in the URL of the test site into Google, and I saw the post in the help desk. You might try the same to see if somehow there is a rogue link.
-
Is Google crawling our emails?
Is it possible?
-
Yup, correct.
I was certain I'd replied to this.
Anyway, have you ever noticed how the ads in Gmail are always relevant to the content of your emails? Google are totally reading them.
-
The "conspiracy hat" side of things was him commenting that Google is sometimes accused of processing everything in Gmail and could possibly have pulled your link to the demo directory from that.
-
Hi Barry,
Yes, we used Gmail for reporting.
Does that make any sense?
-
Putting my conspiracy hat on:
Did either you or your client use Gmail when you sent him the demo link?
Regardless, Dan's advice to noindex and block the directory from spiders is the way to go in future when doing development work.
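As a rough illustration of the noindex half of that advice, here is a minimal Python sketch that checks whether a page's HTML carries a robots meta tag with a noindex directive. The names `RobotsMetaParser` and `has_noindex` and the sample markup are made up for this example. One caveat worth keeping in mind: a crawler can only see a noindex tag on pages it is allowed to fetch, so a robots.txt block can hide the tag from it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical demo page markup, for illustration only.
DEMO_PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
PLAIN_PAGE = "<html><head><title>Demo</title></head></html>"

print(has_noindex(DEMO_PAGE))
print(has_noindex(PLAIN_PAGE))
```

Running something like this over the files in a staging directory before uploading them is one way to catch a page that was left indexable by mistake.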
-
Hi JoelHit,
No, there is not a single referral link to the "demo" directory from anywhere on the website, nor from any third-party website.
I am aware of how Google's crawling and indexing systems work.
Thanks.
-
Hi Thetjo,
I know about it.
My question is: how did Google crawl it without any referral link?
Thanks.
-
Hi Dan,
No, I have not excluded the "demo" directory in robots.txt for any search engine.
I am not using WordPress; it's a simple static HTML website (not using any type of CMS).
-
Did this actually happen or are we talking about a hypothetical situation here? It could be that there is a link to the demo directory you've overlooked? Has the /demo folder perhaps been used in the past and there were still old links to it?
As a meta-solution to this problem: prevent crawlers and nosy people from accessing the content by adding a .htpasswd login to the area used for client approval.
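Once a login is in place, it may be worth confirming that the directory really does refuse anonymous visitors. The following is a minimal Python sketch, assuming the server answers unauthenticated requests with HTTP 401 and a WWW-Authenticate header, as HTTP basic authentication normally does. The function names `requires_login` and `check_url` are hypothetical, and the URL in the comment is just the example address from the question.

```python
import urllib.error
import urllib.request


def requires_login(status: int, headers: dict) -> bool:
    """True when a response looks like an HTTP auth challenge (401 + WWW-Authenticate)."""
    names = {name.lower() for name in headers}
    return status == 401 and "www-authenticate" in names


def check_url(url: str) -> bool:
    """Fetch url without credentials and report whether the server demanded a login."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return requires_login(resp.status, dict(resp.headers))
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses, including 401.
        return requires_login(err.code, dict(err.headers or {}))


# Example call (needs network access, so it is left commented out):
# print(check_url("http://www.test.com/demo/"))
```

If `check_url` returns False for the demo address, crawlers and curious visitors can still reach the content.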
-
Did you block the /demo/ directory in your robots.txt file? This is step number one to try to ensure it doesn't get crawled. Also, are you using WordPress? If so, WordPress automatically pings search engines when you add a post, and if you use the common sitemap plugin, it submits the sitemap to Google automatically when it creates it, so that's another way Google could have found it.
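To make the robots.txt step concrete, here is a minimal Python sketch that uses the standard library's `urllib.robotparser` to test a rule set before it goes live. The robots.txt content and the helper name `is_crawl_allowed` are assumptions for illustration; the URLs are the example addresses from the question. Note that a Disallow rule asks compliant crawlers not to fetch the pages; it is not access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all compliant crawlers out of /demo/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /demo/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawl_allowed(ROBOTS_TXT, "http://www.test.com/demo/index.html"))
print(is_crawl_allowed(ROBOTS_TXT, "http://www.test.com/index.html"))
```

The first call should report that the demo page is blocked, and the second that the rest of the site stays crawlable.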