How can a page be indexed without being crawled?
-
Hey Moz fans,
The Google getting started guide says: "Note: Pages may be indexed despite never having been crawled: the two processes are independent of each other. If enough information is available about a page, and the page is deemed relevant to users, search engine algorithms may decide to include it in the search results despite never having had access to the content directly. That said, there are simple mechanisms such as robots meta tags to make sure that pages are not indexed."
How can this happen? I don't really get the point.
Thank you -
Pleasure is all mine, my friend. You are most welcome. The Moz SEO community is an indispensable asset and weapon in any SEO's arsenal, in my opinion. We learn a great deal here while helping others. I am really thankful to each and every one here in the Moz community. Long live Moz and Mozzers. YOU ROCK!!
-
Oh man, you always come to me with great ideas; I never thought about that.
Thank you very much. You rock! -
Yes, of course, my friend. Google has to crawl the page to see the page-level meta robots tag, but to date I have not seen any page in Google's index that was blocked with both the robots.txt file and a page-level meta robots tag. Password-protecting the directory with .htaccess would be overkill if you just want Google not to index a page. If you want Google to remove a particular page from its index, you can request that through your Webmaster Tools account. Here you go for more: https://support.google.com/webmasters/answer/1663419?hl=en
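For reference, the page-level meta robots tag being discussed goes in the page's head section; note that the page must remain crawlable (not blocked in robots.txt) so Google can actually see the tag:

```html
<!-- Tells compliant crawlers not to index this page -->
<meta name="robots" content="noindex">
```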
Good luck to you, my friend.
Best regards,
Devanur Rafi
-
Thank you, guys.
Devanur, you've got the point, but let me correct you on one thing.
You can't ask Google to remove a page from its index using just a meta robots tag, because it can't read the meta tag until it crawls the page.
So the only solution looks like .htaccess password protection.
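For anyone reading along, a minimal sketch of what .htaccess password protection looks like on Apache (the .htpasswd path here is hypothetical; keep that file outside the web root):

```apacheconf
# Require HTTP Basic authentication for everything in this directory
AuthType Basic
AuthName "Restricted Area"
# Hypothetical location of the username/password file
AuthUserFile /home/example/.htpasswd
Require valid-user
```

Because the server returns 401 Unauthorized without credentials, crawlers never see the content at all.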
Anyway, thanks for your efforts. -
I'm also thinking sitemaps, but I'm not really sure Google trusts them enough to index links listed in them that it hasn't crawled.
-
Hi friend,
If a page has been blocked using the robots.txt file, Google will not crawl the page from within the website. But what if a reference to that page is found on a third-party website? In cases like this, link discovery happens and the page can be indexed without a description snippet. Such pages show the following text in place of a description on the search results pages:
"A description for this result is not available because of this site's robots.txt – learn more"
So in order to completely stop Google from indexing a page, you should block it with a page-level meta robots tag instead.
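To illustrate the distinction: a robots.txt rule like the one below blocks crawling but cannot prevent indexing when external links point at the page (the URL path is hypothetical):

```
User-agent: *
Disallow: /private-page/
```

This is exactly the situation that produces the "description is not available" snippet above: Google knows the URL exists from outside links but is forbidden from fetching its content.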
Here you go for more: https://support.google.com/webmasters/answer/156449?hl=en
Please feel free to post back if you have any other queries in this regard.
Best regards,
Devanur Rafi