Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How to check if a page is indexable by search engines?
-
Hi, I'm building a Chrome extension that should show me the indexability status of the page I'm on.
So I need to know all the methods for checking whether a page can be crawled and indexed by search engines. I've come up with a few:
- Check the URL against the robots.txt file (make sure it's not disallowed)
- Check the page's meta tags (make sure there is no noindex meta tag)
- Check that the page is the same for unregistered users (to catch pages that are only available to registered users of the site)
Are there any more methods to check whether a particular page is indexable (that is, not closed to indexing) by search engines?
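The meta tag check from the list above could look roughly like the sketch below. It is written in Python with only the standard library to show the logic; a real Chrome extension would do the same thing in JavaScript against the page's DOM. The names `RobotsMetaParser` and `has_noindex_meta` are made up for this illustration, and treating `none` as equivalent to `noindex, nofollow` follows Google's robots meta tag documentation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self, crawler_names=("robots", "googlebot")):
        super().__init__()
        self.crawler_names = {name.lower() for name in crawler_names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise to "".
        attr_map = {key.lower(): (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() not in self.crawler_names:
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex_meta(html):
    """Return True if a robots meta tag asks crawlers not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives
```

For example, `has_noindex_meta('<meta name="robots" content="noindex, nofollow">')` reports the page as blocked, while a page with `content="index, follow"` or with no robots meta tag at all does not.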
Thanks in advance!
-
I understand the difference between what you're doing and what Google shows. I guess I'm just not sure when I'd want to know that something could technically be indexed but isn't.
I guess I'm not your target market!
Good luck with your tool.
-
With "site:site.com" you can only see if the page is indexED, but to know if it's indexABLE you need to dig deeper. That is why I've decided to automate this process.
As I already said, this is going to be a browser extension. Once you land on any page, the extension automatically checks it and shows the status (with a color, I guess): whether the page is indexed and, if not, whether it is indexABLE. When I'm looking for link-building resources, this little tool should help a lot.
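The status logic described here (indexed first, otherwise indexable or not) can be sketched as a small decision function. This is only an illustration in Python: the `PageStatus` names and the colors are placeholders, and `blockers` is assumed to be a list of whichever checks failed (robots.txt, noindex and so on).

```python
from enum import Enum


class PageStatus(Enum):
    # The colors are placeholders; the post only says "with a color, I guess".
    INDEXED = "green"
    INDEXABLE = "yellow"
    NOT_INDEXABLE = "red"


def page_status(is_indexed, blockers):
    """is_indexed: result of an index lookup; blockers: list of failed checks."""
    if is_indexed:
        return PageStatus.INDEXED
    if blockers:
        return PageStatus.NOT_INDEXABLE
    return PageStatus.INDEXABLE
```

A call such as `page_status(False, ["noindex meta tag"])` returns `PageStatus.NOT_INDEXABLE`, and keeping the list of blockers around lets the extension explain why a page got that status.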
-
Ah, gotcha. Personally, I use Google itself to find out if something is indexable: if it's my own site, I can use Fetch as Google, and the robots.txt tester; if it's another site, you can search for "site:[URL]" to see if Google's indexed it.
I think this tool could be really good if you keep it as an icon and it glows or something if you've accidentally deindexed the page? Then it's helping you proactively.
Hope this helps!
Kristina
-
Actually I'm not. That's why I'm asking, so as not to miss this basic stuff, and I really appreciate your advice. Thank you!
If I understand your question correctly, you are asking what this extension is needed for?
Well, there are 2 main aims:
- When I want to check any page on my own websites, I just visit it and see whether everything is OK with the robots settings (or, if it should be closed to robots, whether it really is).
- For link-building purposes. When I come to a page, see a link from it to an external website, and know for sure that I can get the same link to my site, I ask myself whether it is worth getting a link from a page like this, that is, whether it is going to be indexed. Why waste time getting links from pages that are closed to indexing?
-
Hello Peter,
First of all, thank you for the great ideas.
I don't think it's necessary to call the API: this check touches only one URL (so it isn't aggressive), and I need it to be done as fast as possible. But the idea with Structured Data - bravo!
Thanks a lot!
-
You're probably already doing this, but make sure that all of your tests are using the Googlebot user agent! That could cause different results, especially with the robots.txt check.
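To illustrate why the user agent matters for the robots.txt check, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample robots.txt and the helper name `is_allowed_by_robots` are invented for the example. Python's parser is only an approximation of how Googlebot reads robots.txt (it may not handle Google-specific extensions such as wildcards), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a generic group and a Googlebot-specific group.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /no-google/
"""


def is_allowed_by_robots(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this sample file, `/private/page` is allowed for Googlebot (its own group replaces the `*` group) but disallowed for an unnamed bot, while `/no-google/page` is disallowed for Googlebot only. Running the check with a generic user agent would give the wrong answer in both cases.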
A sense check: what is your plugin going to offer over Google Search Console's Fetch as Google and robots.txt Tester?
-
You can also check the HTTP response headers for crawling and indexing directives:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
You can also use some Google services for this, especially the PageSpeed API:
https://developers.google.com/speed/docs/insights/v2/reference/
Once you call this API, it returns JSON with a list of blocked resources. It's a little slower, but I have found it to be safe. Some hosts run an IDS (intrusion detection system), and when someone crawls them a little too aggressively they block the whole IP or IP range. I know of a few cases where a site was visible to users but blocked for Google's IPs. The webmasters weren't happy when they discovered this. They called the hosting company a few times and got "there are no issues on our side, we didn't block anything". Six hours later they got "it seems another department blocked this server for a few specific IPs".
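For the HTTP header check mentioned at the start of this reply, the relevant header is `X-Robots-Tag`, which is described on the robots meta tag page linked above. Below is a rough Python sketch of how its value could be interpreted. The function name, the `(name, value)` header format and the small set of directives that carry their own value are assumptions made for the example, not a complete parser.

```python
# Directives that carry their own "name: value" pair and are not crawler names.
VALUED_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def header_blocks_indexing(headers, user_agent="googlebot"):
    """headers: iterable of (name, value) pairs from the HTTP response."""
    for name, value in headers:
        if name.lower() != "x-robots-tag":
            continue
        # A value may be scoped to one crawler, e.g. "googlebot: noindex".
        prefix, sep, rest = value.partition(":")
        prefix = prefix.strip().lower()
        scoped = bool(sep) and "," not in prefix and prefix not in VALUED_DIRECTIVES
        if scoped:
            if prefix != user_agent.lower():
                continue  # Directive is addressed to a different crawler.
            value = rest
        directives = {part.strip().lower() for part in value.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            return True
    return False
```

A response carrying `X-Robots-Tag: noindex, nofollow` or `X-Robots-Tag: googlebot: noindex` is reported as blocked, while a header scoped to another crawler is ignored. The extension would read these headers from the page's own response, so no extra request is needed.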
About checking for logged-in versus logged-out users: you can use the Structured Data Testing Tool. Alternatively, make one call to get JSON with the full HTTP response and then compare it with your own result.
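The comparison step could be sketched as follows, assuming you already have two HTML strings: the page as your logged-in browser sees it and the page as an anonymous fetch sees it. The function names, the crude text extraction and the 0.5 threshold are all arbitrary choices for illustration; a real tool would need tuning.

```python
import difflib
import re


def visible_text(html):
    """Crude text extraction: drop script/style blocks and tags, collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return " ".join(text.split())


def similarity(html_a, html_b):
    """Word-level similarity between two pages, from 0.0 to 1.0."""
    words_a = visible_text(html_a).split()
    words_b = visible_text(html_b).split()
    return difflib.SequenceMatcher(None, words_a, words_b).ratio()


def looks_login_gated(logged_in_html, anonymous_html, threshold=0.5):
    """Guess that content is gated if the anonymous view differs a lot."""
    return similarity(logged_in_html, anonymous_html) < threshold
```

If the anonymous response is just a sign-in form while the logged-in view has the real content, the similarity drops close to zero and the page is flagged as probably not visible to crawlers.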