Why isn't our new site being indexed?

RobbieD91

We built a new website for a client recently.

It's been live for three weeks. Robots.txt isn't blocking Googlebot or anything.

Submitted a sitemap.xml through Webmasters but we still aren't being indexed.

Anyone have any ideas?

Tom-Anthony

Hey Dirk,

No worries - I visited the question for the first time today and considered it unanswered, as the site is perfectly accessible in California. I like to confirm what Search Console says, as that is 'straight from the horse's mouth'.

Thanks for confirming that the IP redirect has changed, that is interesting. It is impossible for us to know when that happened - I would have expected things to get indexed quite fast once it changed.

With the extra info I'm happy to mark this as answered, but would be good to hear from the OP.

Best,

-Tom

DirkC

Hi Tom,

I am not questioning your knowledge - I re-ran the test on webpagetest.org and I see that the site is now accessible from a Californian IP (http://www.webpagetest.org/result/150911_6V_14J6/), which wasn't the case a few days ago (check the result on http://www.webpagetest.org/result/150907_G1_TE9/) - so there has been a change in the IP redirection. I also checked from Belgium - the site is now also accessible from here.

I also notice that if I now do a site:woofadvisor.com in Google I get 19 pages indexed rather than the 2 I got a few days ago.

Apparently removing the IP redirection solved (or is solving) the indexation issue - but still this question remains marked as "unanswered".

rgds,

Dirk

Tom-Anthony

I am in California right now, and can access the website just fine, which is why I didn't mark the question as answered - I don't think we have enough info yet. I think the 'fetch as googlebot' will help us resolve that.

You are correct that if there is no robots.txt then Google assumes the site is open, but my concern is that the developers on the team say that there IS a robots.txt file there and it has some contents. I have, on at least two occasions, come across a team that was serving a robots.txt that was only accessible to search bots (once they were doing that 'for security', another time because they misunderstood how it worked). That is why I suggested checking Search Console to see what shows up for robots.txt.

DirkC

To be very honest - I am quite surprised that this question is still marked as "Unanswered".

The owners of the site decided to block access for all non-UK / Ireland IP addresses. The main Googlebot uses a Californian IP address to visit the site. Hence the only page Googlebot can see is https://www.woofadvisor.com/holding-page.php, which has no links to the other parts of the site (this is confirmed by the webpagetest.org test with a Californian IP address).

As Google indicates, Googlebot can also use other IP addresses to crawl the site ("With geo-distributed crawling, Googlebot can now use IP addresses that appear to come from other countries, such as Australia.") - however, it is very likely that these bots do not crawl with the same frequency/depth as the main bot (the article clearly indicates "Google might not crawl, index, or rank all of your locale-adaptive content. This is because the default IP addresses of the Googlebot crawler appear to be based in the USA").

This can easily be solved by adding a link on /holding-page.php to the Irish/UK version, which contains the full content and is accessible from all IP addresses. Googlebot can then follow that link to index the full site (so - only put the IP detection on the homepage, not on the other pages).
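To illustrate the "IP detection on the homepage only" idea, here is a minimal sketch of the routing decision. It is not the site's actual code: the names `ALLOWED_COUNTRIES`, `GATED_PATHS` and `redirect_target` are hypothetical, and the country code is assumed to come from whatever geo-IP lookup the site already uses.

```python
# Hypothetical sketch: only the homepage is geo-gated, so a crawler that
# lands on the holding page can still follow links into the full site.

ALLOWED_COUNTRIES = {"GB", "IE"}    # UK and Ireland, as described in the thread
HOLDING_PAGE = "/holding-page.php"  # path taken from the thread
GATED_PATHS = {"/"}                 # assumption: gate the homepage only


def redirect_target(path, country_code):
    """Return the path to 302-redirect to, or None to serve the page as-is."""
    if path in GATED_PATHS and country_code not in ALLOWED_COUNTRIES:
        return HOLDING_PAGE
    return None


# A visitor (or crawler) on a US IP address:
print(redirect_target("/", "US"))                 # homepage is redirected
print(redirect_target("/travel-tips.php", "US"))  # inner page is served
```

With this shape, the holding page still needs a plain link to the full content, otherwise a crawler arriving from outside the UK/Ireland has nothing to follow.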

The fact that the robots.txt gives a 404 is not relevant: if no robots.txt is found Google assumes that the site can be indexed (check this link) - quote: "You only need a robots.txt file if your site includes content that you don't want Google or other search engines to index."
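As a rough local illustration of "no rules means everything may be crawled", the sketch below uses Python's standard `urllib.robotparser`. It models Python's parser, not Googlebot itself, and the empty rule list is only a stand-in for a robots.txt that could not be found; the helper name `can_crawl` is made up for this example.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, user_agent, url):
    """Parse robots.txt content (a list of lines) and test a single URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


URL = "https://www.woofadvisor.com/travel-tips.php"

# No rules at all (stand-in for a missing robots.txt): crawling is allowed.
print(can_crawl([], "Googlebot", URL))

# A robots.txt that blocks everything: crawling is refused.
print(can_crawl(["User-agent: *", "Disallow: /"], "Googlebot", URL))
```

The first call prints `True` and the second prints `False`, which is the difference that matters here: a missing file does not block anything, whereas a file with a blanket `Disallow` would.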

Tom-Anthony

I'd be concerned about the 404ing robots.txt file.

You should check in Search Console:

What does Search Console show in the robots.txt section?
What happens if you fetch a page that is not indexed (e.g. https://www.woofadvisor.com/travel-tips.php) with the 'Fetch as Googlebot' tool?

I checked and do not see any obvious indicators of why the pages are not being indexed - we need more info.

DirkC

I just did a quick check on your site with Webpagetest.org using a California IP address (http://www.webpagetest.org/result/150907_G1_TE9/) - as you can see there, these IPs also go to the holding page, which is logically the only page that can be indexed, as it's the only one Googlebot can access.

rgds,

Dirk

DirkC

Hi,

I can't access your site from Belgium - I guess you are redirecting your users based on IP address. If, like me, they are not located in your target country, they are 302 redirected to https://www.woofadvisor.com/holding-page.php, and there is only 1 page that is indexed.

Not sure which country you are actually targeting - but could it be that you're accidentally redirecting Googlebot as well?

Check also this article from Google on IP-based targeting.
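If you want to see the redirect for yourself, a small script along these lines can help. It is only a sketch: `describe_response` and `check_url` are names invented for this example, and because the redirect depends on the IP address the request comes from, the result only tells you what your own location sees, not what Googlebot sees.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def describe_response(status, location):
    """Turn a status code and Location header into a one-line summary."""
    if status in REDIRECT_CODES and location:
        return f"{status} redirect to {location}"
    return f"{status} (no redirect)"


def check_url(url, user_agent="Mozilla/5.0"):
    """Send one HEAD request; http.client does not follow redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/", headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return describe_response(response.status, response.getheader("Location"))
    finally:
        conn.close()
```

Calling `check_url("https://www.woofadvisor.com/")` from a server in the target country and again from one outside it would show whether the homepage answers differently by location.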

rgds

Dirk

RobbieD91

Strangely, there are two pages indexed on Google Search.

The homepage and one other.

EGOL

I noticed the robots.txt file returned a 404 and asked the developers to take a look and they said the content of it is fine.

Sometimes developers say this stuff. If you are getting a 404, demonstrate it to them.

RobbieD91

I noticed the robots.txt file returned a 404 and asked the developers to take a look and they said the content of it is fine.

But yes, I'll double-check the WordPress settings now.

TimHolmes

Your sitemap all looked good, but when I tried to view the robots.txt file in your root, it returned a 404, so I was unable to determine whether there was an issue. Could any of the settings in your WordPress installation also be causing it to trip up?
