Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't currently being indexed by Google, and I'm not quite sure why. A little background:
We are an IT training and management training company with locations/classrooms around the US. To improve our search rankings and overall visibility, we made some changes to the on-page content, URL structure, etc. Take our Washington DC location as an example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
Not all of the SEO changes are live yet, so just bear with me. My question is really about why the first URL is still being crawled and indexed and showing fine in the search results, while the second one (which we want to show) is not. The changes have been live for around a month now - plenty of time for the pages to at least be indexed.
In fact, we don't want the first URL showing anymore; we'd like the second URL format to show across the board. Also, when I search Google for site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training I get a message that Google can't read the page because of the robots.txt file - but we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same, and that we've put in an order to have all the old links 301 redirected to the new ones. Still, I'm perplexed as to why these pages aren't being crawled or indexed - I've even manually submitted them in Webmaster Tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And why is Google saying "A description for this result is not available because of this site's robots.txt"?
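One quick way to double-check what robots.txt the server is actually returning (even if you believe none exists) and whether it blocks a given URL is Python's standard-library robotparser; a minimal sketch using the URLs above:

    import urllib.robotparser

    # Load whatever robots.txt the server actually serves; an empty or
    # missing file results in everything being allowed.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www2.learningtree.com/robots.txt")
    rp.read()

    # Test whether Googlebot may fetch the new DC page.
    url = ("http://www2.learningtree.com/htfu/uswd44/"
           "reston/it-and-management-training")
    print(rp.can_fetch("Googlebot", url))  # False = blocked by robots.txt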
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so my replies might be somewhat slow.
Yes, we have links to the pages in our sitemaps, and I have done fetch requests. I just checked, and it seems the niched "New York" page is being crawled now. It might have been a timing issue, as you suggested. But our DC page still isn't being crawled. I'll check on it periodically and follow the progress. I really appreciate your suggestions - they're already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking to that second page that isn't being indexed yet?
-
Hi Mike,
As a follow-up, I forwarded your suggestions to our webmasters. They adjusted the robots.txt, and it now reads as follows - I think it still might cause issues, though I'm not 100% sure why:
    User-agent: *
    Allow: /htfu/
    Disallow: /htfu/app_data/
    Disallow: /htfu/bin/
    Disallow: /htfu/PrecompiledApp.config
    Disallow: /htfu/web.config
    Disallow: /

Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
-
The pages in question don't have any meta robots tags on them. So once the Disallow in robots.txt is gone and you do a fetch request in Webmaster Tools, the pages should get crawled and indexed fine. If a page doesn't have a meta robots tag, the spiders treat it as index,follow. (On your current robots.txt: Google applies the most specific - that is, longest - matching rule, so Allow: /htfu/ overrides the blanket Disallow: / for anything under /htfu/, which is why the Alexandria page can now be indexed.) Personally, I prefer to include the index,follow tag anyway, even if it isn't 100% necessary.
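For reference, that tag goes in the page's <head> and simply makes the default behavior explicit:

    <meta name="robots" content="index, follow">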
-
Thanks, Mike. That was incredibly helpful. See, I did click the link in the SERP when I did the site: search on Google, but I thought it was a mistake. Were you able to see the disallow directive somewhere in the source code?
-
Your robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/, which would block http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and was already cached, so it will still appear in the SERPs... the bots just won't be able to see any changes made to it, because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly, and you should 301 your old pages to the new ones.
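A cleaned-up robots.txt along those lines might look like the following - a sketch, not the live file. It keeps the sensible blocks on app_data, bin, and the config files, and drops both the blanket Disallow: / and the Allow line (allowing everything is the default, so no Allow is needed):

    User-agent: *
    Disallow: /htfu/app_data/
    Disallow: /htfu/bin/
    Disallow: /htfu/PrecompiledApp.config
    Disallow: /htfu/web.config

And since the URLs suggest an ASP.NET/IIS stack (the .aspx page and the web.config entries above), the 301s could be handled with the IIS URL Rewrite module. A hedged sketch for the DC page only, assuming the module is installed - the rule name and match pattern are illustrative, and each location would need its own mapping from the old id parameter to the new path:

    <configuration>
      <system.webServer>
        <rewrite>
          <rules>
            <!-- Illustrative rule: 301 the old query-string URL to the new friendly URL -->
            <rule name="Redirect old DC location page" stopProcessing="true">
              <match url="^htfu/location\.aspx$" />
              <conditions>
                <add input="{QUERY_STRING}" pattern="^id=uswd44$" />
              </conditions>
              <action type="Redirect"
                      url="/htfu/uswd44/reston/it-and-management-training"
                      redirectType="Permanent"
                      appendQueryString="false" />
            </rule>
          </rules>
        </rewrite>
      </system.webServer>
    </configuration>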