Help, a certain directory is not being indexed
-
Before I start, don't expect this to be too easy. This really has me puzzled, and I am surprised I have yet to find a solution for it. Get ready.
We have a WordPress website, launched over 6 months ago, and have never had an issue getting content such as pages, posts and categories indexed. However, somewhat recently (about 2 months ago) I installed a directory plugin (Business Directory Plugin) which lists businesses via unique URLs that are accessible from a subfolder. It's these business listings that I absolutely cannot get indexed.
The index page of the directory, which links to the business pages, is indexed; however, for some reason Google is not indexing the listing pages that are linked from it. I don't think the content is uncrawlable: when I run crawlers on my site, such as XML sitemap crawlers, they find all the pages including the directory pages, so I am fairly sure the search engines can find the content.
I have created XML sitemaps and uploaded them to Webmaster Tools. Webmaster Tools recognises that there are many pages in the XML sitemap, but Google continues to index only a small percentage (everything but my business listings).
The directory has been there for about 8 weeks now, so I know there is an issue, as it should have been indexed by now.
See our main website at www.smashrepairbid.com.au and the business directory index page at www.smashrepairbid.com.au/our-shops/
To throw in a curve ball: while looking into this issue and setting up Webmaster Tools, we noticed a lot of 404 error pages (nearly 4,000). We were very confused about where these were coming from, as they were only being generated by search engines. Humans could not reach the 404s, so we are guessing the search engines were firing some JavaScript code to generate them, or something else weird. We could see the 404s in the logs, so we know they were legitimate, but again we feel it was only search engines. This was validated when we added some rules to robots.txt and saw the errors in the logs stop. We put the rules in the robots.txt file to try to stop Google from indexing the 404 pages, as we could not find any way to fix the site or code (we have no idea what is causing them). If you do a site: search in Google, you will see all the pages that are omitted from the results.
Since adding the rules to robots.txt, our impressions shown in Webmaster Tools have jumped right up (a fivefold increase), so we thought this was a good indication of improvement, but we are still not getting the results we want.
Does anyone have any clue what's going on, or why Google and other search engines are not indexing this content? Any help would be greatly appreciated, and if you need any other information to assist, just ask me.
Really appreciate anyone who can spare their time to help me, I sure do need it.
Thanks.
-
OK issue resolved!
Lynn, thank you - it was the relative URL in the canonical tag that played havoc. Changing it to absolute is now causing the pages to be indexed.
Lesson learnt.
-
Hey Kane,
The /shops URL was an old URL that had a directory in it. We blocked it in robots.txt as it was generating tons of 404 errors. In Webmaster Tools we can see thousands of 404 errors within that directory, so we deleted it all and tried to block search engines from throwing the errors (as I described in the initial post).
A number of those listings do have very little information; however, there are a bunch that have great content, which is why I am not sure that is the cause. I will keep an eye on this, though, and also check the logs and let you know what they say.
-
Thanks Lynn.
I have taken your recommendation and changed the canonical tag to be absolute. Thanks for your help; we will see how it goes.
-
As Lynn said, relative canonical tags could absolutely cause issues. That said, I'm seeing absolute URLs in the canonical tag now, so you may have fixed that in the past few days.
Also, I do see the Our Shops pages indexed when I search for site:smashrepairbid.com.au, but I don't see any other pages in the /our-shops/ directory aside from www.smashrepairbid.com.au/our-shops/?action=search
Your robots.txt is currently blocking /shops/. I don't think that would cause an issue, but it would be nice to remove that if it's not needed...
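For anyone wanting to confirm that a /shops/ rule leaves /our-shops/ alone, here is a minimal sketch using Python's standard `urllib.robotparser`. The two-line rule set is an assumption (the thread only confirms that /shops/ is disallowed), and the `/shops/old-listing/` URL and the `allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: the thread only confirms that /shops/ is disallowed.
robots_lines = [
    "User-agent: *",
    "Disallow: /shops/",
]

parser = RobotFileParser()
parser.parse(robots_lines)


def allowed(url, agent="Googlebot"):
    """True when the parsed rules let the given agent fetch the URL."""
    return parser.can_fetch(agent, url)


# A listing URL mentioned in the thread, and a hypothetical old /shops/ URL.
listing = "http://www.smashrepairbid.com.au/our-shops/1263/bakker-towing/"
old_url = "http://www.smashrepairbid.com.au/shops/old-listing/"

print(listing, allowed(listing))
print(old_url, allowed(old_url))
```

Disallow rules match from the start of the URL path, so a rule for /shops/ should not catch paths that begin with /our-shops/.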
There's almost zero content on the pages I glanced at, eg. http://www.smashrepairbid.com.au/our-shops/1263/bakker-towing/ and http://www.smashrepairbid.com.au/our-shops/1616/coastal-towing-service/. When you look at it from Google's perspective, there's very little value being added by these pages. No unique photos, no phone number, no website, etc. There's a million local business scrapers that have more content than this, so why should they bother indexing these pages?
Try pulling up your logs and seeing if these URLs have been requested by Google's spiders. Here's a good guide from Ian Lurie on how to do that in Excel: http://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
If the spiders are crawling those shop URLs but aren't indexing them, I think the first thing to do is add way more content to the pages.
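If Excel is not handy, the log check can also be sketched in Python. This assumes the server writes the common Apache/Nginx "combined" log format; the sample lines, IP addresses and the `googlebot_hits` helper are invented for illustration and are not taken from the real site's logs.

```python
import re
from collections import Counter

# Assumes the "combined" access log format used by default on many
# Apache and Nginx setups; adjust the pattern if your format differs.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, prefix="/our-shops/"):
    """Count (path, status) pairs requested under prefix by a Googlebot user agent."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if "Googlebot" in match.group("agent") and match.group("path").startswith(prefix):
            hits[(match.group("path"), match.group("status"))] += 1
    return hits


# Made-up sample lines for illustration only.
sample = [
    '66.249.66.1 - - [10/Apr/2013:06:25:24 +1000] "GET /our-shops/1263/bakker-towing/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Apr/2013:06:26:01 +1000] "GET /our-shops/ HTTP/1.1" 200 8042 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [10/Apr/2013:06:27:13 +1000] "GET /shops/old-page/ HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for (path, status), count in googlebot_hits(sample).items():
    print(path, status, count)
```

To run it against a real file, pass an open file object instead of `sample`. Note that the user-agent string can be spoofed, so this only shows requests that claim to be Googlebot.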
-
Hi Trent,
Having a quick look, I saw that you have relative URLs in your canonical tag, and this could be problematic. I think it would be worth making those URLs absolute to avoid any confusion on Google's part in determining what page or page version should be indexed.
Cannot say for sure if this is the problem, but worth looking into.
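One rough way to spot a relative canonical is to pull the tag out of the page source and test whether its href carries a scheme and host. The sketch below uses only the Python standard library; the two HTML snippets are hypothetical markup written for this example (not copied from the site), and `CanonicalFinder`, `is_absolute` and `check_canonicals` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def is_absolute(url):
    """True when the URL carries both a scheme and a host."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


def check_canonicals(html):
    """Return (href, is_absolute) pairs for each canonical tag found."""
    finder = CanonicalFinder()
    finder.feed(html)
    return [(href, is_absolute(href)) for href in finder.canonicals]


# Hypothetical markup for illustration, not copied from the real site.
relative_page = '<head><link rel="canonical" href="/our-shops/1263/bakker-towing/"></head>'
absolute_page = (
    '<head><link rel="canonical" '
    'href="http://www.smashrepairbid.com.au/our-shops/1263/bakker-towing/"></head>'
)

print(check_canonicals(relative_page))
print(check_canonicals(absolute_page))
```

Any pair that comes back with `False` is a canonical worth rewriting as a full URL.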
Hope that helps!