Help, a certain directory is not being indexed
-
Before I start, don't expect this to be too easy. This really has me puzzled, and I am surprised I have yet to find a solution for it. Get ready.
We have a WordPress website that launched over 6 months ago, and we have never had an issue getting content such as pages, posts and categories indexed. However, somewhat recently (about 2 months ago) I installed a directory plugin (Business Directory Plugin), which lists businesses via unique URLs that are accessible from a subfolder. It's these business listings that I absolutely cannot get indexed.
The index page of the directory, which links to the business pages, is indexed. However, for some reason Google is not indexing the listing pages that are linked from this page. I don't think the content is uncrawlable: when I run crawlers on my site, such as XML sitemap crawlers, they find all the pages, including the directory pages, so I am fairly sure the search engines can find the content.
I have created XML sitemaps and uploaded them to Webmaster Tools. Webmaster Tools recognises that there are many pages in the XML sitemap, but Google continues to index only a small percentage (everything but my business listings).
The directory has been there for about 8 weeks now, so I know there is an issue, as it should have been indexed by now.
See our main website at www.smashrepairbid.com.au and the business directory index page at www.smashrepairbid.com.au/our-shops/
To throw in a curve ball, while looking into this issue and setting up Webmaster Tools we noticed a lot of 404 error pages (nearly 4,000). We were very confused about where these were coming from, as they were only being generated by search engines. Humans could not reach the 404s, so we are guessing the search engines were executing some JavaScript code to generate them, or something else weird. We could see the 404s in the logs, so we know they were legitimate, but again we feel it was only search engines. This was validated when we added some rules to robots.txt and saw the errors in the logs stop. We put the rules in the robots.txt file to try to stop Google from indexing the 404 pages, as we could not find any way to fix the site or code (no idea what is causing them). If you do a site search in Google you will see all the pages that are omitted from the results.
Since adding the rules to robots.txt, our impressions shown in Webmaster Tools have jumped right up (a 5x increase), so I thought this was a good indication of improvement, but we are still not getting the results we want.
Does anyone have any clue what's going on, or why Google and other search engines are not indexing this content? Any help would be greatly appreciated, and if you need any other information to assist, just ask me.
Really appreciate anyone who can spare their time to help me, I sure do need it.
Thanks.
-
OK issue resolved!
Lynn, thank you. It was the relative URL in the canonical tag that played havoc. Changing it to absolute is now causing the pages to be indexed.
Lesson learnt.
-
Hey Kane,
The /shops URL was an old URL that had a directory in it. We blocked it in robots.txt as it was generating tons of 404 errors. In Webmaster Tools we can see thousands of 404 errors within that directory, so we deleted it all and tried to block search engines from triggering the errors (as I described in the initial post).
A number of those listings do have very little information, but there are a bunch that have great content, which is why I am not sure that is the cause. I will keep an eye on this, though, and I will also check the logs and let you know what they say.
-
Thanks Lynn.
I have taken your recommendation and changed the canonical tag to be absolute. Thanks for your help. We will see how it goes.
-
As Lynn said, relative canonical tags could absolutely cause issues. That said, I'm seeing absolute URLs in the canonical tag now, so you may have fixed that in the past few days.
Also, I do see the Our Shops pages indexed when I search for site:smashrepairbid.com.au, but I don't see any other pages in the /our-shops/ directory aside from www.smashrepairbid.com.au/our-shops/?action=search
Your robots.txt is currently blocking /shops/. I don't think that would cause an issue but would be nice to remove that if it's not needed...
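A quick way to confirm that a /shops/ rule does not also catch /our-shops/ is to run the rules through Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt text below is an assumed reconstruction (the thread only says /shops/ is blocked), and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration; the site's real robots.txt may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /shops/
"""


def is_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.smashrepairbid.com.au"
    for path in ("/our-shops/1263/bakker-towing/", "/shops/some-old-page/"):
        print(path, is_allowed(ROBOTS_TXT, base + path))
```

Rules are matched as path prefixes, so a listing URL under /our-shops/ does not start with /shops/ and should stay crawlable under the assumed rules. Note that the standard-library parser does not implement every extension Google supports (wildcards, for example), so treat it as a sanity check only.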
There's almost zero content on the pages I glanced at, eg. http://www.smashrepairbid.com.au/our-shops/1263/bakker-towing/ and http://www.smashrepairbid.com.au/our-shops/1616/coastal-towing-service/. When you look at it from Google's perspective, there's very little value being added by these pages. No unique photos, no phone number, no website, etc. There's a million local business scrapers that have more content than this, so why should they bother indexing these pages?
Try pulling up your logs and seeing if these URLs have been requested by Google's spiders. Here's a good guide from Ian Lurie on how to do that in Excel: http://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
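If Excel is not handy, the same check can be roughly sketched in Python. This assumes the server writes the common Apache/Nginx "combined" log format; the sample lines, the `googlebot_hits` helper and the `/our-shops/` prefix default are illustrative, not taken from the site's real logs.

```python
import re
from collections import Counter

# Pattern for the "combined" access log format (an assumption about the server).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines, for illustration only.
SAMPLE = [
    '66.249.66.1 - - [10/Jan/2013:10:00:00 +1100] "GET /our-shops/1263/bakker-towing/ HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jan/2013:10:00:05 +1100] "GET /shops/old-page/ HTTP/1.1" '
    '404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jan/2013:10:01:00 +1100] "GET /our-shops/ HTTP/1.1" '
    '200 9000 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/18.0"',
]


def googlebot_hits(lines, prefix="/our-shops/"):
    """Count requests per (path, status) where the user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        if "Googlebot" not in match.group("agent"):
            continue
        if match.group("path").startswith(prefix):
            hits[(match.group("path"), match.group("status"))] += 1
    return hits


if __name__ == "__main__":
    for (path, status), count in googlebot_hits(SAMPLE).items():
        print(count, status, path)
```

To run it against a real log, pass an open file object instead of `SAMPLE`. The user-agent string can be spoofed, so a match only suggests that Google's spider made the request.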
If the spiders are crawling those shop URLs but aren't indexing them, I think the first thing to do is add way more content to the pages.
-
Hi Trent,
Having a quick look I saw that you have relative urls in your canonical tag and this could be problematic. I think it would be worth making those urls absolute to avoid any confusion on Google's part in determining what page or page version should be indexed.
Cannot say for sure if this is the problem, but worth looking into.
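For anyone wanting to check their own pages, here is a minimal sketch that pulls the canonical tag out of a page's HTML and reports whether its href is absolute. It uses only the Python standard library; `CanonicalFinder`, `is_absolute` and `check_canonical` are hypothetical names, and the relative href in the demo is an assumed example, not the exact value that was on the site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.hrefs.append(attr["href"])


def is_absolute(url):
    """A URL is treated as absolute when it has both a scheme and a host."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


def check_canonical(html, page_url):
    """Return (href, is_absolute, resolved_url) for each canonical tag found."""
    finder = CanonicalFinder()
    finder.feed(html)
    return [(h, is_absolute(h), urljoin(page_url, h)) for h in finder.hrefs]


if __name__ == "__main__":
    page = "http://www.smashrepairbid.com.au/our-shops/1263/bakker-towing/"
    # Assumed relative canonical, for illustration only.
    html = '<head><link rel="canonical" href="/our-shops/1263/bakker-towing/" /></head>'
    for href, absolute, resolved in check_canonical(html, page):
        print(href, "absolute" if absolute else "relative", "->", resolved)
```

The resolved URL shows what a standards-following client would make of a relative href. Writing the absolute URL into the tag removes the need for that resolution step altogether.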
Hope that helps!