Help, a certain directory is not being indexed
-
Before I start, don't expect this to be too easy. This really has me puzzled, and I'm surprised I still haven't found a solution for it. Get ready.
We have a WordPress website, launched over 6 months ago, and we have never had an issue getting content such as pages, posts and categories indexed. However, somewhat recently (about 2 months ago) I installed a directory plugin (Business Directory Plugin) which lists businesses via unique URLs that are accessible from a subfolder. It's these business listings that I absolutely cannot get indexed.
The index page of the directory, which links to the business pages, is indexed, but for some reason Google is not indexing the listing pages it links to. I don't think it's an issue of the content being uncrawlable: when I run crawlers on my site, such as XML sitemap crawlers, they find all the pages, including the directory pages, so I am fairly sure it's not a case of the search engines failing to find the content.
I have created XML sitemaps and submitted them to Webmaster Tools. It recognises that there are many pages in the sitemap, but Google continues to index only a small percentage (everything but my business listings).
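For anyone comparing against their own setup, a minimal sitemap entry for one of the listing pages would look something like this (a standard sitemaps.org skeleton; the listing URL is one of the real pages mentioned later in the thread):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per business listing page -->
  <url>
    <loc>http://www.smashrepairbid.com.au/our-shops/1263/bakker-towing/</loc>
  </url>
</urlset>
```

If Webmaster Tools reports the URLs as submitted but only a fraction as indexed, the sitemap itself is usually fine and the problem lies elsewhere.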
The directory has been there for about 8 weeks now, so I know there is an issue, as it should have been indexed by now.
See our main website at www.smashrepairbid.com.au and the business directory index page at www.smashrepairbid.com.au/our-shops/
To throw in a curve ball: while looking into this issue and setting up Webmaster Tools, we noticed a lot of 404 error pages (nearly 4,000). We were very confused about where these were coming from, as they were only being generated by search engines; humans could not reach the 404s, so we are guessing the crawlers were firing some JavaScript code to generate them, or something else weird. We could see the 404s in the logs, so we know they were legitimate, but again it seemed to be only search engines. This was validated when we added some rules to robots.txt and saw the errors in the logs stop. We put the rules in the robots.txt file to try to stop Google from indexing the 404 pages, as we could not find any way to fix the site/code (no idea what is causing them). If you do a site: search in Google you will see all the pages that are omitted from the results.
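For reference, the rules we added were along these lines (a minimal sketch; a later reply confirms /shops/ is the path our live robots.txt blocks):

```
User-agent: *
Disallow: /shops/
```

Worth noting: robots.txt blocks crawling rather than indexing, so a rule like this stops the crawlers from requesting the 404 URLs, but it isn't guaranteed to remove already-known URLs from the index.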
Since adding the rules to robots.txt, our impressions shown in Webmaster Tools have jumped right up (a fivefold increase), so I thought this was a good indication of improvement, but we're still not getting the results we want.
Does anyone have any clue what's going on, or why Google and other search engines are not indexing this content? Any help would be greatly appreciated, and if you need any other information to assist, just ask.
Really appreciate anyone who can spare their time to help me, I sure do need it.
Thanks.
-
OK issue resolved!
Lynn, thank you. It was the relative URL in the canonical tag that was playing havoc. Changing it to absolute is now getting the pages indexed.
Lesson learnt.
-
Hey Kane,
The /shops URL was an old URL that had a directory in it. We blocked it in robots.txt as it was generating tons of 404 errors. In Webmaster Tools we can see thousands of 404 errors within that directory, so we deleted it all and tried to block search engines from throwing the errors (as I described in the initial post).
A number of those listings do have very little information, but there are a bunch that have great content, which is why I am not sure that is the cause. I will keep an eye on this, though, and also check the logs and let you know what they say.
-
Thanks Lynn.
I have taken your recommendation on board and changed the canonical tag to be absolute. Thanks for your help; we will see how it goes.
-
As Lynn said, relative canonical tags could absolutely cause issues. That said, I'm seeing absolute URLs in the canonical tag now, so you may have fixed that in the past few days.
Also, I do see the Our Shops pages indexed when I search for site:smashrepairbid.com.au, but I don't see any other pages in the /our-shops/ directory aside from www.smashrepairbid.com.au/our-shops/?action=search
Your robots.txt is currently blocking /shops/. I don't think that would cause an issue, but it would be good to remove that rule if it's not needed.
There's almost zero content on the pages I glanced at, e.g. http://www.smashrepairbid.com.au/our-shops/1263/bakker-towing/ and http://www.smashrepairbid.com.au/our-shops/1616/coastal-towing-service/. When you look at it from Google's perspective, there's very little value being added by these pages: no unique photos, no phone number, no website, etc. There are a million local business scrapers with more content than this, so why should Google bother indexing these pages?
Try pulling up your logs and seeing if these URLs have been requested by Google's spiders. Here's a good guide from Ian Lurie on how to do that in Excel: http://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
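If you'd rather script it than use Excel, here's a rough Python sketch of the same idea, assuming your server writes Apache-style combined logs. The sample log lines and paths below are made up for illustration; in practice you'd read lines from your real access log file.

```python
import re

# Hypothetical sample lines in Apache "combined" log format.
SAMPLE_LOG = """\
66.249.66.1 - - [10/May/2011:10:00:00 +1000] "GET /our-shops/1263/bakker-towing/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.5 - - [10/May/2011:10:01:00 +1000] "GET /our-shops/ HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Windows NT 6.1)"
66.249.66.1 - - [10/May/2011:10:02:00 +1000] "GET /shops/old-page HTTP/1.1" 404 300 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"""

# Pull the request path and status code out of each log line.
LINE_RE = re.compile(r'"(?P<method>\w+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def googlebot_hits(log_text, path_prefix="/our-shops/"):
    """Return (path, status) pairs for Googlebot requests under path_prefix."""
    hits = []
    for line in log_text.splitlines():
        if "Googlebot" not in line:
            continue
        m = LINE_RE.search(line)
        if m and m.group("path").startswith(path_prefix):
            hits.append((m.group("path"), int(m.group("status"))))
    return hits

print(googlebot_hits(SAMPLE_LOG))
# → [('/our-shops/1263/bakker-towing/', 200)]
```

Filtering on the Googlebot user-agent string is only a first pass; a more careful check would also verify the IPs via reverse DNS, since anyone can fake the user agent.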
If the spiders are crawling those shop URLs but aren't indexing them, I think the first thing to do is add way more content to the pages.
-
Hi Trent,
Having a quick look, I saw that you have relative URLs in your canonical tags, and this could be problematic. I think it would be worth making those URLs absolute, to avoid any confusion on Google's part in determining which page or page version should be indexed.
Cannot say for sure if this is the problem, but worth looking into.
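Concretely, the difference looks like this (the listing path below is just an example):

```html
<!-- Relative canonical: ambiguous, since it is resolved against whatever
     protocol/host version of the page the crawler happened to fetch -->
<link rel="canonical" href="/our-shops/1263/bakker-towing/" />

<!-- Absolute canonical: unambiguous -->
<link rel="canonical" href="http://www.smashrepairbid.com.au/our-shops/1263/bakker-towing/" />
```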
Hope that helps!