What would be considered a bad ratio to determine Index Bloat?
-
I am using Annie Cushing's most excellent site audit checklist from Google Docs. My question concerns Index Bloat because it is mentioned in her "Index" tab.
We have 6,595 indexed pages, and only 4,226 of those pages have received 1 or more visits since January 1, 2013.
Is this an acceptable ratio? If not, why not and what would be an acceptable ratio? I understand the basic concept that "dissipation of link juice and constrained crawl budget can have a significant impact on SEO traffic." [Thanks to Reid Bandremer http://www.lunametrics.com/blog/2013/04/08/fifteen-minute-seo-health-check/#sr=g&m=o&cp=or&ct=-tmc&st=(opu%20qspwjefe)&ts=1385081787]
If we make this an action item I'd like to have some idea how to prioritize it compared to other things that must be done. Thanks all!
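For reference, the arithmetic behind those two numbers can be sketched in a few lines of Python. The variable names are illustrative only; the counts are the ones quoted above.

```python
# Counts quoted in the question above.
indexed_pages = 6595
pages_with_visits = 4226

# Pages that are indexed but have had no visits in the period.
pages_without_visits = indexed_pages - pages_with_visits

# Shares of the index, as percentages rounded to one decimal place.
visited_share = round(100 * pages_with_visits / indexed_pages, 1)
unvisited_share = round(100 * pages_without_visits / indexed_pages, 1)

print(pages_without_visits, visited_share, unvisited_share)
```

In other words, 2,369 indexed pages, roughly 36% of the index, have had no visits since January 1.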
-
Hi EGOL,
Wow, thank you so very much. This is one of the best answers I've ever received, probably the best, here in Q & A. Your thoughtful comments and suggestions are so appreciated. Honestly, you gave me a check list of things that have potential to be pure gold for us if we act on them.
Yes, you are correct, this is the site that had many issues with content being under tabs. It also has a tremendous amount of duplicate and thin content, in addition to orphaned pages. Progress has been coming along, slowly but surely, but having your comments, and having them be so specific, pointed and concise, is something I can take to my team and say "Here's an awesome check list of things that we can actually address right now, without re-platforming the site [you know, there are always people who think that the root of all a site's problems is the platform that it's on...pure mythology]."
I hope many others find your check list useful. Combined with Annie's audit spreadsheet in Google docs, I feel like I have the tools I need to go to battle and help this site fulfill its potential. Nearly every point you mentioned struck a chord. Better yet, now that I know my way around the "guts" of this homegrown CMS, I feel like I can actually make the necessary changes.
EGOL, I really can't thank you enough.
-
I totally agree, Keri. Every word EGOL wrote is, to me, worth its weight in gold. I think this may be the best response I have ever received here in Q & A.
-
If only people realized how much good information members drop in Q&A...
Once again, thanks for this EGOL!
-
From my experience, that is a frightening number of pages that have not received a visit. I would definitely be taking some type of action. This strikes me as a site in very bad health. I have lots of little pages on a weak little site that get a lot more traffic than none since January. This would be high on my priority list of things to solve. Solving this could bring major income, so this is as much a potential opportunity as it is a problem.
To diagnose, I would check the following. I know you and suspect that you have looked at all of these, but I am making a list just in case.
A) Duplicate content problem? Does this site have lots of pages that are very similar to other pages on the same site? Does the company have another site that is running the same product descriptions? Does the site run product descriptions that come from a datafeed supplied to vendors? Are affiliates using the same content? Have other websites stolen the content?
B) Have you been scraped and republished by a strong website? Just one is all it would take. A strong site was once scraping and republishing some of my short content pages and that killed the traffic into a section of my site. As soon as I asked them to stop, traffic was back within days. One site can hurt you like that, or numerous small sites can - even minor sites in Asia can do this.
C) Lots of thin content? Do you have a lot of pages that might only have two or three unique sentences? Google could be disrespecting your entire site because of this.
D) Technical problem? I would be looking at robots.txt and .htaccess, noindex, badly coded links, and a content management system causing duplicated title tags or other problems. Also consider faulty analytics that make it look like these pages are not getting traffic when really they are.
E) Content cannibalization? Lots of separate pages for red widgets that are being filtered from the SERPs.
F) Inadequate linkjuice? This is not a huge site but not a small one. Does it have a nice amount of linkjuice coming in?
G) Does this site have pages that are really deeeeep down in the linkstructure? Many clicks down? Fix that either with a new linkstructure or some kickass powerful links that hit nodes deep in the site to force spiders down. I would solve with linkstructure.
H) This isn't the site that had all of the content behind tabs that I remember from a while ago? (My memory is really bad so it might not even be your site.) If you have pages like that I would get rid of those tabs immediately. I have a personal opinion that Google does not treat content hidden behind tabs as well as content that is out in the open.
I) Are there a lot of other sites - strong ones - publishing very similar pages - like product description pages - competing for the same keywords? If that is the case you could be crowded out of the SERPs and receiving no traffic on these pages.
J) Does this site have a bad history? Does it have something that might be causing a penalty or filtering?
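Point G (pages many clicks down) is one of the checks on this list that can be measured directly. The following is only a rough sketch, assuming you can export the internal links as a page-to-links mapping; the `site_links` data, the `click_depths` helper and the depth threshold of 3 are all made up for illustration. It uses a breadth-first search from the home page to get the minimum click depth of every page, and also surfaces pages that cannot be reached by internal links at all.

```python
from collections import deque


def click_depths(links, start):
    """Return the minimum number of clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first time seen = shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
site_links = {
    "/": ["/widgets/", "/about/"],
    "/widgets/": ["/widgets/red/"],
    "/widgets/red/": ["/widgets/red/large/"],
    "/widgets/red/large/": [],
    "/about/": [],
    "/orphan/": ["/"],  # links out, but nothing links to it
}

depths = click_depths(site_links, "/")

# Pages at or beyond an assumed threshold of 3 clicks from the home page.
deep_pages = sorted(page for page, depth in depths.items() if depth >= 3)

# Known pages that the search never reached from the home page.
unreachable = sorted(set(site_links) - set(depths))

print(deep_pages)
print(unreachable)
```

Pages that show up in either list are the ones a flatter link structure would help most.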
After doing all of that you might have something that is really worth fixing. If you can't identify the problem I would be slashing, hatcheting those pages from the site right away.
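Before slashing anything, it helps to have the actual list of pages in question. This is a minimal sketch, assuming you have one export of indexed URLs and one analytics export of visits per URL; the sample data and the `pages_without_visits` helper are hypothetical. Given the warning about faulty analytics in point D, treat the output as candidates to review, not as a deletion list.

```python
def pages_without_visits(indexed_urls, visits_by_url):
    """Return indexed URLs with no recorded visits, sorted for stable output."""
    return sorted(
        url for url in set(indexed_urls)
        if visits_by_url.get(url, 0) == 0  # missing from analytics counts as zero
    )


# Hypothetical sample data standing in for the two exports.
indexed_urls = ["/", "/widgets/red/", "/widgets/blue/", "/old-promo/"]
visits_by_url = {"/": 120, "/widgets/red/": 3, "/widgets/blue/": 0}

candidates = pages_without_visits(indexed_urls, visits_by_url)
print(candidates)
```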