Index bloating issue
-
Hello,
In the last month, I noticed a huge spike in the number of pages indexed on my site, which I think is impacting my SEO quality score.
While I've only have about 90 pages on my site map, the number of pages indexed jumped to 446, with about 536 pages being blocked by robots. At first we thought this might be due to duplicate product pages showing up in different categories on my site, but we added something to our robot.txt file to not index those pages. But the number has not gone down. I've tried to consult with our hosting vendor, but no one seems to be concerned or have any idea why there was such a big jump in the last month.
Any insights or pointers would be so greatly appreciated, so that I can fix/improve my SEO as quickly as possible!
Thanks!
-
in order to determine if your website is hacked this is one of the best tools I know of both to find out and to remove the malware.
In order to determine rather not you have on-site SEO problems on a very technical and granular scale I would use
https://www.deepcrawl.com/ $80 a month you cannot go wrong
another amazing tool and it's free for the first 500 pages and if you want the added features which you do or more pages only about $150 a year is
-
Thank you. These are helpful suggestions.
-
A couple of things to note:
- As Robert mentioned, I would definitely make sure there is no longer an issue on your wordpress site relating to your previous hack.
- Robots.txt disallow does not stop pages from being indexed. It merely tells search engines to stop crawling that page from here out. The meta noindex tag is more applicable for noindexing pages that are already out there.
- I would check your search console crawl errors to see if there's a hefty spike in 404 errors as well, as it may be old spam pages you removed from the site.
- If these pages that are bloating your index are all still old spam filled pages from when you were hacked, you could start by using the search console's "remove url's" tool, which will remove all these url's from the index temporarily. For a more long term approach, instead of them giving off a 404 if they have been removed, making the server give off a "410" response would tell google they are gone forever, and thus they will be removed from the index as time goes on.
-
When I do the search for my main url - the results are clean. Just the pages to my site show up. And the index results for this site still bloated. However, for my wordpress site, which is a subdomain and on a different platform to my main site, there are some issues (it was hacked as Rob noted below). But we have since cleaned up the pages etc, reuploaded the site maps, etc. So I'm a little stumped on my main site (which wasn't hacked - that I'm aware of).
-
What do you see if you do a search for site:yoursite.com ?
-
Hello Julie,
This sounds like you might have a hacking issue on your website. You probably need someone to conduct a full code audit of your site to determine whether any files you have uploaded (plugins, for example) were contaminated. If a site is hacked, new pages can be added that are hidden from view and difficult to detect unless handled by a security specialist.
We recently brought on a new client who had this issue and discovered that his site had 1000's of pages dedicated to testosterone pills, etc. We had to go through GWT and the site logs to determine what new pages were created and it was a complete hack job.
In terms of fixing your SEO, the first step is to determine where/if the hack exists. Once that is decided, you have to clean up the site and restore the site's security.
I would be happy to help you with the next steps if you would like. I am always available!
Thanks and best of luck,
Rob
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Follow no-index
I have a question about the right way to not index pages: With a canonical or follow no-index. First we have a blog page: **Blogpage **
Technical SEO | | Happy-SEO
URL: /blog/
index follow Page 2 blog:
URL: /blog?=p2
index follow
rel="prev" /blog/
el="next" ?=p3 Nothing strange here i guess. But we also have other pages with chance on duplicate content: /SEO-category/
/SEO-category/view-more/ Because i don't want the "view-more" items to be indexed i want to set it on: follow no-index (follow to reach pages). But now the "view-more" also have pagination. What is the best way? Option 1:
/SEO-category/view-more/
Follow no-index /SEO-category/view-more?=p2
Follow no-index
rel="prev" /view-more/
el="next" ?=p3 Option 2: /SEO-category/view-more/
Canonical: /SEO-category/ /SEO-category/view-more?=p2
rel="prev" /view-more/
el="next" ?=p3 Option 3: Other suggests? Thanks!0 -
Issue with Cached pages
I have a client who has a three domains:
Technical SEO | | paulbaguley
budgetkits.co.uk
prosocceruk.co.uk
cheapfootballkits.co.uk Budget Kits is not active but Pro Soccer and Cheap Football Kits are. The issue is when you do site:budgetkits.co.uk on Google it brings back results. If you click on the link it goes to page saying website doesn't exist which is correct but if you click on cached it shows you a page from prosocceruk.co.uk or cheapfootballkits.co.uk. The cached pages are very recent by a couple of days ago to a week. The first result brings up www.budgetkits.co.uk/rainwear but the cached page is www.prosocceruk.co.uk/rainwear The third result brings up www.budgetkits.co.uk/kids-football-kits but the cached page is http://www.cheapfootballkits.co.uk The history of this issue is that budgetkits.co.uk was its own website 7 years ago and then it used to point at prosocceruk.co.uk after that but it no longer does for about two months. All files have been deleted from budgetkits.co.uk so it is just a domain. Any help with this would be very much appreciated as I have not seen this kind of issue before.0 -
GWT Images Indexing
Hi guys! How does normally take to get Google to index the images within the sitemap? I recently submitted a new, up to date sitemap and most of the pages have been indexed already, but no images have. Any reason for that? Cheers
Technical SEO | | PremioOscar0 -
Dev Site Was Indexed By Google
Two of our dev sites(subdomains) were indexed by Google. They have since been made private once we found the problem. Should we take another step to remove the subdomain through robots.txt or just let it ride out? From what I understand, to remove the subdomain from Google we would verify the subdomain on GWT, then give the subdomain it's own robots.txt and disallow everything. Any advice is welcome, I just wanted to discuss this before making a decision.
Technical SEO | | ntsupply0 -
Image Indexing Issue by Google
Hello All,My URL is: www.thesalebox.comI have Submitted my image Sitemap in google webmaster tool on 10th Oct 2013,Still google could not indexing any of my web images,Please refer my sitemap - www.thesalebox.com/AppliancesHomeEntertainment.xml and www.thesalebox.com/Hardware.xmland my webmaster status and image indexing status are below,
Technical SEO | | CommercePunditCan you please help me, why my images are not indexing in google yet? is there any issue? please give me suggestions?Thanks!
0 -
Skip indexing the search pages
Hi, I want all such search pages skipped from indexing www.somesite.com/search/node/ So i have this in robots.txt (Disallow: /search/) Now any posts that start with search are being blocked and in Google i see this message A description for this result is not available because of this site's robots.txt – learn more. How can i handle this and also how can i find all URL's that Google is blocking from showing Thanks
Technical SEO | | mtthompsons0 -
Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?
Dear all, starting with my .htaccess file: RewriteEngine On
Technical SEO | | inlinear
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^./index.html
RewriteRule ^(.)index.html$ http://inlinear.com/ [R=301,L] 1. I redirect all URL-requests with www. to the non www-version...
2. all requests with "index.html" will be redirected to "domain.com/" My questions are: A) When linking from a page to my frontpage (home) the best practice is?: "http://domain.com/" the best and NOT: "http://domain.com/index.php" B) When linking to the index of a subfolder "http://domain.com/products/index.php" I should link also to: "http://domain.com/products/" and not put also the index.php..., right? C) When I define the canonical ULR, should I also define it just: "http://domain.com/products/" or in this case I should link to the definite file: "http://domain.com/products**/index.php**" Is A) B) the best practice? and C) ? Thanks for all replies! 🙂
Holger0 -
Drupal issue
Hi seomozzers again, One of my clients uses DRUPAL(cms) and I have an issue when editing any pages. I access the edit section of the page, try to insert meta description tags, save and view the page source and NO meta description tags appear!! Why is that? Is there a specific setting that I need to implement? Under Meta Tags, apparently DRUPAL likes to put canonical tags by default(unless i can tweek one of the settings), and I would like to remove them. The weird thing is that even there are no canonical tags set, when viewing the page source I can still locate a canonical tag. Is there a setting that allow me to remove the canonical by default? Thank you mozzers:)
Technical SEO | | Ideas-Money-Art0