Index bloating issue
-
Hello,
In the last month, I noticed a huge spike in the number of pages indexed on my site, which I think is impacting my SEO quality score.
While I only have about 90 pages in my sitemap, the number of indexed pages jumped to 446, with about 536 pages blocked by robots.txt. At first we thought this might be due to duplicate product pages showing up in different categories on my site, so we added a rule to our robots.txt file to keep those pages from being indexed. But the number has not gone down. I've tried to consult with our hosting vendor, but no one seems to be concerned or to have any idea why there was such a big jump in the last month.
Any insights or pointers would be greatly appreciated, so that I can fix/improve my SEO as quickly as possible!
Thanks!
-
To determine whether your website has been hacked, this is one of the best tools I know of, both to find the malware and to remove it.
To determine whether or not you have on-site SEO problems on a very technical and granular scale, I would use
https://www.deepcrawl.com/ At $80 a month, you cannot go wrong.
Another amazing tool, which is free for the first 500 pages and only about $150 a year if you want the added features (which you do) or more pages, is
-
Thank you. These are helpful suggestions.
-
A couple of things to note:
- As Robert mentioned, I would definitely make sure there is no longer an issue on your WordPress site relating to your previous hack.
- A robots.txt disallow does not stop pages from being indexed. It merely tells search engines to stop crawling that page from here on out. The meta noindex tag is the better fit for removing pages that are already in the index. Note that a crawler has to be able to fetch a page to see its noindex tag, so the page must not also be disallowed in robots.txt.
- I would check your Search Console crawl errors to see if there's a hefty spike in 404 errors as well, as these may be old spam pages you removed from the site.
- If the pages that are bloating your index are all old spam-filled pages from when you were hacked, you could start by using Search Console's "Remove URLs" tool, which will remove all these URLs from the index temporarily. For a more long-term approach, instead of returning a 404 for pages that have been removed, having the server return a "410" response would tell Google they are gone permanently, and thus they will be removed from the index as time goes on.
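To make the last two points a bit more concrete, here is a minimal sketch in Python of a server answering 410 for removed spam pages and sending a noindex signal for duplicate pages. It is only an illustration under assumptions: the paths in `REMOVED_PATHS` and `NOINDEX_PREFIXES` are made up, and your real site would do this in its own platform or server config rather than in a standalone WSGI app. It uses the `X-Robots-Tag` response header, which is the HTTP header equivalent of the meta noindex tag.

```python
# Hypothetical paths for illustration only; use the URLs found in your own
# Search Console reports and server logs.
REMOVED_PATHS = {"/cheap-pills", "/spam-page"}
NOINDEX_PREFIXES = ("/category-duplicates/",)


def app(environ, start_response):
    """Tiny WSGI app: 410 for removed pages, noindex header for duplicates."""
    path = environ.get("PATH_INFO", "/")

    # Removed spam pages: answer "410 Gone" instead of "404 Not Found".
    if path in REMOVED_PATHS:
        start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Gone"]

    headers = [("Content-Type", "text/html; charset=utf-8")]

    # Duplicate pages stay reachable for crawlers but ask not to be indexed.
    # They must NOT be disallowed in robots.txt, or the header is never seen.
    if path.startswith(NOINDEX_PREFIXES):
        headers.append(("X-Robots-Tag", "noindex"))

    start_response("200 OK", headers)
    return [b"<html><body>OK</body></html>"]
```

If you want to try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve it; then request one of the listed paths and look at the status line and headers.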
-
When I do the search for my main URL, the results are clean: just the pages on my site show up. Yet the index count for this site is still bloated. My WordPress site, however, which is a subdomain and on a different platform from my main site, does have some issues (it was hacked, as Rob noted below). But we have since cleaned up the pages, re-uploaded the sitemaps, etc. So I'm a little stumped about my main site (which wasn't hacked, as far as I'm aware).
-
What do you see if you do a search for site:yoursite.com ?
-
Hello Julie,
This sounds like you might have a hacking issue on your website. You probably need someone to conduct a full code audit of your site to determine whether any files you have uploaded (plugins, for example) were contaminated. If a site is hacked, new pages can be added that are hidden from view and difficult to detect unless handled by a security specialist.
We recently brought on a new client who had this issue and discovered that his site had thousands of pages dedicated to testosterone pills, etc. We had to go through Google Webmaster Tools (GWT) and the site logs to determine what new pages had been created, and it was a complete hack job.
In terms of fixing your SEO, the first step is to determine where/if the hack exists. Once that is decided, you have to clean up the site and restore the site's security.
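The log review described above could be sketched roughly like this in Python. It is not the exact process we used, just an illustration: it assumes a standard XML sitemap and access logs in the common/combined log format, and the function names `sitemap_paths` and `unexpected_paths` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Pulls the requested URL and status code out of a common/combined log line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def sitemap_paths(sitemap_xml):
    """Return the set of URL paths listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return {
        urlparse(loc.text.strip()).path
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
    }


def unexpected_paths(log_lines, known_paths):
    """Return sorted paths served with a 200 that are not in the sitemap."""
    found = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            path = urlparse(match.group(1)).path
            if path not in known_paths:
                found.add(path)
    return sorted(found)
```

Feed `sitemap_paths` the contents of your sitemap file and `unexpected_paths` the lines of your access log. Expect some noise in the result (images, CSS, scripts), but pages you never created that return a 200 are the ones worth a closer look.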
I would be happy to help you with the next steps if you would like. I am always available!
Thanks and best of luck,
Rob