Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Investigating a huge spike in indexed pages
-
I've noticed an enormous spike in pages indexed through WMT in the last week. Now I know WMT can be a bit (OK, a lot) off base in its reporting but this was pretty hard to explain. See, we're in the middle of a huge campaign against dupe content and we've put a number of measures in place to fight it. For example:
-
Implemented a strong canonicalization effort
-
NOINDEX'd content we know to be duplicate programatically
-
Are currently fixing true duplicate content issues through rewriting titles, desc etc.
So I was pretty surprised to see the blow-up. Any ideas as to what else might cause such a counter intuitive trend? Has anyone else see Google do something that suddenly gloms onto a bunch of phantom pages?
-
-
I haven't contacted the forum yet but that's my next step.
Pages indexed: 91k
Blocked by robots.txt: 8.4million
I don't even know how you could create 8.4 million indexable pages from our content.
-
Have you contacted the Google Webmaster Help forums? As that seems to be a glitch in Google.
How many pages are scraped by Mozbot? If the amount that mozbot shows is different, then you should either sit and wait until Google removes those indexed pages or create a conversation on the forums so someone at google can give you a hint of what is going on.
-
Any help out there? Since the original question was posted, I've seen some improvement but even with aggressive canonicalization and noindexing, I'm still seeing a boatload of indexed pages. I am still seeing pages indexed that I've asked explicitly to be omitted by robots.txt (/search.aspx and */filter). I'm guessing it's just going to take a while to deindex what's there. Still, 91k pages indexed is quite a lot when you consider we only have about 3-4k pages and some articles.
Is anyone aware of any significant releases by Google?
-
Quite recent. We were actually seeing a nice downward trend in the huge number of pages indexed and then the number tripled. Crazy is an understatement. I would have thought the number of pages would fall given the number of pages that now use canonicals.
-
How long have you waited since you applied all the rules to avoid duplicate content, as if it was just recently, then Google should be "rebuilding" the index of your site and stats may be a little crazy while that is happening.
If it was over 2 month ago and you are seeing the increase now, then I'd suggest you revise the rules you created to see if your own Website isn't creating all those new pages.
Hope that helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to check if an individual page is indexed by Google?
So my understanding is that you can use site: [page url without http] to check if a page is indexed by Google, is this 100% reliable though? Just recently Ive worked on a few pages that have not shown up when Ive checked them using site: but they do show up when using info: and also show their cached versions, also the rest of the site and pages above it (the url I was checking was quite deep) are indexed just fine. What does this mean? thank you p.s I do not have WMT or GA access for these sites
Technical SEO | | linklander0 -
No Index PDFs
Our products have about 4 PDFs a piece, which really inflates our indexed pages. I was wondering if I could add REL=No Index to the PDF's URL? All of the files are on a file server, so they are embedded with links on our product pages. I know I could add a No Follow attribute, but I was wondering if any one knew if the No Index would work the same or if that is even possible. Thanks!
Technical SEO | | MonicaOConnor0 -
How to stop google from indexing specific sections of a page?
I'm currently trying to find a way to stop googlebot from indexing specific areas of a page, long ago Yahoo search created this tag class=”robots-nocontent” and I'm trying to see if there is a similar manner for google or if they have adopted the same tag? Any help would be much appreciated.
Technical SEO | | Iamfaramon0 -
No index on subdomains
Hi, We have a subdomain that is appearing in the search results - I want to hide this as it looks really bad. If I were to add the no index tag to the sub domain would URL would this affect the whole domain or just that sub domain? The main domain is vitally important - it is just that sub domain I need to hide. Many thanks
Technical SEO | | Creditsafe0 -
When creating parent and child pages should key words be repeated in url and page title?
We are in the direct mail advertising business: PrintLabelAndMail.com Example: Parent:
Technical SEO | | JimDirectMailCoach
Postcard Direct Mail Children:
Postcard Mailings
Postcard Design
Postcard Samples
Postcard Pricing
Postcard Advantages should "postcard" be repeated in the URL and Page Title? and in this example should each of the 5 children link back directly to the parent or would it be better to "daisy chain" them using each as parent for the next?0 -
Best way to handle pages with iframes that I don't want indexed? Noindex in the header?
I am doing a bit of SEO work for a friend, and the situation is the following: The site is a place to discuss articles on the web. When clicking on a link that has been posted, it sends the user to a URL on the main site that is URL.com/article/view. This page has a large iframe that contains the article itself, and a small bar at the top containing the article with various links to get back to the original site. I'd like to make sure that the comment pages (URL.com/article) are indexed instead of all of the URL.com/article/view pages, which won't really do much for SEO. However, all of these pages are indexed. What would be the best approach to make sure the iframe pages aren't indexed? My intuition is to just have a "noindex" in the header of those pages, and just make sure that the conversation pages themselves are properly linked throughout the site, so that they get indexed properly. Does this seem right? Thanks for the help...
Technical SEO | | jim_shook0 -
WordPress - How to stop both http:// and https:// pages being indexed?
Just published a static page 2 days ago on WordPress site but noticed that Google has indexed both http:// and https:// url's. Usually I only get http:// indexed though. Could anyone please explain why this may have happened and how I can fix? Thanks!
Technical SEO | | Clicksjim1 -
How to block "print" pages from indexing
I have a fairly large FAQ section and every article has a "print" button. Unfortunately, this is creating a page for every article which is muddying up the index - especially on my own site using Google Custom Search. Can you recommend a way to block this from happening? Example Article: http://www.knottyboy.com/lore/idx.php/11/183/Maintenance-of-Mature-Locks-6-months-/article/How-do-I-get-sand-out-of-my-dreads.html Example "Print" page: http://www.knottyboy.com/lore/article.php?id=052&action=print
Technical SEO | | dreadmichael0