Investigating a huge spike in indexed pages
-
I've noticed an enormous spike in pages indexed through WMT in the last week. Now I know WMT can be a bit (OK, a lot) off base in its reporting but this was pretty hard to explain. See, we're in the middle of a huge campaign against dupe content and we've put a number of measures in place to fight it. For example:
-
Implemented a strong canonicalization effort
-
NOINDEX'd content we know to be duplicate programatically
-
Are currently fixing true duplicate content issues through rewriting titles, desc etc.
So I was pretty surprised to see the blow-up. Any ideas as to what else might cause such a counter intuitive trend? Has anyone else see Google do something that suddenly gloms onto a bunch of phantom pages?
-
-
I haven't contacted the forum yet but that's my next step.
Pages indexed: 91k
Blocked by robots.txt: 8.4million
I don't even know how you could create 8.4 million indexable pages from our content.
-
Have you contacted the Google Webmaster Help forums? As that seems to be a glitch in Google.
How many pages are scraped by Mozbot? If the amount that mozbot shows is different, then you should either sit and wait until Google removes those indexed pages or create a conversation on the forums so someone at google can give you a hint of what is going on.
-
Any help out there? Since the original question was posted, I've seen some improvement but even with aggressive canonicalization and noindexing, I'm still seeing a boatload of indexed pages. I am still seeing pages indexed that I've asked explicitly to be omitted by robots.txt (/search.aspx and */filter). I'm guessing it's just going to take a while to deindex what's there. Still, 91k pages indexed is quite a lot when you consider we only have about 3-4k pages and some articles.
Is anyone aware of any significant releases by Google?
-
Quite recent. We were actually seeing a nice downward trend in the huge number of pages indexed and then the number tripled. Crazy is an understatement. I would have thought the number of pages would fall given the number of pages that now use canonicals.
-
How long have you waited since you applied all the rules to avoid duplicate content, as if it was just recently, then Google should be "rebuilding" the index of your site and stats may be a little crazy while that is happening.
If it was over 2 month ago and you are seeing the increase now, then I'd suggest you revise the rules you created to see if your own Website isn't creating all those new pages.
Hope that helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page Rank Flow
I wonder if someone can help me understand clearly page rank flow. If we have a website with a Home page, Services, About and Contact as a very basic website and the page rank will flow to each of those pages from the Home page (i'm not including internal linking between pages or anchor text from the home page content - this is a question purely about home page flow via the main navigation). If the Services page had 3 drop down pages. Would the home page rank also flow to each of these or is it going to the Services page which then distributes it to the three drop down. So instead of Home page rank flowing to 3 pages 33% each - it is flowing to 6 pages 16.6% each. Or is it flowing to 3 pages - 33.3% then the Services pages get a third of 33.3% ->10.1% I know this is simplifying it all a great deal- but it is the basic concept I am trying to grasp on this simple example. Thanks
Technical SEO | | AL123al0 -
Sitemap.gz is being indexed and is showing up in SERP instead of actual pages.
Sitemap.gz is being indexed and is showing up in SERP instead of actual pages. I recently uploaded my sitemap file - https://psglearning.com/sitemapcustom/sitemap-index.xml - via Search Console. The only record within the XML file is sitemaps.gz. When I searched for some content on my site - here is the search https://goo.gl/mqxBeq - I was shown the following search result, indicating that our GZ file is getting indexed instead of our pages. http://www.psglearning.com/catalog 1 http://www.psglearning.com ...www.psglearning.com/sitemapcustom/sitemap.gz... 1 https://www.psglearning.com/catalog/productdetails/9781284059656/ 1 https://www.psglearning.com/catalog/productdetails/9781284060454/ 1 ... My sitemap is listed at https://psglearning.com/sitemapcustom/sitemap-index.xml inside the sitemap the only reference is to sitemap.gz. Should we remove the link the the sitemap.gz within the xml file and just serve the actual page paths? <sitemapindex< span=""> xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></sitemapindex<><sitemap></sitemap>https://www.psglearning.com/sitemapcustom/sitemap.gz<lastmod></lastmod>2017-06-12T09:41-04:00
Technical SEO | | pdowling0 -
Why Are Some Pages On A New Domain Not Being Indexed?
Background: A company I am working with recently consolidated content from several existing domains into one new domain. Each of the old domains focused on a vertical and each had a number of product pages and a number of blog pages; these are now in directories on the new domain. For example, what was www.verticaldomainone.com/products/productname is now www.newdomain.com/verticalone/products/product name and the blog posts have moved from www.verticaldomaintwo.com/blog/blogpost to www.newdomain.com/verticaltwo/blog/blogpost. Many of those pages used to rank in the SERPs but they now do not. Investigation so far: Looking at Search Console's crawl stats most of the product pages and blog posts do not appear to be being indexed. This is confirmed by using the site: search modifier, which only returns a couple of products and a couple of blog posts in each vertical. Those pages are not the same as the pages with backlinks pointing directly at them. I've investigated the obvious points without success so far: There are a couple of issues with 301s that I am working with them to rectify but I have checked all pages on the old site and most redirects are in place and working There is currently no HTML or XML sitemap for the new site (this will be put in place soon) but I don't think this is an issue since a few products are being indexed and appearing in SERPs Search Console is returning no crawl errors, manual penalties, or anything else adverse Every product page is linked to from the /course page for the relevant vertical through a followed link. None of the pages have a noindex tag on them and the robots.txt allows all crawlers to access all pages One thing to note is that the site is build using react.js, so all content is within app.js. However this does not appear to affect pages higher up the navigation trees like the /vertical/products pages or the home page. So the question is: "Why might product and blog pages not be indexed on the new domain when they were previously and what can I do about it?"
Technical SEO | | BenjaminMorel0 -
Duplicate Home Page
Hi everyone! So, I;m using the crawl diagnostics in Moz and it's telling that I've got duplicate content for these two pages: http://www.bridgelanguages.com/
Technical SEO | | Bridge_Education_Group
http://www.bridgelanguages.com/index.php?p=3233&source=3 Would a redirect from the 2nd page to the 1st one be a solution? I'm not even sure where that 2nd link is on the site? Any suggestions or has anyone experienced the same? Thanks! Kelly0 -
Page Content
Our site is a home to home moving listing portal. Consumers who wants to move his home fills a form so that moving companies can cote prices. We were generating listing page URL’s by using the title submitted by customer. Unfortunately we have understood by now that many customers have entered exactly same title for their listings which has caused us having hundreds of similar page title. We have corrected all the pages which had similar meta tag and duplicate page title tags. We have also inserted controls to our software to prevent generating duplicate page title tags or meta tags. But also the page content quality not very good because page content added by customer.(example: http://www.enakliyat.com.tr/detaylar/evden-eve--6001) What should I do. Please help me.
Technical SEO | | iskq0 -
Index page 404 error
Crawl Results show there is 404 error page which is index.htmk **it is under my root, ** http://mydomain.com/index.htmk I have checked my index page on the server and my index page is index.HTML instead of index.HTMK. Please help me to fix it
Technical SEO | | semer0 -
Google indexing directory folder listing page
Google somehow managed to find several of our images index folders and decided to include them into their index. Example: websitesite.com/category/images/ is what you'll see when doing a site:website.com search. So, I have two-part question: 1) Does this hurt our site's ability to rank in any way?
Technical SEO | | invision
Because all Google sees is just a directory listing page with a bunch of links to images in the folder. 2) If there could be any negative effect, what is the best way to get these folders out of Google's index?
I could block via robots.txt, but I'm afraid it will also block all the images in that folder from being indexed in Google image search. I could also turn off directory listing in cpanel / htaccess, but then that gives is a 403 forbidden. Will this hurt the site in anyway and would it prevent Google from indexing the images in the directory? Thanks,
Tony0 -
Getting a citation page indexed
Howdy mozzers, I have a citation on a .govt domain with 2 links pointing to my site. The page is not indexed by Google, bing or yahoo. URL; http://www.familyservices.govt.nz/directory/viewprovider.htm?id=17077 I have tried getting the paged indexed by building bookmark links to it. I have tweeted the url and gotten a few re-tweets for it. But no luck. The page has got no nofollow meta tag. Other listings have been indexed by google. Could someone please advise on means to help me get the page indexed? A strategy that I have not yet tried is submitting a sitemap that includes the external url as I am not sure if it is possible to include url's not part of my domain. Any advice, help would be greatly appreciated. viva le SEOmoz Thanks
Technical SEO | | ihms1