Indexed pages and current pages - Big difference?
-
Our website shows ~22k pages in the sitemap but ~56k are showing indexed on Google through the "site:" command. Firstly, how much attention should we paying to the discrepancy? If we should be worried what's the best way to find the cause of the difference?
The domain canonical is set so can't really figure out if we've got a problem or not?
-
Hi Nathan,
The delta between the number of pages returned by the site: operator and the number of pages in your sitemap could be down to a number of issues:
- Your XML sitemap may represent only a percentage of the total number of valid content URLs that your site is capable of generating.
a) Often sites will only generate XML sitemaps for URLs that someone has decided are "important", when the total number of URLs is much larger.
- Your XML sitemap contains ALL the valid content URLs that your site is capable of generating, but search engines are somehow finding more URLs.
a) Look in Google Webmaster Tools under Optimization >> HTML improvements >> Duplicate title tags
i) Do the pages with duplicate titles have duplicate page content? If so, your publishing platform is allowing multiple URLs to render the same content, which is a bug that needs to be fixed
b) Run a crawler like Xenu Link Sleuth or Screaming Frog against your site, and see how many URLs they discover. Export the results to Excel and look for weird URLs
i) Usually culprits for duplicate content include incorrect canonicalization (www vs non-www, URLs ending in /index.html vs just /, etc)
ii) Look for URLs ending with strange query strings (affiliate tracking, session IDs, etc)
c) Use the site: operator in other engines (Bing, blekko, etc) and compare the numbers they return. Especially if this number is larger than the number Google is returning, starting looking for weird URL patterns
Also, I'm not sure what you mean by "the domain canonical has been set correctly". If you're referring to use of the canonical link element for every URL, there are plenty of ways that can go wrong. E.g., if your CMS requires that each published URL have rel="canonical", but allows URLs to be published with and without the trailing /index.html, you can end up with a canonical link element on the non-canonical version of the URL, further confusing engines. Something to look into.
-
You might have a duplicate content issue. You will want to check if you have the proper 301 redirect and a canonical command in the head of your code. If you don't have this set properly then the search engines will see the www and non-www versions of your site as duplicate. Also remember that the search engines also by default place this at the end of the url /
Here are two links that can help if this is the issue.
http://www.webconfs.com/how-to-redirect-a-webpage.php/
http://www.mattcutts.com/blog/rel-canonical-html-head/
Hope this helps. Good Luck
-
Yes this is a potentially significant problem. The easiest way to troubleshoot is to do the 'site:' command again, and go to the last page of results. You should be seeing pages that aren't in your sitemap. Very likely duplicated content.
If you are having a rough time troubleshooting, post a link and I'll be glad to take a peek.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Alternatives 301? Issues redirection of index.html page with Adobe Business Catalyst
Hi Moz community, As for now we have two different versions of a client's homepage that’s dividing our traffic. One of the urls is the index.html version of the other url. We are using Adobe Business Catalyst for one of our clients and they told us they can’t 301 redirect. Adobe Business Catalyst does 301 redirects, but not to itself like an .htaccess rewrite. Doing a 301 redirect using BC from index.html to / creates an infinite loop and break the page. Are there alternatives to a 301 or any suggestions how to solve this? Thanks for all your answers and thoughts in advance,
Technical SEO | | Anna_Hoesl
Anna0 -
On-Page Problem
Hello Mozzers, A friend has a business website and the on-page stuff is done really bad. He wants to rank for: conference room furnishing, video conference, digital signage. (Don't worry about the keywords, it's just made up for an example.) For these three services he has a page: hiswebsite.com/av AV stands for audio and video and is the h1. If you click on one of the service, the url doesn't change. Like if you click on video conference, just the text changes, the url stays /av. All his targeted pages got an F Grade, I am not surprised, the services titles are in . Wouldn't it be a lot better to make an own page for every service with a targeted keyword, like hiswebsite.com/video-conference All this stuff is on /av, how will a 301 resirect work to all the service pages, does this make sense? Any help is appreciated! Thanks in advance!
Technical SEO | | grobro1 -
How do I get my pages to go from "Submitted" to "Indexed" in Google Webmaster Tools?
Background: I recently launched a new site and it's performing much better than the old site in terms of bounce rate, page view, pages per session, session duration, and conversions. As suspected, sessions, users, and % new sessions are all down. Which I'm okay with because the the old site had a lot of low quality traffic going to it. The traffic we have now is much more engaged and targeted. Lastly, the site was built using Squarespace and was launched the middle of August. **Question: **When reviewing Google Webmaster Tools' Sitemaps section, I noticed it says 57 web pages Submitted, but only 5 Indexed! The sitemap that's submitted seems to be all there. I'm not sure if this is a Squarespace thing or what. Anyone have any ideas? Thanks!!
Technical SEO | | Nate_D0 -
Has Google stopped rendering author snippets on SERP pages if the author's G+ page is not actively updated?
Working with a site that has multiple authors and author microformat enabled. The image is rendering for some authors on SERP page and not for others. Difference seems to be having an updated G+ page and not having a constantly updating G+ page. any thoughts?
Technical SEO | | irvingw0 -
Different links to to the same page
Hi, Based on the user's actions we post activity into users Facebook timeline. And each activity has link back to our particular page on our website. For example if original page was: www.Domain.com from Facebook timeline it would be like this: www.Domain.com?Ffb_action_ids=101508953168 Do you think this will have a negative effect on our page rankings as we will eded up having a lot of different URL's to the same page? www.Domain.com?Ffb_action_ids=101508953168 www.Domain.com?Ffb_action_ids=456788765609 etc.. Thank you, Karen Bdoyan
Technical SEO | | showme0 -
Can I redirect when Google is showing these as 2 different pages?
Hi Guys, Google webmaster is showing 1000 duplicate title tags because its picking up our pages like this. How can I correct this? Please explain in detail please. Thank You Tim /store/ICICLES_NO_7_CLEAR_WITH_PINK_NUBBY//store/ICICLES_NO_7_CLEAR_WITH_PINK_NUBBY
Technical SEO | | fasctimseo0 -
De-indexing thin content & Panda--any advantage to immediate de-indexing?
We added the nonidex, follow tag to our site about a week ago on several hundred URLs, and they are still in Google's index. I know de-indexing takes time, but I am wondering if having those URLs in the index will continue to "pandalize" the site. Would it be better to use the URL removal request? Or, should we just wait for the noindex tags to remove the URLs from the index?
Technical SEO | | nicole.healthline0 -
2000 pages indexed in Yahoo, 0 in Google. NO PR, What is wrong?
Hello Everyone, I have a friend with a blog site that has over 2000 pages indexed in Yahoo but none in Google and no page rank. The web site is http://www.livingorganicnews.com/ I know it is not the best site but I am guessing something is wrong and I don't see it. Can you spot it? Does he have some settings wrong? What should he do? Thank you.
Technical SEO | | QuietProgress0