Should all pages on a site be included in either your sitemap or robots.txt?
-
I don't have a specific scenario in mind; I'm just curious, as I fairly often come across sites that have, for example, 20,000 pages but only 1,000 in their sitemap. If they think only 1,000 of their URLs should be included in the sitemap and indexed, should the others be excluded using robots.txt or a page-level exclusion? Is there a point to having pages that are included in neither, leaving it up to Google to decide?
-
Thanks guys!
-
You bet - Cheers!
-
Clever PHD,
You are correct. I have found that these little housekeeping issues like eliminating duplicate content really do make a big difference.
Ron
-
I think Ron's point was that if you have a bunch of duplicates, the dups are not "real" pages, if you are only counting "real" pages. Therefore, if Google indexes both your "real" pages and the dup versions of them, you can have more pages indexed than you actually have. The issue then is that you have duplicate versions of the same page in Google's index, so which one will rank for a given key term? You could be competing against yourself. That is why it is so important that you deal with crawl issues.
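To make that concrete, here is a rough Python sketch of how several indexed URLs can collapse onto one "real" page. The normalization rules (ignoring the protocol, a leading "www.", a trailing slash, and the query string) and the example.com URLs are assumptions for illustration only; a real site may well have query parameters that do change the content.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Collapse common duplicate variants onto one assumed 'real' URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    # Assumption: query strings and fragments never change the main content.
    return urlunsplit(("https", host, path, "", ""))


def group_duplicates(urls):
    """Map each normalized URL to the list of indexed variants behind it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


# Hypothetical list of URLs found in the index.
indexed = [
    "http://example.com/widgets",
    "https://www.example.com/widgets/",
    "https://example.com/widgets?sort=price",
    "https://example.com/about",
]
groups = group_duplicates(indexed)
print(len(indexed), "indexed URLs,", len(groups), "real pages")
```

Any group with more than one entry is a set of URLs that may be competing with each other for the same terms.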
-
Thank you. Just curious, how would the number of pages indexed be higher than the number of actual pages?
-
I think you are looking at the number of pages indexed, which is generally higher than the number of pages on your web site. There is a point to putting a noindex on any pages that you do not want indexed (a nofollow alone does not keep a page out of the index), as well as properly marking up the web pages that you do specifically want indexed. It is really important that you eliminate duplicate pages. A common source of these duplicates is improper tags on the blog. Make sure that your tags are set up in a logical hierarchy like your site map. This will assist the search engines when they re-index your pages.
Hope this helps,
Ron
-
You want to have as many pages in the index as possible, as long as they are high-quality pages with original content. If you publish quality original articles on a regular basis, you want to have all those pages indexed. Yes, from a practical perspective you may only be able to focus on tweaking the SEO on a portion of them, but if you have good SEO processes in place as you produce those pages, they will rank long term for a broad range of terms and bring traffic.
If you have 20,000 pages because you have an online catalog with 345 different ways to sort the same set of page results, or if you have keyword search URLs, printer-friendly version pages, or shopping cart pages, you do not want those indexed. These pages are typically low-quality/thin-content pages and/or duplicates, and those do you no favors. You would want to use the noindex meta tag or canonical where appropriate. The reality is that out of the 20,000 pages, there is probably only a subset that are the "originals", so you don't want to waste Google's time crawling the rest.
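As a rough illustration of "noindex or canonical where appropriate", here is a minimal Python sketch that picks a head tag for a URL. The path segments in NOINDEX_SEGMENTS and the parameter names in SORT_PARAMS are hypothetical; you would swap in whatever your own catalog, search, and cart URLs actually look like.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical URL patterns; adjust to the site's real URL scheme.
NOINDEX_SEGMENTS = {"search", "cart", "print"}
SORT_PARAMS = {"sort", "order", "view"}


def head_tag(url):
    """Return the tag to place in <head>, or "" to leave the page indexable."""
    parts = urlsplit(url)
    first_segment = parts.path.strip("/").split("/", 1)[0]
    if first_segment in NOINDEX_SEGMENTS:
        # Thin pages (search results, cart, print views): keep out of the index.
        return '<meta name="robots" content="noindex">'
    params = parse_qsl(parts.query)
    kept = [(key, value) for key, value in params if key not in SORT_PARAMS]
    if len(kept) != len(params):
        # Sorted views of the same listing: point at the unsorted original.
        original = urlunsplit(
            (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
        )
        return f'<link rel="canonical" href="{escape(original, quote=True)}">'
    return ""


print(head_tag("https://example.com/search?q=red+shoes"))
print(head_tag("https://example.com/shoes?sort=price&color=red"))
```

The split is deliberate: noindex for pages that have no indexable "original" at all, and canonical for variants that do have one.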
A good concept here to look up is Crawl Budget or Crawl Optimization:
http://searchengineland.com/how-i-think-crawl-budget-works-sort-of-59768
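If you want to see how many of a site's URLs fall into the "neither" bucket from the original question, something like the following Python sketch could work. It uses the standard library's urllib.robotparser; the URL lists, the robots.txt text, and the audit function itself are made up for illustration, and in practice you would feed it a crawl export and the site's real sitemap. Keep in mind that robots.txt only blocks crawling, so a blocked URL can still end up in the index if other pages link to it.

```python
from urllib.robotparser import RobotFileParser


def audit(all_urls, sitemap_urls, robots_txt, agent="Googlebot"):
    """Bucket each URL as in the sitemap, blocked by robots.txt, or neither."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    sitemap = set(sitemap_urls)
    report = {"in_sitemap": [], "blocked": [], "neither": []}
    for url in all_urls:
        if url in sitemap:
            report["in_sitemap"].append(url)
        elif not parser.can_fetch(agent, url):
            report["blocked"].append(url)
        else:
            # Left for the search engine to decide.
            report["neither"].append(url)
    return report


# Hypothetical inputs for illustration.
robots_txt = "User-agent: *\nDisallow: /cart/\n"
all_urls = [
    "https://example.com/",
    "https://example.com/cart/checkout",
    "https://example.com/shoes?sort=price",
]
report = audit(all_urls, ["https://example.com/"], robots_txt)
for bucket, urls in report.items():
    print(bucket, len(urls))
```

The "neither" list is the one worth reviewing page by page: each URL in it is either a good page missing from the sitemap or a thin/duplicate page that needs a noindex or canonical.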