Add selective URLs to an XML Sitemap
-
Hi!
Our website has a very large no of pages. I am looking to create an XML Sitemap that contains only the most important pages (category pages etc). However, on crawling the website in a tool like Xenu (the others have a 500 page limit), I am unable to control which pages get added to the XML Sitemap, and which ones get excluded.
Essentially, I only want pages that are upto 4 clicks away from my homepage to show up in the XML Sitemap.
How should I create an XML sitemap, and at the same time control which pages of my site I add to it (category pages), and which ones I remove (product pages etc).
Thanks in advance!
Apurv
-
Thanks a lot for sharing Travis. This is really helpful!
Appreciate your help here.
-
Hey Intermediate,
Here's my setup - image - http://screencast.com/t/qThC401hQVUp Be careful of the line breaks if you want your sitemap to be pretty (I'm not sure if it also works if everything is on a single line).
Column A:
Column B:
URLColumn
<lastmod>2013-08-27</lastmod>
Column
<changefreq>always</changefreq>Column E:
<priority>1</priority>Column F:
=CONCATENATE(A2,B2,C2,D2,E2)You will need to add this as first 2 lines in your sitemap:
and add to the end, but you should be good to go!
I Hope that helps! -
Thanks Schwaab!
-
Hi Travis
That sounds like a smart way to go about this. Could you please guide me regarding how to add parameters like lastmod, priority, changefreq etc in the XML sitemap, using the URLs that I have in the Excel sheet.
Thanks!
-
If you have a list of all the URLs on your site, it is easy to create a sitemap using excel. I have a template that I use and I can crank out a 50k URL sitemap in 5 minutes.
-
I would recommend purchasing Screaming Frog. You can crawl the site and sort the URLs by level. Remove the URLs that are too deep from the crawl and export to XML sitemap. Screaming Frog is definitely worth the price to unlock all of its features and have an unlimited crawl limit.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Clean URL vs. Parameter URL and Using Canonical URL...That's a Mouthfull!
Hi Everyone, I a currently migrating a Magento site over to Shopify Plus and have a question about best practices for using the canonical URL. There is a competitor that I believe is not doing it the correct way, so I want to make sure my way is the better choice. With 'Vendor Pages' in Shopify, they show up looking like: https://www.campusprotein.com/collections/vendors?q=Cellucor. Not as clean. Problem is that Shopify also creates https://www.campusprotein.com/collections/cellucor. Same products, same page, just a different more clean URL. I am seeing both indexed in Google. What I want to do is basically create a canonical URL from the URL with the parameter that points to the clean URL. The two pages are very similar. The only difference is that the clean URL page has some additional content at the top of the page. I would say the two pages are 90% the same. Do you see any issue with that?
Technical SEO | | vetofunk0 -
Clarification on indexation of XML sitemaps within Webmaster Tools
Hi Mozzers, I have a large service based website, which seems to be losing pages within Google's index. Whilst working on the site, I noticed that there are a number of xml sitemaps for each of the services. So I submitted them to webmaster tools last Friday (14th) and when I left they were "pending". On returning to the office today, they all appear to have been successfully processed on either the 15th or 17th and I can see the following data: 13/08 - Submitted=0 Indexed=0
Technical SEO | | Silkstream
14/08 - Submitted=606,733 Indexed=122,243
15/08 - Submitted=606,733 Indexed=494,651
16/08 - Submitted=606,733 Indexed=517,527
17/08 - Submitted=606,733 Indexed=517,498 Question 1: The indexed pages on 14th of 122,243 - Is this how many pages were previously indexed? Before Google processed the sitemaps? As they were not marked processed until 15th and 17th? Question 2: The indexed pages are already slipping, I'm working on fixing the site by reducing pages and improving internal structure and content, which I'm hoping will fix the crawling issue. But how often will Google crawl these XML sitemaps? Thanks in advance for any help.0 -
Keywords, when are you overdoing it in the URL?
Hi guys, I'm auditing a site covering compensation for cancer. Keywords could include: Undiagnosed cancer 20 cancer compensation 10 undiagnosed cancer symptoms 10 cancer misdiagnosis claims 20 cancer claims 10 misdiagnosis of cancer 50 cancer misdiagnosis 70 So, when structuring the URL for the category, this was previously selected: www.site.co.uk/medical-negligence/cancer-misdiagnosis Although sub-pages appear like this: www.site.co.uk/medical-negligence/cancer-misdiagnosis/breast-cancer-misdiagnosis-claim/ 'Cancer misdiagnosis' as a keyword attracts the most traffic, but if we're using it on sub-pages - is there a need to include it twice on all sub-page URLs? With that in mind, would it be better to follow the following format? www.site.co.uk/medical-negligence/cancer-compensation www.site.co.uk/medical-negligence/cancer-compensation/breast-cancer-misdiagnosis-claim/ Or is there a better way to structure this? Thanks in advance guys!
Technical SEO | | Muhammad-Isap0 -
OSE says URL redirects to URL with trailing slash but it doesn't.
Site is www.example.com/folder/us and OSE says this URL redirects to www.example.com/folder/us/, but it does not. When I look at the OSE report for the latter version with the "/" it says "No Data Available For This URL". Why would that be? The original URL is www.example.com and it redirects to www.example.com/folder/us. Is this anything I need to worry about? I thought that the trailing / doesn't really mean much anymore but nonetheless, why does it think it redirects there?
Technical SEO | | rock220 -
HTML Sitemap Pagination?
Im creating an a to z type directory of internal pages within a site of mine however there are cases where there are over 500 links within the pages. I intend to use pagination (rel=next/prev) to avoid too many links on the page but am worried about indexation issues. should I be worried?"
Technical SEO | | DMGoo0 -
Long URL
I am using seomoz software as a trial, it has crawled my site and a report is telling me that the URL for my forum is to long: <dl> <dt>Title</dt> <dd>Healthy Living Community</dd> <dt>Meta Description</dt> <dd>Healthy life discussion forum chatting about all aspects of healthy living including nutrition, fitness, motivation and much more.</dd> <dt>Meta Robots</dt> <dd>noodp, noydir</dd> <dt>Meta Refresh</dt> <dd>Not present/empty</dd> <dd> 1 Warning Long URL (> 115 characters) Found about 17 hours ago <dl> <dt>Number of characters</dt> <dd>135 (over by 21)</dd> <dt>Description</dt> <dd>A good URL is descriptive and concise. Although not a high priority, we recommend a URL that is shorter than 75 characters.</dd> </dl> </dd> <dd> URL: http://www.goodhealthword.com/forum/reprogramming-health/welcome-to-the-forum-for-discussing-the-4-steps-for-reprogramming-ones-health/ The problem is when I check the page via edit or in the admin section of wordpress, the url is a s follows: http://www.goodhealthword.com/forum/ My question is where is I cannot see where this long url is located, it appears to be a valid page but I cant find it. Thanks Pete </dd> </dl>
Technical SEO | | petemarko0 -
Compare URLs with 302 redirects
Hello I have a store which was developed in Magento. I have about 8300 errors like this: URL: http://www.theprinterdepo.com/catalog/product_compare/add/product/100/uenc/aHR0cDovL3d3dy50aGVwcmludGVyZGVwby5jb20vcHJpbnRlci1wYXJ0cy5odG1sP3A9NA,,/ 1 Warning 302 (Temporary Redirect) Found 3 days ago <dl> <dt>Redirects to</dt> <dt>http://goo.gl/XMaZg</dt> <dd>Description</dd> <dd>Using a 302 redirect will cause search engine crawlers to treat the redirect as temporary and not pass any link juice (ranking power). We highly recommend that you replace 302 redirects with 301 redirects.</dd> </dl> <a class="more expanded">Minimize</a> These URLs, are generated by magento by the COMPARE feature. In my store we bought an extension called SEO Enterprise Suite and I asked the developers(www.mageworx) about this error. Their answer is: Sorry for the late reply. Our extension adds NOINDEX,FOLLOW tag to compare and cookies pages so that they won't be indexed. I do not think that these redirects can hurt your SEO because these pages won't be indexed at all. The question is: What should I do? Is there anyway that SEOMOZ ignores these URLs? What should I do next, I just dont like to have that HIGH number of errors and warnings. Thank you
Technical SEO | | levalencia10 -
Sitemap.xml problem in Google webmaster
Hi, My sitemap.xml is not submitting correctly in Google Webmaster. There is 697 url submitted but only 56 are in Google index. At the top of webmaster this is what it says ->>> http://www.example.com/sitemap.xml has been resubmitted. But when when I clicked status button RED X occurs. Any suggestions about this, thanks...
Technical SEO | | Socialdude0