Xml sitemap advice for website with over 100,000 articles
-
Hi,
I have read numerous articles that support submitting multiple XML sitemaps for websites that have thousands of articles... in our case we have over 100,000. So, I was thinking I should submit one sitemap for each news category.
My question is how many page levels should each sitemap instruct the spiders to go? Would it not be enough to just submit the top level URL for each category and then let the spiders follow the rest of the links organically?
So, if I have 12 categories the total number of URL´s will be 12???
If this is true, how do you suggest handling or home page, where the latest articles are displayed regardless of their category... so I.E. the spiders will find l links to a given article both on the home page and in the category it belongs to. We are using canonical tags.
Thanks,
Jarrett
-
It's really a process of experimenting over time to find out the method that results in the most URLs indexed that in turn brings the most relevant traffic. Personally I wouldn't have one for each category, yet without tests there's no conclusive reasoning either way.
-
Thanks for the tip... I will do that.
I´m still unsure if I really need to submit a sitemap with thousands of URL´s I was thinking I should create an sitemap index file the points to individual top level category sitemaps and leave it at that. If I do this though, I suppose I don´t need individual sitemaps per category as I will just insert the category URL´s in the root sitemap. What do you think?
-
To add to Corey's response, I'll repeat what I just provided another question here on Pro Q&A. Sitemap.xml files can handle a maximum of 50,000 URLs, however I've seen them choke with as few as 10,000. Its important to run them through a tool like tools.pingdom.com to ensure they load within just a couple seconds.
Then submit them through Google/Bing webmaster systems and then see if they succeed in crawling all of them.
-
We break up our sitemap files into several different site maps, and then use a sitemap index file to make sure Google finds them all.
At the bottom of this post they talk about using an index file to combine multiple sitemaps, and they also specifically say it is fine to have one time sensitive site map (ie: front page items) and several other less time sensitive ones (categories in your case).
http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I have over 12,000 permanent redirects. Is this okay?
Hey guys, I have moved my website to http://www to https:/www. I use a plugin called Really Simple SSL on Wordpress. Now I have 12,000 permanent 301 redirects. It seems like my whole site has been moved. Is this the right way to do it? I am quite new to SEO and I quite don't understand if this is good or bad. From my intuition, it seems like a lot of redirects for Google bot to crawl through. Is there a better way to move to HTTPS without all these redirects? Thank you. Paul
Intermediate & Advanced SEO | | paultg0 -
Delay to rank for authority website versus new website
Hello, I have a website that has been existing for years. How long does it take if I have a good content on a page for it to rank ? I read here and that that it can take 4 to 6 months but it never says if it is for a brand new website or a old website that has an authority and some links. I also read that some people publish content and rank within a week on competitive keywords. So who is right, what is there to read in between the lines ?
Intermediate & Advanced SEO | | seoanalytics0 -
International XML Sitemaps - Standalone, or Integrate into Existing XML Sitemap?
Hi there, We understand that hreflang tagging can be incorporated into an existing XML sitemap. That said, is there any inherent issue with having two sitemaps - your regular XML sitemap plus an international XML sitemap which lists off many of the same URLs as your original XML sitemap? For example, one of our clients has an XML sitemap file they don't want to have to edit, but we want to implement international hreflang xml sitemaps for them. Can we add an "English" XML sitemap with the proper hreflang tagging even though this new sitemap contains many duplicates as the existing XML sitemap file? Thank you!
Intermediate & Advanced SEO | | FPD_NYC0 -
Which search engines should we submit our sitemap to?
Other than Google and Bing, which search engines should we submit our sitemap to?
Intermediate & Advanced SEO | | NicheSocial0 -
Xml sitemap Issue... Xml sitemap generator facilitating only few pages for indexing
Help me I have a website earlier 10,000 WebPages were facilitated in xml sitemap for indexation, but from last few days xml sitemap generator facilitating only 3300 WebPages for indexing. Please help me to resolve the issue. I have checked Google webmaster indexed pages, its showing 8,141. I have tried 2-3 paid tools, but all are facilitating 3300 pages for indexing. I am not getting what is the exact problem, whether the server not allowing or the problem with xml sitemap generator. Please please help me…
Intermediate & Advanced SEO | | udistm0 -
Noindex xml RSS feed
Hey, How can I tell search engines not to index my xml RSS feed? The RSS feed is created by Yoast on WordPress. Thanks, Luke.
Intermediate & Advanced SEO | | NoisyLittleMonkey0 -
Canonical URLs and Sitemaps
We are using canonical link tags for product pages in a scenario where the URLs on the site contain category names, and the canonical URL points to a URL which does not contain the category names. So, the product page on the site is like www.example.com/clothes/skirts/skater-skirt-12345, and also like www.example.com/sale/clearance/skater-skirt-12345 in another category. And on both of these pages, the canonical link tag references a 3rd URL like www.example.com/skater-skirt-12345. This 3rd URL, used in the canonical link tag is a valid page, and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (nor external). Questions: 1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags? 2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet a 3rd variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
Intermediate & Advanced SEO | | 379seo0 -
Create a new XML Sitemap for a blog subdomain?
What would be the best way to go about this? A site just put a blog on http://blog.domain.com/ Should there be a separate XML Sitemap for that particular subdomain or should the original XML Sitemap for the main domain be sufficient? Looking forward to your responses. Thanks
Intermediate & Advanced SEO | | iAnalyst.com0