Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Xml sitemap advice for website with over 100,000 articles
-
Hi,
I have read numerous articles that support submitting multiple XML sitemaps for websites that have thousands of articles... in our case we have over 100,000. So, I was thinking I should submit one sitemap for each news category.
My question is how many page levels should each sitemap instruct the spiders to go? Would it not be enough to just submit the top level URL for each category and then let the spiders follow the rest of the links organically?
So, if I have 12 categories the total number of URL´s will be 12???
If this is true, how do you suggest handling or home page, where the latest articles are displayed regardless of their category... so I.E. the spiders will find l links to a given article both on the home page and in the category it belongs to. We are using canonical tags.
Thanks,
Jarrett
-
It's really a process of experimenting over time to find out the method that results in the most URLs indexed that in turn brings the most relevant traffic. Personally I wouldn't have one for each category, yet without tests there's no conclusive reasoning either way.
-
Thanks for the tip... I will do that.
I´m still unsure if I really need to submit a sitemap with thousands of URL´s I was thinking I should create an sitemap index file the points to individual top level category sitemaps and leave it at that. If I do this though, I suppose I don´t need individual sitemaps per category as I will just insert the category URL´s in the root sitemap. What do you think?
-
To add to Corey's response, I'll repeat what I just provided another question here on Pro Q&A. Sitemap.xml files can handle a maximum of 50,000 URLs, however I've seen them choke with as few as 10,000. Its important to run them through a tool like tools.pingdom.com to ensure they load within just a couple seconds.
Then submit them through Google/Bing webmaster systems and then see if they succeed in crawling all of them.
-
We break up our sitemap files into several different site maps, and then use a sitemap index file to make sure Google finds them all.
At the bottom of this post they talk about using an index file to combine multiple sitemaps, and they also specifically say it is fine to have one time sensitive site map (ie: front page items) and several other less time sensitive ones (categories in your case).
http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why has my website been removed from Bing?
I have a website that has recently been removed from Bing's index, but can't figure out why. The website isn't new, and it is indexed just fine on Google. These are the steps I've tried: The website is verified in Bing Webmaster Tools and successfully submitted the sitemap. I tested the URL to ensure that Bingbot is allowed to crawl the site I submitted URLs to Bing via the URL Submission tool There isn't a "noindex" on the site preventing it from being indexed When I do a URL Inspection, an error message comes up saying "The inspected URL is known to Bing but has some issues which are preventing us from serving it to our users. We recommend you to follow Bing Webmaster Guidelines." I contacted Bing to ask whether the website was removed in error, but received a reply that the website doesn't comply with Bing's quality guidelines, but they wouldn't go into detail as to which guidelines the website isn't meeting. The website URL is https://www.pardeehospital.org. Can anyone offer any advice or insight as to why Bing won't index our site? Thank you!
Intermediate & Advanced SEO | | lindsey.steinkamp0 -
Please help need some advice?
Can any of you guys please help me I have alerts on links coming in and it looks like recently someone did this, it looks maliciously done as it is only our domain mentioned and most are brand new posts? http://testosteroneclinicindenve53950.shotblogs.com/testosterone-clinic-in-denver-fundamentals-explained-6102386 http://claytondmnnp.ampedpages.com/Details-Fiction-and-testosterone-clinic-in-denver-16897309 http://vinylvehiclecarwrap38041.alltdesign.com/a-review-of-vinyl-vehicle-car-wrap-9574042 http://devinxccct.educationalimpactblog.com/1784474/little-known-facts-about-vinyl-vehicle-car-wrap http://keeganbsftf.ka-blogs.com/7488539/how-vinyl-vehicle-car-wrap-can-save-you-time-stress-and-money http://andybxoes.thezenweb.com/vinyl-vehicle-car-wrap-Fundamentals-Explained-17581028 http://kylerhfdzu.blogkoo.com/not-known-details-about-vinyl-vehicle-car-wrap-9029141 http://troyytkyn.timeblog.net/7695911/the-greatest-guide-to-vinyl-vehicle-car-wrap http://waylontyzab.pointblog.net/testosterone-clinic-in-denver-Secrets-16335972 http://testosteroneclinicindenve30516.onesmablog.com/Top-testosterone-clinic-in-denver-Secrets-17252737 http://emiliogkmop.blogofoto.com/7667522/top-guidelines-of-testosterone-clinic-in-denver http://caidenaczxt.blogs-service.com/7514172/testosterone-clinic-in-denver-fundamentals-explained http://daltonpyfms.mybjjblog.com/5-simple-statements-about-testosterone-clinic-in-denver-explained-6517932 Should I try to disavow these and submit to google or will google know our site which has been up for 5 years is not doing this? Should I do any of these https://tehnoblog.org/google-webmaster-tools-my-website-got-bombed-with-backlinks-what-to-do/
Intermediate & Advanced SEO | | BobAnderson0 -
Which search engines should we submit our sitemap to?
Other than Google and Bing, which search engines should we submit our sitemap to?
Intermediate & Advanced SEO | | NicheSocial0 -
SEO Advice for Angular JS
We are changing our homepage (and gradually the rest of the site) to Angular JS.
Intermediate & Advanced SEO | | theLotter
In order not to lose anything in terms of SEO we are implementing Hashbangs + escaped fragment snapshots. Are there any other SEO considerations you think we should have and/or additional elements that we could add to the page to improve it in terms of SEO?0 -
Hreflang in vs. sitemap?
Hi all, I decided to identify alternate language pages of my site via sitemap to save our development team some time. I also like the idea of having leaner markup. However, my site has many alternate language and country page variations, so after creating a sitemap that includes mostly tier 1 and tier 2 level URLs, i now have a sitemap file that's 17mb. I did a couple google searches to see is sitemap file size can ever be an issue and found a discussion or two that suggested keeping the size small and a really old article that recommended keeping it < 10mb. Does the sitemap file size matter? GWT has verified the sitemap and appears to be indexing the URLs fine. Are there any particular benefits to specifying alternate versions of a URL in vs. sitemap? Thanks, -Eugene
Intermediate & Advanced SEO | | eugene_bgb0 -
XML Sitemap for classifieds
I have seeon some trends for sites which do not even use XML sitemp and robots e.g. see this site. How do you see if sitemap is not used. Also for classified websites, should ad pages be included in sitemap because after certain duration those ads will be deleted and google might not be able to crawl. What do you suggest about XML sitemap for classified website.
Intermediate & Advanced SEO | | MozAddict0 -
Canonical URLs and Sitemaps
We are using canonical link tags for product pages in a scenario where the URLs on the site contain category names, and the canonical URL points to a URL which does not contain the category names. So, the product page on the site is like www.example.com/clothes/skirts/skater-skirt-12345, and also like www.example.com/sale/clearance/skater-skirt-12345 in another category. And on both of these pages, the canonical link tag references a 3rd URL like www.example.com/skater-skirt-12345. This 3rd URL, used in the canonical link tag is a valid page, and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (nor external). Questions: 1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags? 2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet a 3rd variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
Intermediate & Advanced SEO | | 379seo0 -
Sitemap in SERPS
What's up guys, Having some troubles with SERP rankings. My sitemap (navigation) is appearing instead of my actual keywords. I have tried a few methods to fix this; setting a preferred domain, using a 301 redirects, deleting out of date pages via Google webmaster tools. Nothing seems to work. My next step was to refresh the cache for my entire site - does anyone know how to do this? Can't see any tools... Any help would be great. Cheers, Jon.
Intermediate & Advanced SEO | | jamesjk240