XML sitemap advice for website with over 100,000 articles
-
Hi,
I have read numerous articles that support submitting multiple XML sitemaps for websites that have thousands of articles... in our case we have over 100,000. So, I was thinking I should submit one sitemap for each news category.
My question is: how many page levels deep should each sitemap instruct the spiders to go? Would it not be enough to just submit the top-level URL for each category and then let the spiders follow the rest of the links organically?
So, if I have 12 categories, the total number of URLs will be 12?
If this is true, how do you suggest handling our home page, where the latest articles are displayed regardless of their category? That is, the spiders will find links to a given article both on the home page and in the category it belongs to. We are using canonical tags.
Thanks,
Jarrett
-
It's really a process of experimenting over time to find out the method that results in the most URLs indexed that in turn brings the most relevant traffic. Personally I wouldn't have one for each category, yet without tests there's no conclusive reasoning either way.
-
Thanks for the tip... I will do that.
I'm still unsure whether I really need to submit a sitemap with thousands of URLs. I was thinking I should create a sitemap index file that points to individual top-level category sitemaps and leave it at that. If I do this, though, I suppose I don't need individual sitemaps per category, as I could just insert the category URLs in the root sitemap. What do you think?
-
To add to Corey's response, I'll repeat what I just provided on another question here on Pro Q&A. Sitemap.xml files can handle a maximum of 50,000 URLs; however, I've seen them choke with as few as 10,000. It's important to run them through a tool like tools.pingdom.com to ensure they load within just a couple of seconds.
Then submit them through the Google and Bing webmaster tools and check whether they succeed in crawling all of them.
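As a rough illustration of keeping each file small, here is a minimal Python sketch that splits a list of article URLs into several sitemap files. The 10,000-per-file cap, the example.com URLs, and the helper names `chunk` and `build_sitemap` are assumptions made for the example, not anything prescribed by the sitemap protocol beyond its 50,000-URL limit.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Assumption: stay well under the 50,000-URL protocol limit so files load quickly.
MAX_URLS_PER_FILE = 10_000


def chunk(urls, size):
    """Yield successive slices of `urls`, each holding at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return a <urlset> sitemap document as UTF-8 encoded XML bytes."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical article URLs standing in for a real site's 100,000 articles.
article_urls = [f"https://www.example.com/news/article-{i}" for i in range(100_000)]
sitemaps = [build_sitemap(part) for part in chunk(article_urls, MAX_URLS_PER_FILE)]
print(len(sitemaps))  # prints 10
```

Each entry in `sitemaps` could then be written to its own file and checked for load time before submitting it.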
-
We break up our sitemap into several separate sitemap files, and then use a sitemap index file to make sure Google finds them all.
At the bottom of this post they talk about using an index file to combine multiple sitemaps, and they also specifically say it is fine to have one time-sensitive sitemap (i.e. front page items) and several other less time-sensitive ones (categories in your case).
http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
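For anyone who wants to see what such an index file looks like, here is a minimal Python sketch that builds one. The file names (a "latest" sitemap plus one per category), the category names, and the example.com domain are hypothetical; only the `sitemapindex`, `sitemap`, and `loc` elements come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document listing each child sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical layout: one time-sensitive sitemap plus one per news category.
categories = ("politics", "sports", "business")
child_sitemaps = ["https://www.example.com/sitemap-latest.xml"] + [
    f"https://www.example.com/sitemap-{name}.xml" for name in categories
]

index_xml = build_sitemap_index(child_sitemaps)
print(index_xml.decode("utf-8"))
```

Only the index file then needs to be submitted; the search engine discovers the child sitemaps from it.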