Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Xml sitemap advice for website with over 100,000 articles
-
Hi,
I have read numerous articles that support submitting multiple XML sitemaps for websites that have thousands of articles... in our case we have over 100,000. So, I was thinking I should submit one sitemap for each news category.
My question is how many page levels should each sitemap instruct the spiders to go? Would it not be enough to just submit the top level URL for each category and then let the spiders follow the rest of the links organically?
So, if I have 12 categories the total number of URL´s will be 12???
If this is true, how do you suggest handling or home page, where the latest articles are displayed regardless of their category... so I.E. the spiders will find l links to a given article both on the home page and in the category it belongs to. We are using canonical tags.
Thanks,
Jarrett
-
It's really a process of experimenting over time to find out the method that results in the most URLs indexed that in turn brings the most relevant traffic. Personally I wouldn't have one for each category, yet without tests there's no conclusive reasoning either way.
-
Thanks for the tip... I will do that.
I´m still unsure if I really need to submit a sitemap with thousands of URL´s I was thinking I should create an sitemap index file the points to individual top level category sitemaps and leave it at that. If I do this though, I suppose I don´t need individual sitemaps per category as I will just insert the category URL´s in the root sitemap. What do you think?
-
To add to Corey's response, I'll repeat what I just provided another question here on Pro Q&A. Sitemap.xml files can handle a maximum of 50,000 URLs, however I've seen them choke with as few as 10,000. Its important to run them through a tool like tools.pingdom.com to ensure they load within just a couple seconds.
Then submit them through Google/Bing webmaster systems and then see if they succeed in crawling all of them.
-
We break up our sitemap files into several different site maps, and then use a sitemap index file to make sure Google finds them all.
At the bottom of this post they talk about using an index file to combine multiple sitemaps, and they also specifically say it is fine to have one time sensitive site map (ie: front page items) and several other less time sensitive ones (categories in your case).
http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Any Tips for Reviving Old Websites?
Hi, I have a series of websites that have been offline for seven years. Do you guys have any tips that might help restore them to their former SERPs glory? Nothing about the sites themselves has changes since they went offline. Same domains, same content, and only a different server. What has changed is the SERPs landscape. I've noticed competitive terms that these sites used to rank on the first page for with far more results now. I have also noticed some terms result in what seems like a thesaurus similar language results from traditionally more authoritative websites instead of the exact phrase searched for. This concerns me because I could see a less relevant page outranking me just because it is on a .gov domain with similar vocabulary even though the result is not what people searching for the term are most likely searching for. The sites have also lost numerous backlinks but still have some really good ones.
Intermediate & Advanced SEO | | CopBlaster.com1 -
SEO on dynamic website
Hi. I am hoping you can advise. I have a client in one of my training groups and their site is a golf booking engine where all pages are dynamically created based on parameters used in their website search. They want to know what is the best thing to do for SEO. They have some landing pages that Google can see but there is only a small bit of text at the top and the rest of the page is dynamically created. I have advised that they should create landing pages for each of their locations and clubs and use canonicals to handle what Google indexes.Is this the right advice or should they noindex? Thanks S
Intermediate & Advanced SEO | | bedynamic0 -
Multilingual Sitemaps
Hey there, I have a site with many languages. So here are my questions concerning the sitemaps. The correct way of creating a sitemap for a multilingual site is as followed ( by the official blog of Google ) <urlset xmlns="</span>http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"> http://www.example.com/loc> <xhtml:link rel="alternate" hreflang="en" href="</span>http://www.example.com/"/> <xhtml:link rel="alternate" hreflang="de" href="</span>http://www.example.com/de"/> <xhtml:link rel="alternate" hreflang="fr" href="</span>http://www.example.com/fr"/><a href=" http:="" www.example.com="" fr"="" target="_blank"></xhtml:link><a href=" http:="" www.example.com="" de"="" target="_blank"></xhtml:link><a href=" http:="" www.example.com="" "="" target="_blank"></xhtml:link><a href=" http:="" www.sitemaps.org="" schemas="" sitemap="" 0.9"="" rel="nofollow" target="_blank"></urlset> **So here is my first question. My site has over 200.000 pages that all of them support around 5-6 languages. Am I suppose to do this example 200.000 times?****My second question is. My root domain is www.example.com but this one redirects with 301 to www.example.com/en should the sitemap be at ****www.example.com/sitemap.xmlorwww.example.com/en/sitemap.xml ???****My third question is as followed. On WMT do I submit my sitemap in all versions of my site? I have all my languages there.**Thanks in advance for taking the time to respond to this thread and by creating it I hope many people will solve their own questions.
Intermediate & Advanced SEO | | Angelos_Savvaidis0 -
Credit Links on Client Websites
I know there have been several people who have asked this but a lot of them were back in 2012 before many of the google changes. My question is the same though. With all the changes with Google's algorithm. Is it okay to put your link on the bottom of your clients website. Like Web Design by, etc. Part of the reason is to drive traffic but also if someone is actually interested who designed the website, they will click it. But now reading about how bad links can hurt you tremendously, it makes me second guess if this is ok. My gut feeling says, no.
Intermediate & Advanced SEO | | blackrino0 -
Noindex xml RSS feed
Hey, How can I tell search engines not to index my xml RSS feed? The RSS feed is created by Yoast on WordPress. Thanks, Luke.
Intermediate & Advanced SEO | | NoisyLittleMonkey0 -
Should sitemap include https pages?
Hi guys, Trying to figure out some onsite issues I've been having. Would appreciate any feedback on the following 2 questions: My homepage (http://mysite.com) is a 301 redirect to https://mysite.com, which is under SSL. Only 2 pages of my site are https, the rest are http. Should the directory of my sitemap be https://mysite.com/sitemap.xml or should it be kept with http (even though the redirected homepage is to https)? Should my sitemap include the https pages (only 2 pages) as well as the http? Thanks, G
Intermediate & Advanced SEO | | G.Anderson0 -
How to structure articles on a website.
Hi All, Key to a successful website is quality content - so the Gods of Google tell me. Embrace your audience with quality feature rich articles on your products or services, hints and tips, how to, etc. So you build your article page with all the correct criteria; Long Tail Keyword or phrases hitting the URL, heading, 1st sentance, etc. My question is this
Intermediate & Advanced SEO | | Mark_Ch
Let's say you have 30 articles, where would you place the 30 articles for SEO purposes and user experiences. My thought are:
1] on the home page create a column with a clear heading "Useful articles" and populate the column with links to all 30 articles.
or
2] throughout your website create link references to the articles as part of natural information flow.
or
3] Create a banner or impact logo on the all pages to entice your audience to click and land on dedicated "articles page" Thanks Mark0 -
Google Not Indexing XML Sitemap Images
Hi Mozzers, We are having an issue with our XML sitemap images not being indexed. The site has over 39,000 pages and 17,500 images submitted in GWT. If you take a look at the attached screenshot, 'GWT Images - Not Indexed', you can see that the majority of the pages are being indexed - but none of the images are. The first thing you should know about the images is that they are hosted on a content delivery network (CDN), rather than on the site itself. However, Google advice suggests hosting on a CDN is fine - see second screenshot, 'Google CDN Advice'. That advice says to either (i) ensure the hosting site is verified in GWT or (ii) submit in robots.txt. As we can't verify the hosting site in GWT, we had opted to submit via robots.txt. There are 3 sitemap indexes: 1) http://www.greenplantswap.co.uk/sitemap_index.xml, 2) http://www.greenplantswap.co.uk/sitemap/plant_genera/listings.xml and 3) http://www.greenplantswap.co.uk/sitemap/plant_genera/plants.xml. Each sitemap index is split up into often hundreds or thousands of smaller XML sitemaps. This is necessary due to the size of the site and how we have decided to pull URLs in. Essentially, if we did it another way, it may have involved some of the sitemaps being massive and thus taking upwards of a minute to load. To give you an idea of what is being submitted to Google in one of the sitemaps, please see view-source:http://www.greenplantswap.co.uk/sitemap/plant_genera/4/listings.xml?page=1. Originally, the images were SSL, so we decided to reverted to non-SSL URLs as that was an easy change. But over a week later, that seems to have had no impact. The image URLs are ugly... but should this prevent them from being indexed? The strange thing is that a very small number of images have been indexed - see http://goo.gl/P8GMn. I don't know if this is an anomaly or whether it suggests no issue with how the images have been set up - thus, there may be another issue. Sorry for the long message but I would be extremely grateful for any insight into this. I have tried to offer as much information as I can, however please do let me know if this is not enough. Thank you for taking the time to read and help. Regards, Mark Oz6HzKO rYD3ICZ
Intermediate & Advanced SEO | | edlondon0