How Do I Generate a Sitemap for a Large Wordpress Site?
-
Hello Everyone!
I am working with a Wordpress site that is in Google news (i.e. everyday we have about 30 new URLs to add to our sitemap) The site has years of articles, resulting in about 200,000 pages on the site. Our strategy so far has been use a sitemap plugin that only generates the last few months of posts, however we want to improve our SEO and submit all the URLs in our site to search engines.
The issue is the plugins we've looked at generate the sitemap on-the-fly. i.e. when you request the sitemap, the plugin then dynamically generates the sitemap. Our site is so large that even a single request for our sitemap.xml ties up tons of server resources and takes an extremely long time to generate the sitemap (if the page doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitempas works extremely good. I fully understand that a DB solution would avoid the crawl need, but the features that I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not getting any problems plus I set limitations on the generator memory consumption and activated the feature that saves temp files just in case the generation fails.
-
My concern with recommending xml-sitemaps was that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool is grabbing pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid solution of xml-sitemaps.
You can set all the resources that you want it to have available, and it will store in temp files to avoid excessive consumption.
It also offers settings to create large sitemaps using a sitemap_index and you could get plugins that create the news sitemap automatically looking for changes since the last sitemap generation.
I have it running in my site with 5K pages (excluding tag pages) and it takes 10 minutes to crawl.
Then you also have plugins that create the sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol only allows 50,000 URLs max. per sitemap and Google News sitemaps can't be over 1000 URLs, this was going to be a necessity anyway, so may as well use these limitations to your advantage.
There's a functionality available for sitemaps called a sitemap index. It basically lists all the sitemap.xmls you've created, so the search engines can find and index them. You put it at the root of the site and then link to it in robots.txt just like a regular sitemap. (Can also submit it in GWT). In fact, Yoast's SEO plugin sitemaps and others use just this functionality already for their News add-on.
In your case, you could build the News sitemap dynamically to meet its special requirements (up to 1000 URLs and will crawl only last 2 days of posts) and to ensure it's up-to-the-minute accurate, as is critical for news sites.
Then separately you would build additional, segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, since they wouldn't need to update once created. By having them static, there's be no server load to serve them each time - only the load to generate the current news sitemap. (I'd actually recommend you keep each static sitemap to around 25,000 pages each to ensure search engines can crawl them easily)
This approach would involve a bit of fiddling to initially set up, as you'd need to generate the "archive" sitemaps then convert them to static versions, but once set up, the News sitemap would take care of itself and once a month (or whatever you decide) you'd need to add the "expiring" pages from the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you ever tried using a web-based sitemap generator? Not sure how it would respond to your site but at least it would be running on someone else's server, right?
Not sure what else to say honestly.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemaps: Best Practice
What should and what shouldn't go in the sitemap? In particular, pages like subscribe to our newsletter/ unsubscribe to our newsletter? Is there really any benefit in highlighting those pages to the SEs? Thanks for any advice/ anecdotes 🙂
Intermediate & Advanced SEO | | Fubra0 -
Mobile First Index: What Could Happen To Sites w Large Desktop but Small Mobile Sites?
I have a question about how Mobile First could affect websites with separate (and smaller) mobile vs desktop sites. Referencing this SE Roundtable article (seorountable dot com /google-mobile-first-index-22953.html), "If you have less content on your mobile version than on your desktop version - Google will probably see the less content mobile version. Google said they are indexing the mobile version first." But Google/ Gary Illyes are also on the record stating the switch to mobile-first should be minimally disruptive. Does "Mobile First" mean that they'll consider desktop URLs "second", or will they actually just completely discount the desktop site in lieu of the mobile one? In other words: will content on your desktop site that does not appear in mobile count in desktop searches? I can't find clear answer anywhere (see also: /jlh-marketing dot com/mobile-first-unanswered-questions/). Obviously the writing is on the wall (and has been for years) that responsive is the way to go moving forward - but just looking for any other viewpoints/feedback here since it can be really expensive for some people to upgrade. I'm basically torn between "okay we gotta upgrade to responsive now" and "well, this may not be as critical as it seems". Sigh... Thanks in advance for any feedback and thoughts. LOL - I selected "there may not be a right answer to this question" when submitting this to the Moz community. 🙂
Intermediate & Advanced SEO | | mirabile0 -
Is my site being penalized?
I've gone through all the points on https://moz.com/blog/technical-site-audit-for-2015 but the site only ranks for its brand name after months. The website is not ranking in the top 100 for any main keywords (2,3,4 word phrases), only for a handful of very long phrases (4+). All of the content is unique, all pages are indexed, the website is fast and doesn't contain any crawl errors and there are a couple of links pointing to it. There is a sitewide follow link in the footer pointing to another domain, its parent company and vice-versa. This is not done for any SEO reasons but the companies are related and also the products are supplementary of each other. Could this be an issue? Or is my site being penalized by something else?
Intermediate & Advanced SEO | | Robbern0 -
1 Ecommerce site for several product segments or 1 Ecommerce site for each product segment ?
I am currently struggling with the decision whether to create individual ecommerce sites for each of 3 consumer product segments or rather to integrate them all under one umbrella domain. Obviously integration under 1 domain makes link building easier, but I am not sure how far google will favor in rankings websites focussed on one topic=product segment. Product segments are medium competitive.Product segments are not directly related but there may be some overlap in customer demographics- Any thoughts ?
Intermediate & Advanced SEO | | lcourse1 -
Should I link my similar sites together?
Hi I currently have two sites within exactly the same market. I've just purchased a third website from someone. Should I link these sites together? (i.e. in the page header should I cross link them or point two of them to the third?) If I do this will it harm them if they are on the same C-Class IP blocks? Is using private domains and different hosting companies considered dodgey in any way? Basically I'm a big wimp and don't want to do anything potentially that might potentially hurt my rankings;)
Intermediate & Advanced SEO | | Blendfish0 -
Site #2 beats site #1 in every aspect?
Hey guys, loving SEOMoz so far and will definitely continue my subscription after the free trial. I have a question however, which I am really confused about. When researching my primary keyword, I have found that the second ranked site beats the top site in every single aspect, apart from domain age, which is almost 6 years for the top one and 6 months for the second. When I say every single aspect, I mean everything. More authority for the page and domain, more links, more anchor text links, more authoritive links, more social signals, more relevant links, better domain (although second ranked site is a .net), better MozRank, better MozTrust etc.... I have noticed though, that in the UK SERPs, those sites are switched, so #2 is actually #1. Could it be that the US SERPs just haven't updated yet, or am I missing something completely different.
Intermediate & Advanced SEO | | darrenspeed1 -
Are there any disadvantages of switching from xml sitemaps to .asp sitemaps in GWT
I have been using multiple xml sitemaps for products for over 6 months and they are indexing well with GMT. I have been having this manually amended when a product becomes obsolete or we no longer stock it. I now have the option to automate the sitemaps from a SQL feed but using .asp sitemaps that I would submit the same way in GWT. I'd like your thoughts on the Pro's and cons of this, pluses for me is realtime updates, con's I percieve GMT to prefer xml files. what do you think?
Intermediate & Advanced SEO | | robertrRSwalters0 -
How Does This Site Rank So Well?!
So this website -> http://bailbondsripoffreport.com/ Ranks on the First Page for the term "Bail Bonds" It's the spammiest crappiest piece of junk website ever! lol - How does this site rank so well, it's not even a year old and it's link structure is crap. Can I like report them and have them removed lol. Any ideas would be appreciated. Thanks!
Intermediate & Advanced SEO | | utahseopros0