Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Sitemap use for very large forum-based community site
-
I work on a very large site with two main types of content, static landing pages for products, and a forum & blogs (user created) under each product. Site has maybe 500k - 1 million pages. We do not have a sitemap at this time.
Currently our SEO discoverability in general is good, Google is indexing new forum threads within 1-5 days roughly. Some of the "static" landing pages for our smaller, less visited products however do not have great SEO.
Question is, could our SEO be improved by creating a sitemap, and if so, how could it be implemented? I see a few ways to go about it:- Sitemap includes "static" product category landing pages only - i.e., the product home pages, the forum landing pages, and blog list pages. This would probably end up being 100-200 URLs.
- Sitemap contains the above but is also dynamically updated with new threads & blog posts.
Option 2 seems like it would mean the sitemap is unmanageably long (hundreds of thousands of forum URLs). Would a crawler even parse something that size? Or with Option 1, could it cause our organically ranked pages to change ranking due to Google re-prioritizing the pages within the sitemap?
Not a lot of information out there on this topic, appreciate any input. Thanks in advance. -
Agreed, you'll likely want to go with option #2. Dynamic sitemaps are a must when you're dealing with large sites like this. We advise them on all of our clients with larger sites. If your forum content is important for search then these are definitely important to include as the content likely changes often and might be naturally deeper in the architecture.
In general, I'd think of sitemaps from a discoverability perspective instead of a ranking one. The primary goal is to give Googlebot an avenue to crawl your sites content regardless of internal linking structure.
-
Hi
Go with option 2, there is no scaling issue here. I have worked with and for sites that have a high multiplier on the number of sitemaps and pages that they're submitting, in some cases up to 100M pages. In all cases, Google was totally fine in crawling and processing the data that was there. As long as you follow the guidelines (max 50K URLs in a sitemap) you're fine as you're just providing another file that usually doesn't exceed about 50MB (depending on if you also add images to the sitemap). If you have an engineering team build the right infrastructure you can easily deal with thousands of these files and run them automated every day/week.
My main focus on big sites is also to streamline their sitemaps to have sitemaps with just the last 50.000 pages and the same for the last 50.000 pages that were updated. This way you're able to also monitor the indexation level of these pages. If you are able to, for example, combine the data from log file analysis you can say: we added 50K pages and Google in the last days were able to crawl X percentage of that.
Hope this gives you some extra insights.
Martijn.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemap.xml strategy for site with thousands of pages
I have a client that has a HUGE website with thousands of product pages. We don't currently have a sitemap.xml because it would take so much power to map the sitemap. I have thought about creating a sitemap for the key pages on the website - but didn't want to hurt the SEO on the thousands of product pages. If you have a sitemap.xml that only has some of the pages on your site - will it negatively impact the other pages, that Google has indexed - but are not listed on the sitemap.xml.
Technical SEO | | jerrico10 -
Should we use Cloudflare
Hi all, we want to speed up our website (hosted in Wordpress, traffic around 450,000 page views monthly), we use lots of images. And we're wondering about setting up on Cloudflare, however after searching a bit in Google I have seen some people say the change in IP, or possible sharing of Its with bad neighbourhoods, can really hit search rankings. So, I was wondering what the latest thinking is on this subject, would the increased speed and local server locations be a boost for SEO, moreso than a potential loss of rankings for changing IP? Thanks!
Technical SEO | | tiromedia1 -
If I'm using a compressed sitemap (sitemap.xml.gz) that's the URL that gets submitted to webmaster tools, correct?
I just want to verify that if a compressed sitemap file is being used, then the URL that gets submitted to Google, Bing, etc and the URL that's used in the robots.txt indicates that it's a compressed file. For example, "sitemap.xml.gz" -- thanks!
Technical SEO | | jgresalfi0 -
Automate XML Sitemaps
Quick question, which is the best method that people have for automating sitemaps. We publish around 200 times a day and I would like to make sure as soon as we publish it gets updated in the site map. What is the best method of updating a sitemap so it gets updated immediately after it is published.
Technical SEO | | mattdinbrooklyn0 -
What is the best way to redirect visitors to certain pages of your site based on their location?
One website I manage wants to redirect users to state specific pages based on their location. What is the best way to accomplish this? For example a user enters the through site.com but they are in Colorado so we want to direct them to site.com/colorado.
Technical SEO | | Firestarter-SEO0 -
Is there a maximum sitemap size?
Hi all, Over the last month we've included all images, videos, etc. into our sitemap and now its loading time is rather high. (http://www.troteclaser.com/sitemap.xml) Is there any maximum sitemap size that is recommended from Google?
Technical SEO | | Troteclaser0 -
302 redirect used, submit old sitemap?
The website of a partner of mine was recently migrated to a new platform. Even though the content on the pages mostly stayed the same, both the HTML source (divs, meta data, headers, etc.) and URLs (removed index.php, removed capitalization, etc) changed heavily. Unfortunately, the URLs of ALL forum posts (150K+) were redirected using a 302 redirect, which was only recently discovered and swiftly changed to a 301 after the discovery. Several other important content pages (150+) weren't redirected at all at first, but most now have a 301 redirect as well. The 302 redirects and 404 content pages had been live for over 2 weeks at that point, and judging by the consistent day/day drop in organic traffic, I'm guessing Google didn't like the way this migration went. My best guess would be that Google is currently treating all these content pages as 'new' (after all, the source code changed 50%+, most of the meta data changed, the URL changed, and a 302 redirect was used). On top of that, the large number of 404's they've encountered (40K+) probably also fueled their belief of a now non-worthy-of-traffic website. Given that some of these pages had been online for almost a decade, I would love Google to see that these pages are actually new versions of the old page, and therefore pass on any link juice & authority. I had the idea of submitting a sitemap containing the most important URLs of the old website (as harvested from the Top Visited Pages from Google Analytics, because no old sitemap was ever generated...), thereby re-pointing Google to all these old pages, but presenting them with a nice 301 redirect this time instead, hopefully causing them to regain their rankings. To your best knowledge, would that help the problems I've outlined above? Could it hurt? Any other tips are welcome as well.
Technical SEO | | Theo-NL0 -
Index forum sites
Hi Moz Team, somehow the last question i raised a few days ago not only wasnt answered up until now, it was also completely deleted and the credit was not "refunded" - obviously there was some data loss involved with your restructuring. Can you check whether you still find the last question and answer it quickly? I need the answer 🙂 Here is one more question: I bought a website that has a huge forum, loads of pages with user generated content. Overall around 500.000 Threads with 9 Million comments. The complete forum is noindex/nofollow when i bought the site, now i am thinking about what is the best way to unleash the potential. The current system is vBulletin 3.6.10. a) Shall i first do an update of vbulletin to version 4 and use the vSEO tool to make the URLs clean, more user and search engine friendly before i switch to index/follow? b) would you recommend to have the forum in the folder structure or on a subdomain? As far as i know subdomain does take lesser strenght from the TLD, however, it is safer because the subdomain is seen as a separate entity from the regular TLD. Having it in he folder makes it easiert to pass strenght from the TLD to the forum, however, it puts my TLD at risk c) Would you release all forum sites at once or section by section? I think section by section looks rather unnatural not only to search engines but also to users, however, i am afraid of blasting more than a millionpages into the index at once. d) Would you index the first page of a threat or all pages of a threat? I fear duplicate content as the different pages of the threat contain different body content but the same Title and possibly the same h1. Looking forward to hear from you soon! Best Fabian
Technical SEO | | fabiank0