Sitemap use for very large forum-based community site
-
I work on a very large site with two main types of content, static landing pages for products, and a forum & blogs (user created) under each product. Site has maybe 500k - 1 million pages. We do not have a sitemap at this time.
Currently our SEO discoverability in general is good, Google is indexing new forum threads within 1-5 days roughly. Some of the "static" landing pages for our smaller, less visited products however do not have great SEO.
Question is, could our SEO be improved by creating a sitemap, and if so, how could it be implemented? I see a few ways to go about it:- Sitemap includes "static" product category landing pages only - i.e., the product home pages, the forum landing pages, and blog list pages. This would probably end up being 100-200 URLs.
- Sitemap contains the above but is also dynamically updated with new threads & blog posts.
Option 2 seems like it would mean the sitemap is unmanageably long (hundreds of thousands of forum URLs). Would a crawler even parse something that size? Or with Option 1, could it cause our organically ranked pages to change ranking due to Google re-prioritizing the pages within the sitemap?
Not a lot of information out there on this topic, appreciate any input. Thanks in advance. -
Agreed, you'll likely want to go with option #2. Dynamic sitemaps are a must when you're dealing with large sites like this. We advise them on all of our clients with larger sites. If your forum content is important for search then these are definitely important to include as the content likely changes often and might be naturally deeper in the architecture.
In general, I'd think of sitemaps from a discoverability perspective instead of a ranking one. The primary goal is to give Googlebot an avenue to crawl your sites content regardless of internal linking structure.
-
Hi
Go with option 2, there is no scaling issue here. I have worked with and for sites that have a high multiplier on the number of sitemaps and pages that they're submitting, in some cases up to 100M pages. In all cases, Google was totally fine in crawling and processing the data that was there. As long as you follow the guidelines (max 50K URLs in a sitemap) you're fine as you're just providing another file that usually doesn't exceed about 50MB (depending on if you also add images to the sitemap). If you have an engineering team build the right infrastructure you can easily deal with thousands of these files and run them automated every day/week.
My main focus on big sites is also to streamline their sitemaps to have sitemaps with just the last 50.000 pages and the same for the last 50.000 pages that were updated. This way you're able to also monitor the indexation level of these pages. If you are able to, for example, combine the data from log file analysis you can say: we added 50K pages and Google in the last days were able to crawl X percentage of that.
Hope this gives you some extra insights.
Martijn.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Which url should i use? Thanks!
I have a question regarding how to use my url, we are a Swedish-based website which have the url, http://interimslösning.se/ (that contains the Swedish letter “ö”) so the url can also be written as http://xn--interimslsning-3pb.se/. Which of the following url should I use for my backlinks, http://interimslösning.se/ or http://xn--interimslsning-3pb.se/ ? What is the difference between them regarding SEO? And is it good or bad to use letter like "ö" or other characters like that in your url? I was thinking that maybe it is good to use the letter "ö" for local search optimization in sweden, but i don't know.. Thanks in advance! Greetings,
Technical SEO | | Kiwibananlime
Paul Linderoth0 -
Representing categories on my site
My site serves a consumer-focused industry that has about 15-20 well recognized categories, which act as a pretty obvious way to segment our content. Each category supports it's own page (with some useful content) and a series of articles relevant to that category. In short, the categories are pretty focal to what we do. I am moving from DNN to WordPress as my CMS/blog. I am taking the opportunity to review and fix SEO-related issues as I migrate. One such area is my URL structure. On my existing site (on DNN), I have the following types of pages for each topic: / <topic>- this is essentially the landing page for the topic and links to articles</topic> /<topic>/articles/ <article-name>- topics have 3-15 articles with this URL structure</article-name></topic> With WordPress, I am considering moving to articles being under the root. So, an article on (making this up) how to make a widget would be under /how-to-make-a-widget, instead of /<widgets>/article/how-to-make-a-widget I will be using WordPress categories to reflect the topics taxonomy, so I can flag my articles using standard WordPress concepts.</widgets> Anyway, I'm trying to get my head around whether it makes sense to "flatten" my URL structure such that the URLs for each article no longer include the topic (the article page will link to the topic page though). Thoughts?
Technical SEO | | MarkWill1 -
Site Migration Questions
Hello everyone, We are in the process of going from a .net to a .com and we have also done a complete site redesign as well as refreshed all of our content. I know it is generally ideal to not do all of this at once but I have no control over that part. I have a few questions and would like any input on avoiding losing rankings and traffic. One of my first concerns is that we have done away with some of our higher ranking pages and combined them into one parallax scrolling page. Basically, instead of having a product page for each product they are now all on one page. This of course has made some difficulty because search terms we were using for the individual pages no longer apply. My next concern is that we are adding keywords to the ends of our urls in attempt to raise rankings. So an example: website.com/product/product-name/keywords-for-product if a customer deletes keywords-for-product they end up being re-directed back to the page again. Since the keywords cannot be removed is a redirect the best way to handle this? Would a canonical tag be better? I'm trying to avoid duplicate content since my request to remove the keywords in urls was denied. Also when a customer deletes everything but website.com/product/ it goes to the home page and the url turns to website.com/product/#. Will those pages with # at the end be indexed separately or does google ignore that? Lastly, how can I determine what kind of loss in traffic we are looking at upon launch? I know some is to be expected but I want to avoid it as much as I can so any advice for this migration would be greatly appreciated.
Technical SEO | | Sika220 -
Which address do I use for citations
Hello, When I created my google places, I entered my address and when I got my google places activated I noticed that the address google places was displaying was a short abbreviation of my address. So my question is when it comes to creating citations for my listing do I grab the address google places generated for me in the listing or the long version of my address? I've just heard when it comes to creating citations, you need to make sure it is identical across the board. I hope this makes sense. Thanks!
Technical SEO | | fbbcseo0 -
Lost with conical, nofollow noindex. Not sure how to use it on a dyanmic php site with multiple region select options
I have a site with multiple regions the main page after a region is selected is login.php but the regions are defined by ?rid=11 , 12, etc. These are being picked up as duplicate content but they are all different regions. As i hired external php coders to develop most of the site I am scared to start meddling with any of the raw code and would like some advise on how to not show these as duplicate content. should i use noindex nofollow or connical? if Connical how do i set it up on the main login.php page? p.s. i am an extreme nube to seo
Technical SEO | | moby1230 -
When to use canonical urls
I will be the first to admit I am never really 100% sure when to use canonical urls. I have a quick question and I am not really sure if this is a situation for a canonical or not. I am looking at a my friends building website and there are issues with what pages are ranking. Basically there homepage is focusing on the building refurbishment location but for some reason in internal page is ranking for that keyword and it is not mentioned at all on that page. Would this be a time to add the homepage url and a canonical on the ranking page (using yoast plugin) to tell Google that the homepage is the preferred page? Thanks Paul
Technical SEO | | propertyhunter0 -
Tips on How to Optimize a Forum
Hi, I am looking for some good tips and tricks on how to get the most of a vBulletin Forum, we already got vbseo.com which works quite nice for text/titles and many other stuff, but it would be great to know some insights on optimizing this kind of forums or any other forum. thanks!
Technical SEO | | andresgmontero0