Are there any negative side effects of having millions of URLs on your site?
-
After a site upgrade, we found that we have over 3.7 million URLs on our site. Many of these URLs come from the facet options: each facet combination yields a different URL. However, we need to analyze these URLs more deeply to see whether facets are the only reason there are so many.
Does anyone know whether there are any downsides to having so many URLs crawled, other than the fact that Google only spends a limited amount of time crawling a site? Is the number of URLs itself something we should be concerned about?
Any insight appreciated!
-
Agree with the points above, with one exception. Yes, you have to find a way to deal with duplicate content and content quality at scale. Yes, robots.txt, nofollow links, and index sitemaps are your friends. But I would not use rel=canonical unless I had to. It is better to get those extra pages de-indexed and then not let Google crawl the URLs with the extra parameters in the first place. Why waste Google's time crawling pages that are just re-sorted versions of another page? If you use the directives wisely, you probably have "only" 200,000 pages worth crawling if you have that many sort parameters.
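A rough back-of-the-envelope sketch in Python of why faceted URLs multiply so fast, and why keeping sort parameters out of the crawl shrinks the total so much. It assumes each facet can be either unset or set to one of its values, so a facet with n values multiplies the URL count by n + 1. The category count and facet names are made up for illustration, not taken from the question.

```python
from math import prod


def variants_per_page(facets: dict[str, int]) -> int:
    """URL variants for one base page if every facet combination gets its own URL."""
    # Each facet is either unset or set to one of its n values: n + 1 choices.
    return prod(n + 1 for n in facets.values())


def total_urls(base_pages: int, facets: dict[str, int],
               exclude: frozenset[str] = frozenset()) -> int:
    """Total URLs across all base pages, ignoring facets listed in `exclude`
    (e.g. parameters kept out of the crawl)."""
    kept = {name: n for name, n in facets.items() if name not in exclude}
    return base_pages * variants_per_page(kept)


# Hypothetical catalog, for illustration only.
FACETS = {"brand": 9, "color": 5, "price_range": 4, "sort": 5}
CATEGORY_PAGES = 2_000

print(total_urls(CATEGORY_PAGES, FACETS))                               # 3600000
print(total_urls(CATEGORY_PAGES, FACETS, exclude=frozenset({"sort"})))  # 600000
```

With these made-up numbers, 2,000 category pages turn into 3.6 million URLs, and leaving the sort parameter out of the crawl cuts that to 600,000.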
Good luck!
-
I'll echo Robert's concern about duplicate content. If those facet combinations are creating many pages with very similar content, that could be an issue for you.
If, let's say, there are 100 facet combinations that create essentially the same basic page content, then consider taking facet elements that do NOT substantially change the page content, and use rel=canonical to tell Google that those are all really the same page. For instance, let's say one of the facets is packaging size, and product X comes in boxes of 1, 10, 100, or 500 units. Let's say another facet is color, and it comes in blue, green, or red. Let's say the URLs for these look like this:
www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1
www.mysite.com/product.php?pid=12345&color=green&pkgsize=10
www.mysite.com/product.php?pid=12345&color=red&pkgsize=100
You would want to set the rel=canonical on all of these to:
www.mysite.com/product.php?pid=12345
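A minimal sketch of computing that canonical URL in Python, assuming pid is the only parameter that changes the page content and every other parameter (color, pkgsize, and so on) is a facet to be stripped. The CONTENT_PARAMS tuple and the function names are hypothetical; on a real site the list of content-changing parameters has to come from your own URL scheme. The example URLs use https:// because a canonical href should be absolute.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that identify genuinely different content.
# Everything else (color, pkgsize, ...) is treated as a facet and dropped.
CONTENT_PARAMS = ("pid",)


def canonical_url(url: str, keep: tuple[str, ...] = CONTENT_PARAMS) -> str:
    """Strip facet parameters so all variants of a product share one URL."""
    parts = urlsplit(url)
    # Sorting keeps the result stable regardless of parameter order.
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in keep)
    # Any #fragment is dropped as well.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """The <link> element to place in the <head> of `url`."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


for variant in (
    "https://www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1",
    "https://www.mysite.com/product.php?pid=12345&color=green&pkgsize=10",
    "https://www.mysite.com/product.php?pid=12345&color=red&pkgsize=100",
):
    print(canonical_link_tag(variant))
# each line: <link rel="canonical" href="https://www.mysite.com/product.php?pid=12345">
```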
Be sure that your XML sitemap, your on-page meta robots, and your rel=canonicals are all in agreement. In other words, if a page has meta robots "noindex,follow", it should NOT show up in your XML sitemap. If the pages above have their rel=canonicals set as described, then your sitemap should contain www.mysite.com/product.php?pid=12345 and NONE of the three example URLs with the color and pkgsize parameters above.
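One way to keep those three signals in agreement is to generate the sitemap from the same data that drives the canonical and meta robots tags. A minimal Python sketch, assuming you can export each page's URL, its rel=canonical value, and its meta robots content (the Page class and its field names are hypothetical): a page goes into the sitemap only if it is not noindexed and its canonical points to itself, and any sitemap URL that fails that test is reported as a conflict. It compares URLs as plain strings, so it assumes they are already normalized.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Page:
    url: str
    canonical: str                     # value of the page's rel=canonical
    meta_robots: str = "index,follow"  # content of <meta name="robots">


def is_indexable(page: Page) -> bool:
    """Not noindexed, and canonicalized to itself."""
    directives = {d.strip().lower() for d in page.meta_robots.split(",")}
    return "noindex" not in directives and page.canonical == page.url


def sitemap_urls(pages: list[Page]) -> list[str]:
    return sorted({p.url for p in pages if is_indexable(p)})


def sitemap_conflicts(pages: list[Page], sitemap: set[str]) -> list[str]:
    """Sitemap URLs whose own tags say they should not be indexed."""
    by_url = {p.url: p for p in pages}
    return sorted(u for u in sitemap if u in by_url and not is_indexable(by_url[u]))


def render_sitemap(urls: list[str]) -> str:
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>\n")


BASE = "https://www.mysite.com/product.php?pid=12345"
SEARCH = "https://www.mysite.com/search.php?q=widgets"  # hypothetical noindexed page
pages = [
    Page(BASE, canonical=BASE),
    Page(BASE + "&color=blue&pkgsize=1", canonical=BASE),
    Page(SEARCH, canonical=SEARCH, meta_robots="noindex,follow"),
]
print(sitemap_urls(pages))  # ['https://www.mysite.com/product.php?pid=12345']
print(sitemap_conflicts(pages, {BASE, BASE + "&color=blue&pkgsize=1"}))
```

The second print flags the color/pkgsize variant, since its canonical points elsewhere and it therefore should not be in the sitemap.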
-
There are several concerns to be addressed with this scenario:
- Organization
This is going to be very difficult to keep track of. If you are well-organized or the pages will not need much adjusting, this is probably okay.
- Duplicate Content
This is going to be a pain in the behind. That being said, most site auditing tools will flag duplicate content so you can make adjustments as necessary.
- Broken Links
With a site of this size, broken links and 404s are inevitable. They can hurt your SEO, so you will have to stay on top of them.
- Hacking
This is a big reason why some sites have enormous numbers of URLs. It would likely be the biggest concern on my mind and is worth looking into. Going through that many pages by hand would be impossible, so it might be worth looking at your link profile and determining where most of your links are coming from. If they are coming from spammy sites, you may have a problem there.
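If your backlink tool can export links to CSV, a quick tally by referring domain shows where most of them come from. A minimal Python sketch, assuming the export has a header row and a column holding the linking page's URL; the column name source_url is hypothetical and varies between tools. Rows whose URL has no scheme (and so no parseable host) are skipped.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def referring_domains(csv_text: str, column: str = "source_url") -> Counter:
    """Count backlinks per referring host in a CSV export.

    `column` is a hypothetical name for the linking page's URL; check what
    your tool actually calls it.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        host = urlsplit(row[column].strip()).hostname  # None if no scheme
        if host:
            counts[host.removeprefix("www.")] += 1
    return counts


sample = """source_url,target_url
https://www.example.com/a,https://www.mysite.com/
https://example.com/b,https://www.mysite.com/x
https://spam.example.net/c,https://www.mysite.com/
"""
print(referring_domains(sample).most_common(2))
# [('example.com', 2), ('spam.example.net', 1)]
```

Sorting by count puts the domains sending the most links at the top, which is usually where a spam problem shows up first.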
All this being said, the size of a website is normally not a cause for concern. Just make sure that your main pages (home, landing pages) are properly handled and optimized, and you shouldn't have too much trouble. I would add that large, unwieldy .htaccess files can result in slower loading times, which can impact your rankings with Google.
Let me know if there is anything specific concerning you and I will be happy to help. Congrats on the upgrade and hope it works out!
Rob