Are there any negative side effects of having millions of URLs on your site?
-
After a site upgrade, we found that we have over 3.7 million URLs on our site. Many of these URLs are due to the facet options. Each facet combination yields a different URL. However, we need to do a deeper analysis into these URLs to see if this is the only reason why so many are returning.
Does anyone know if there are any negatives of having so many URLs crawled, other than the fact that Google only spends so much time crawling a site? Is the number of URLs something that should be concerning?
Any insight appreciated!
-
Agree with the points above with one exception. Yes, you have to find a way to deal with duplicate and quality content at scale. Yes, Robots.txt, nofollow links and index sitemaps are your friends. I would not use rel=canonical unless I had to. Better to get those extra pages de-indexed and then not let Google crawl the urls with the extra parameters to start with. Why waste Google's time in crawling pages that are just resorted versions of another? If you use the directives wisely you probably "only" have 200,000 pages worth crawling if you have that many sort parameters.
Good luck!
-
I'll echo Robert's concern about duplicate content. If those facet combinations are creating many pages with very similar content, that could be an issue for you.
If, let's say, there are 100 facet combinations that create essentially the same basic page content, then consider taking facet elements that do NOT substantially change the page content, and use rel=canonical to tell Google that those are all really the same page. For instance, let's say one of the facets is packaging size, and product X comes in boxes of 1, 10, 100, or 500 units. Let's say another facet is color, and it comes in blue, green, or red. Let's say the URLs for these look like this:
www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1
www.mysite.com/product.php?pid=12345&color=green&pkgsize=10
www.mysite.com/product.php?pid=12345&color=red&pkgsize=100
You would want to set the rel=canonical on all of these to:
www.mysite.com/product.php?pid=12345
Be sure that your XML sitemap, your on-page meta robots, and your rel=canonicals are all in agreement. In other words, if a page has meta robots "noindex,follow", it should NOT show up in your XML sitemap. If the pages above have their rel=canonicals set as described, then your sitemap should contain www.mysite.com/product.php?pid=12345 and NONE of the three example URLs with the color and pkgsize parameters above.
-
There are several concerns to be addressed with this scenario:
- Organization
This is going to be very difficult to keep track of. If you are well-organized or the pages will not need much adjusting, this is probably okay.
- Duplicate Content
This is going to be a pain the behind. That being said, most site auditing tools will allow you to make adjustments as necessary.
- Broken Links
With a site of this size, broken links and 404's are going to be inevitable. This could lead to some negative SEO impacts and will have to be kept on top of.
- Hacking
This is a big reason why some sites have enormous numbers of URLs. This would likely be the biggest concern on my mind and worth looking in to. Going through that many pages will be impossible, so it might be worth taking a look at the link profile and determining where most of your links are coming from. If these are coming from spammy sites, you may have a problem there.
All this being said, the size of a website is normally not a cause for concern. Just make sure that your main pages (Home, Landing Pages) are properly handled and optimized and you shouldn't have too much trouble. I would add that unwieldy htaccess files (large ones) can result in slower loading times, which can impact your rankings with Google.
Let me know if there is anything specific concerning you and I will be happy to help. Congrats on the upgrade and hope it works out!
Rob
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Will using query string in the URL and swapping H1s for filtered view of the blog impact SEO negatively?
This is a blog revamp we are trying to personalize the experience for 2 separate audiences.We are revamping our blog the user starts on the blog that shows all stories (first screen) then can filter to a more specific blog (ESG or News blog). The filtered version for ESG or the News blog is done through a query string in the URL. We also swap out the page’s H1s accordingly in this process, will this impact SEO negatively?
Technical SEO | | lina_digital0 -
How to fix these unwanted URLs?
Right now i have wordpress, one page website, but google also show wp-content. KIndly check below in google. site:http://baltimoreelite.com/ How I can fix this issue?
Technical SEO | | marknorman0 -
Does Google differentiate between a site with spammy link building practices from a victim of a negative SEO attack?
I've be tasked with figuring out how to recover our rankings as we are likely being hurt by an algorithmic penalty. I have no idea if this was the workings of a previously hired SEO or the result of negative SEO, **how does Google differentiate between a site with bad/spammy link building practices from a victim of a negative SEO attack? **
Technical SEO | | Syed_Raza0 -
301 Multiple Sites to Main Site
Over the past couple years I had 3 sites that sold basically the same products and content. I later realized this had no value to my customers or Google so I 301 redirected Site 2 and Site 3 to my main site (Site 1). Of course this pushed a lot of page rank over to Site 1 and the site has been ranking great. About a week ago I moved my main site to a new eCommerce platform which required me to 301 redirect all the url's to the new platform url's which I did for all the main site links (Site 1). During this time I decided it was probably better off if I DID NOT 301 redirect all the links from the other 2 sites as well. I just didn't see the need as I figured Google realized at this point those sites were gone and I started fearing Google would get me for Page Rank munipulation for 301 redirecting 2 whole sites to my main site. Now I am getting over 1,000 404 crawl errors in GWT as Google can no longer find the URL's for Site 2 and Site 3. Plus my rankings have dropped substantially over the past week, part of which I know is from switching platforms. Question, did I make a mistake not 301 redirecting the url's from the old sites (Site 2 and Site 3) to my new ecommerce url's at Site 1?
Technical SEO | | SLINC0 -
No Keyword in URL
SEOMoz (and other platforms) advise that I need to add my keyword to the page URL, however as far as I'm concerned it has been, so why don't these platforms see it. My home page URL is www.salesandinternetmarketing.com, but apparently I haven't added the keyword internet marketing to the URL, what advice can you give me please? Lindsay
Technical SEO | | lindsayjhopkins1 -
Cross links between sites
hi, We have several ecommerce sites and we cross linked 3 of them by mistake. We realize that the sites were linked through WMT, We have shut down 2 of the sites about 2 months ago, but WMT still shows the links coming from those 2 sites. how do we make sure that google will see the sites are shut down. Is there a better of way resolving this issue. We are no longer using those sites, so do not need them to be active. whats the best solution to show google that the links are no longer there. Crawler shows that it was able to crawl the site 45 days after it is shut down. thanks nick
Technical SEO | | orion680 -
Keywords in Vanity URL
If I set up a vanity URL that just 301's to the main site, do the search engines look at the keywords in the vanity URL when determing how to rank the site. For example, if I set up a vanity URL of www.coolnewtechgear.com, and redirect it to www.company.com/products/, would the search engines view the keywords of cool, new, tech, and gear and associate that with the page it's getting redirected to? Or does it ignore the vanity URL and only look at the content of the page itself?
Technical SEO | | ryanwats0