Crawl Budget and Faceted Navigation

Webpresence

Hi, we have an ecommerce website with facetted navigation for the various options available.

Google has 3.4 million webpages indexed. Many of which are over 90% duplicates.

Due to the low domain authority (15/100) Google is only crawling around 4,500 webpages per day, which we would like to improve/increase.

We know, in order not to waste crawl budget we should use the robots.txt to disallow parameter URL’s (i.e. ?option=, ?search= etc..). This makes sense as it would resolve many of the duplicate content issues and force Google to only crawl the main category, product pages etc.

However, having looked at the Google Search Console these pages are getting a significant amount of organic traffic on a monthly basis.

Is it worth disallowing these parameter URL’s in robots.txt, and hoping that this solves our crawl budget issues, thus helping to index and rank the most important webpages in less time.

Or is there a better solution?

Many thanks in advance.

Lee.

jcnotfound2083

Hello, I have also been in a similar situation. What I did was to disallow the urls with parameters using the robots.txt and place (in only the pages with parameters) the following two html tags:

This will expressly indicate to google not to index these pages. I still have some errors but I guess they will disappear in a few months.

Regards

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Crawl Budget and Faceted Navigation

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Question regarding Site and URL structure + Faceted Navigation (Endeca)

Improving Crawl Efficieny

Can Google Crawl AJAX filters?

List of Search Engines subscribing to the ajax crawling scheme?

After Receiving a "Googlebot can't access your site" would this stop your site from being crawled?

Faceted Navigation and Dupe Content

Page Crawling Check after Modification Done without staying 7 days

Old pages still crawled by SE returning 404s. Better to put 301 or block with robots.txt ?