Block search engines from URLs created by internal search engine?

Voonie

Hey guys,

I've got a question for you all that I've been pondering for a few days now. I'm currently doing an SEO Technical Audit for a large scale directory.

One major issue that they are having is that their internal search system (Directory Search) will create a new URL everytime a search query is entered by the user. This creates huge amounts of duplication on the website.

I'm wondering if it would be best to block search engines from crawling these URLs entirely with Robots.txt?

What do you guys think? Bearing in mind there are probably thousands of these pages already in the Google index?

Thanks

Kim

Dr-Pete

That sounds perfect - if the user-generated URLs are getting enough traffic, make them permanent pages and 301-redirect or canonical. If not, weed them out of the index.

Voonie

Thanks for your reply Dr. Meyers. I think you're probably right.

Yes I'm recommending they define a canonical set of pages that are the most popular searches, categories and locations which can be reached via internal links and we'll get all those duplicates re-directed back to that canonical set.

But for pages that fall outside those categories and locations, I'll recommend a meta-no-index tag.

Dr-Pete

It can be a complicated question on a very large site, but in most cases I'd META NOINDEX those pages. Robots.txt isn't great at removing content that's already been indexed. Admittedly, NOINDEX will take a while to work (virtually any solution will), as Google probably doesn't crawl these pages very often.

Generally, though, the risk of having your index explode with custom search pages is too high for a site like yours (especially post-Panda). I do think blocking those pages somehow is a good bet.

The only exception I would add is if some of the more popular custom searches are getting traffic and/or links. I assume you have a solid internal link structure and other paths to these listings, but if it looks like a few searches (or a few dozen) have attracted traffic and back-links, you'll want to preserve those somehow.

Voonie

Sure, check below and some of the duplication I mean:

Capitalization Duplication

http://yellow.co.nz/yellow+pages/Car+dealer/Auckland+Region

http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland+Region

With a few URL parameters

http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland+Region?encodedRefinement=refineterms..%3D..%5E%22Car+%26+Truck+Dealers+-+Used.%3A.Makes.%3D.Toyota%22%24..%26..Makes+%28Toyota%29&display=&stageName=Composite+search&suppressMobileListings=false

And with location duplication

http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland

Let me know if you need any more info!

Cheers

Kim

Mark_Jay_Apsey_Jr.

Whats the content look like on the new url? Can you give us an example?

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Block search engines from URLs created by internal search engine?

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Should we optimise our internal links?

We 410'ed URLs to decrease URLs submitted and increase crawl rate, but dynamically generated sub URLs from pagination are showing as 404s. Should we 410 these sub URLs?

Internal links and URL shortners

What URL parameter settings in GWT to choose for search results parameter?

Blocking some countries and redirecting that traffic

Why are these m. results showing as blocked?

Google: How to See URLs Blocked by Robots?

Search Refinement URLs