Similar pages: noindex or rel:canonical or disregard parameters?!

Philoups

Hey all!

We have a hotel booking website with search results pages for each destination (e.g. hotels in NYC is dayguest.com/nyc). Pages are also generated for each destination based on various parameters, such as star rating, amenities, style of property, etc. (e.g. dayguest.com/nyc/4stars, dayguest.com/nyc/luggagestorage, dayguest.com/nyc/luxury).

In general, all of these pages are very similar. For example, there might be 10 hotels in NYC, all of which offer luggage storage, so the pages can be nearly identical. This leads to duplicate content and diluted link juice.

What is the best practice in this situation? Should I set all pages except the most important ones (e.g. dayguest.com/nyc) to noindex? Or set the main page as the canonical for all variations? Or ask Google, in Google Webmaster Tools, to disregard the URLs for various parameters? Or do something else altogether?!

Thanks for the help!

Dr-Pete

Sorry, I don't think I explained (1) very well. What I mean is that you may want to gradually change the site architecture so that not all of the search options are crawlable pages. This could mean putting some filters in form variables, for example (instead of links). It could also mean making sure that certain paths always converge. There's no easy solution. This is a problem all big sites face, and it's very dependent on the platform/CMS.

With (2), a "level" could be anything. Maybe there are major cities you need to cover but everything else could stay out of the index. This really depends on your information architecture, but there's always something that's high priority and something that's low priority. If you can focus Google on the high-priority pages, it can definitely work in your favor. The trick is figuring out how to build the logic such that you can code that dynamically. I've found there's almost always an answer, but it can take some creative thinking. I definitely don't encourage doing it manually.
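As a rough illustration of coding that priority logic dynamically, here is a minimal Python sketch. The city list, the filter whitelist, and the function name are all hypothetical; where the cut-off belongs depends entirely on your own architecture.

```python
from typing import Optional

# Hypothetical examples: a real site would derive these from its own data.
MAJOR_CITIES = {"nyc", "rome", "milan"}
INDEXED_FILTERS = {"luxury", "4stars"}


def is_indexable(city: str, filter_slug: Optional[str] = None) -> bool:
    """Decide whether a search results page should be left in the index.

    City pages for major cities are indexed, plus a short whitelist of
    filters under them. Everything else is treated as low priority.
    """
    if city not in MAJOR_CITIES:
        return False
    return filter_slug is None or filter_slug in INDEXED_FILTERS
```

The template would then emit META NOINDEX whenever `is_indexable(...)` returns `False`, so nobody has to tag pages by hand.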

If the results are easy to group by city and you can code that logic, the canonical may be fine. Since the search results could be different in some cases, canonical isn't technically the best choice, but it does often work. It really depends on how different they can be, so it's a bit tricky.
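If grouping by city is the rule, the logic could look something like the sketch below. It assumes the first path segment is always the city (as in dayguest.com/nyc/4stars) and that filters live either in later path segments or in the query string; the function names are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def city_canonical(url: str) -> str:
    """Map a filtered results URL to its city-level URL.

    Assumption: the first path segment identifies the city, and
    everything after it (extra segments, query string) is a filter.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    city_path = "/" + segments[0] if segments else "/"
    # Drop the query string and fragment along with the extra segments.
    return urlunsplit((parts.scheme, parts.netloc, city_path, "", ""))


def canonical_link(url: str) -> str:
    """Build the <link rel="canonical"> tag for a results page."""
    return f'<link rel="canonical" href="{city_canonical(url)}">'
```

For example, `city_canonical("http://www.dayguest.com/nyc/4stars")` returns `http://www.dayguest.com/nyc`. Whether that is appropriate still depends on how different the filtered results really are.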

Philoups

Honestly, option 1 would be a nightmare. Imagine that we add one property in a city not yet covered. There are about 50 amenities, and most hotels feature most of them, so nearly as many new pages would be generated. That would quickly become unmanageable to handle manually.

I'm not sure I understand your second option. There are not several "levels", only one under the "city" in which the property is located. But multiplied by several cities, these pages quickly number in the hundreds, if not thousands.

Why would it not be possible/desirable to code all such pages with their city page as the canonical?

Dr-Pete

Ugh - that's what I was afraid you'd say. Unfortunately, the coincidental problem can't really be easily solved with code, which makes it hard to use canonical tags. There's no good way to tell the site when to use them.

So, a couple of options:

(1) Try to gradually rework the structure so that there are fewer of these paths.

(2) Consider using META NOINDEX on some lower-value paths. Internal search results don't have great value for Google, so you could let the major categories/options be indexed, but then cut off at a certain level (index nothing "below" it). That may be more feasible from a code standpoint.

(3) Use rel=prev/next, use unique TITLEs if possible (based on the query) and just clean things up the best you can, but leave everything indexed.

It depends a lot on your scope, structure, and your future plans. I'm not sure there's one "right" answer.

Philoups

These pages return the same results coincidentally, that's the issue... The more properties we get on board, the less likely it is that these pages will be similar. But it might take a long time to build that up, and we may never achieve it.

Dr-Pete

Ah, got it - yeah, I think rel=canonical would be fine there, but I'd want to understand your architecture better. Are these pages returning the same results coincidentally, or are these two URLs that basically land on the same combination of search options/filters? If it's the former, it's a lot tougher, because that's just a coincidence happening at large scale. If it's the latter, a solid canonical scheme could help a lot, but I'd also explore whether these paths are useful (or should be indexed at all). In other words, in the long term, it might be better to use one URL consistently, even if people navigate by different paths to reach it.

Philoups

That's odd, they were supposed to be the same. And yeah, results come and go as properties are added/removed from our inventory.

The following is what I wanted to highlight:

http://www.dayguest.com/rome-dayuse/concierge

http://www.dayguest.com/rome-dayuse/air-conditioning

As you can see, the pages are identical, except that one has 5 properties and the other has 6. Most overlap. There are so many property "features" or "categories" that some lists are exactly the same. In fact, SEOmoz finds that I have over 1,700 pages with duplicate content, most being search results pages with closely similar content such as these.

Hence my issue...
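To put a number on "most overlap", one option is to compare the sets of property IDs two pages return. The sketch below uses Jaccard similarity; the property IDs are invented, and it assumes (purely for illustration) that the 5 properties on one page all appear among the 6 on the other.

```python
def jaccard(a: set, b: set) -> float:
    """Share of properties two result pages have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0  # two empty pages are identical
    return len(a & b) / len(a | b)


# Hypothetical property IDs for the two Rome pages.
concierge = {"h1", "h2", "h3", "h4", "h5"}
air_conditioning = {"h1", "h2", "h3", "h4", "h5", "h6"}

overlap = jaccard(concierge, air_conditioning)  # 5 shared out of 6 total
```

A score like this only measures the overlap; it does not by itself say which page should be canonical or noindexed.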

Dr-Pete

Are they duplicates in the sense that there are currently no results? I wouldn't generally use rel=canonical on these, because the search results should (theoretically) be different. These are distinct regions and, I assume, have unique properties.

If they're just returning no results, I'd actually consider a META NOINDEX until there are results available. Otherwise, this is likely to be treated as a soft 404 by Google (not a disaster, honestly). It depends on whether results come and go or if you're just building out the site and there will be data later. If the data isn't ready, I think META NOINDEX is a good way to go. Until results are available, these pages have no search value.
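The "NOINDEX until there are results" rule is simple enough to code directly. A minimal sketch, where the function name is hypothetical and keeping `follow` on empty pages is my assumption rather than something stated above:

```python
def robots_meta(result_count: int) -> str:
    """Return a robots meta tag that noindexes a results page while it is empty."""
    # Assumption: links on the page should still be followed either way.
    content = "noindex, follow" if result_count == 0 else "index, follow"
    return f'<meta name="robots" content="{content}">'
```

Because the tag is computed on each request, a page would drop the NOINDEX on its own once properties are added to that search.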

Philoups

Well, let me give you an example, look at this page: http://www.dayguest.com/milan-city-centre-dayuse?amenities=10

And this page: http://www.dayguest.com/milan-central-station-dayuse?amenities=10

Do you see what I'm talking about? The pages are identical but for the page title/description & a few words on the page.

So, you'd go for canonical?

Philoups

The relation is more hierarchical than next/previous. Judging from the post you mentioned, canonical would be more appropriate...

Dr-Pete

Sorry, I'm not clear on whether these are paginated search results or actual property pages that vary only by a small amount. As @SEO5 said, if these are paginated search results, you could use rel=prev/next. It's a bit tricky to set up with search filters (you need rel=prev/next + rel=canonical).

If these are nearly identical property pages, then it depends on how they differ. If they only differ by one attribute, I'd probably lean toward the canonical tag.

SEO5Team

You could use the pagination attribute on those pages to indicate a relationship between the different URLs.

Link to Google's official explanation on pagination here

Good post by Dr. Pete on this here. Interesting info in the comments of this post as well.
