Similar pages: noindex, rel=canonical, or disregard parameters?!
-
Hey all!
We have a hotel booking website with a search results page for each destination (e.g. hotels in NYC are at dayguest.com/nyc). Pages are also generated for each destination based on various parameters, such as star rating, amenities, or property style (e.g. dayguest.com/nyc/4stars, dayguest.com/nyc/luggagestorage, dayguest.com/nyc/luxury).
In general, all of these pages are very similar. For example, there might be 10 hotels in NYC, all of which offer luggage storage, so dayguest.com/nyc and dayguest.com/nyc/luggagestorage would be nearly identical. Hence the problems of duplicate content and link juice being diluted across pages.
I was wondering what the best practice is in this situation: should I just set all pages except the most important ones (e.g. dayguest.com/nyc) to noindex? Or set the main page as the canonical for all variations? Or ask Google, in Google Webmaster Tools, to disregard the URL parameters? Or do something else altogether?!
Thanks for the help!
-
Sorry, I don't think I explained (1) very well. What I mean is that you may want to gradually change the site architecture so that not all of the search options are crawlable pages. This could mean putting some filters in form variables, for example (instead of links). It could also mean making sure that certain paths always converge. There's no easy solution. This is a problem all big sites face, and it's very dependent on the platform/CMS.
With (2), a "level" could be anything. Maybe there are major cities you need to cover but everything else could stay out of the index. This really depends on your information architecture, but there's always something that's high priority and something that's low priority. If you can focus Google on the high-priority pages, it can definitely work in your favor. The trick is figuring out how to build the logic such that you can code that dynamically. I've found there's almost always an answer, but it can take some creative thinking. I definitely don't encourage doing it manually.
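A minimal Python sketch of the "major cities indexed, everything else out" idea. The `MAJOR_CITIES` set and the slugs in it are hypothetical placeholders (they only borrow the URL style from this thread); the point is that the index/noindex rule is computed from data rather than maintained by hand.

```python
# Hypothetical priority list: which destinations deserve indexed filter pages.
# In practice this might come from traffic or inventory data, not a literal.
MAJOR_CITIES = {"nyc", "rome-dayuse"}


def robots_meta(city: str, filters: list) -> str:
    """Return the content for a <meta name="robots"> tag on a search page."""
    if not filters:
        # Top-level destination pages (e.g. /nyc) always stay indexable.
        return "index, follow"
    if city in MAJOR_CITIES:
        # High-priority cities keep their filter pages in the index.
        return "index, follow"
    # Low-priority filter pages drop out of the index, but "follow" lets
    # crawlers keep discovering the hotel pages they link to.
    return "noindex, follow"


print(robots_meta("nyc", ["luxury"]))  # index, follow
print(robots_meta("milan-central-station-dayuse", ["amenities=10"]))  # noindex, follow
```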
If the results are easy to group by city and you can code that logic, the canonical may be fine. Since the search results could be different in some cases, canonical isn't technically the best choice, but it does often work. It really depends on how different they can be, so it's a bit tricky.
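If the filter pages can be grouped under their city, the canonical logic can be quite small, as in the Python sketch below. It assumes the URL shapes shown in this thread (`/<city>`, `/<city>/<filter>`, or a query string such as `?amenities=10`) and that every filter page should point at its city page. That assumption only holds when the filtered results really are close to the city's full list.

```python
from urllib.parse import urlsplit

BASE = "http://www.dayguest.com"  # host taken from the examples in this thread


def canonical_for(url: str) -> str:
    """Map /<city>/<filter> or /<city>?filter=... to the bare /<city> page."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return BASE + "/"
    # The first path segment is the destination; filters and query are dropped.
    return f"{BASE}/{segments[0]}"


def canonical_tag(url: str) -> str:
    return f'<link rel="canonical" href="{canonical_for(url)}">'


print(canonical_tag("http://www.dayguest.com/nyc/luggagestorage"))
# <link rel="canonical" href="http://www.dayguest.com/nyc">
```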
-
Honestly, option 1 would be a nightmare. Imagine that we add one property in a city we don't cover yet. There are about 50 amenities, and most hotels offer most of them, so nearly as many new pages would be generated. That would quickly become unmanageable to handle manually.
Not sure I understand your second option. There aren't several "levels", only one under the "city" where the property is. But multiplied across several cities, these pages quickly number in the hundreds, if not thousands.
Why would it not be possible/desirable to code all such pages as canonical pages of each city?
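To put numbers on the growth described in this post: with one filter level, each city yields one page per filter value plus the city page itself. The Python sketch below uses the post's figure of about 50 amenities and a hypothetical 20 cities, and also shows what would happen if two filters could ever be combined in one URL, which is the kind of path explosion that keeping some filters out of crawlable links avoids.

```python
from math import comb


def crawlable_pages(cities: int, filter_values: int, max_depth: int = 1) -> int:
    """City pages plus every combination of up to max_depth filters, per city."""
    # comb(n, 0) == 1 counts the bare city page itself.
    per_city = sum(comb(filter_values, k) for k in range(max_depth + 1))
    return cities * per_city


print(crawlable_pages(20, 50))               # 1020: one filter level
print(crawlable_pages(20, 50, max_depth=2))  # 25520: if two filters could combine
```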
-
Ugh - that's what I was afraid you'd say. Unfortunately, the coincidental problem can't really be easily solved with code, which makes it hard to use canonical tags. There's no good way to tell the site when to use them.
So, a couple of options:
(1) Try to gradually rework the structure so that there are fewer of these paths.
(2) Consider using META NOINDEX on some lower-value paths. Internal search results don't have great value for Google, so you could let the major categories/options be indexed, but then cut off at a certain level (index nothing "below" it). That may be more feasible from a code standpoint.
(3) Use rel=prev/next, use unique TITLEs if possible (based on the query) and just clean things up the best you can, but leave everything indexed.
It depends a lot on your scope, structure, and your future plans. I'm not sure there's one "right" answer.
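Options (2) and (3) can both be driven from the URL and the query. The Python sketch below is a hypothetical illustration: it treats `/<city>` as depth 1, counts a path filter or a query-string filter as one level deeper, and applies noindex below a configurable cutoff. The title template is likewise an assumption, only meant to show TITLEs built from the query rather than copied from the city page.

```python
from typing import Optional
from urllib.parse import urlsplit

MAX_INDEXED_DEPTH = 1  # assumption: index /city pages, noindex anything below


def depth(url: str) -> int:
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # A query-string filter such as ?amenities=10 counts as one extra level.
    return len(segments) + (1 if parts.query else 0)


def robots_for_depth(url: str, max_depth: int = MAX_INDEXED_DEPTH) -> str:
    return "index, follow" if depth(url) <= max_depth else "noindex, follow"


def page_title(city: str, filter_label: Optional[str], count: int) -> str:
    """Build a distinct TITLE from the search query itself (option 3)."""
    if filter_label:
        return f"{count} hotels in {city} with {filter_label}"
    return f"{count} hotels in {city}"


print(robots_for_depth("http://www.dayguest.com/nyc"))         # index, follow
print(robots_for_depth("http://www.dayguest.com/nyc/4stars"))  # noindex, follow
print(page_title("Rome", "concierge", 5))  # 5 hotels in Rome with concierge
```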
-
These pages return the same results coincidentally; that's the issue... The more properties we bring on board, the less likely these pages are to be similar. But it might take a long time to build that up, and we may never get there.
-
Ah, got it - yeah, I think rel=canonical would be fine there, but I'd want to understand your architecture better. Are these pages returning the same results coincidentally, or are these two URLs that basically land on the same combination of search options/filters? If it's the former, it's a lot tougher, because that's just a coincidence happening at large scale. If it's the latter, a solid canonical scheme could help a lot, but I'd also explore whether these paths are useful (or should be indexed at all). In other words, in the long term, it might be better to use one URL consistently, even if people navigate by different paths to reach it.
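For the "two URLs, same filters" case, one way to get a single consistent URL is to normalize the filter combination before emitting links or a canonical tag. This Python sketch assumes a hypothetical query-string scheme (comma-separated, case-insensitive values), which is not necessarily how the site builds its URLs; the idea is only that every path to the same combination produces identical output.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Collapse every spelling of the same filter combination to one URL."""
    parts = urlsplit(url)
    # Lowercase the path and drop a trailing slash so /NYC/ and /nyc match.
    path = parts.path.lower().rstrip("/") or "/"
    # Gather values per key, splitting comma lists and removing duplicates.
    params = {}
    for key, value in parse_qsl(parts.query):
        for v in value.split(","):
            if v:
                params.setdefault(key.lower(), set()).add(v.lower())
    # Sort keys and values so parameter order never creates a new URL.
    pairs = [(k, ",".join(sorted(vs))) for k, vs in sorted(params.items())]
    query = urlencode(pairs, safe=",")
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, query, ""))


a = normalize("http://www.dayguest.com/NYC/?amenities=10,3&stars=4")
b = normalize("http://www.dayguest.com/nyc?stars=4&amenities=3,10,3")
print(a == b, a)  # True http://www.dayguest.com/nyc?amenities=10,3&stars=4
```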
-
That's odd, they were supposed to be the same. And yeah, results come and go as properties are added/removed from our inventory.
The following is what I wanted to highlight:
http://www.dayguest.com/rome-dayuse/concierge
http://www.dayguest.com/rome-dayuse/air-conditioning
As you can see, the pages are identical, except that one has 5 properties and the other has 6, and most of them overlap. There are so many property "features" or "categories" that some lists end up exactly the same. In fact, SEOmoz finds over 1,700 pages with duplicate content on my site, most of them search results pages with closely similar contents like these.
Hence my issue...
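A rough way to find pairs like concierge vs. air-conditioning automatically is to compare the sets of properties each page lists. The Python sketch below uses Jaccard similarity (properties in common divided by all properties across both pages); the property IDs and the 0.8 threshold are made up for illustration. Flagged pairs could then feed whichever remedy fits: a canonical, a noindex, or fewer crawlable paths.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of listed properties two pages have in common (1.0 = identical)."""
    if not a and not b:
        return 1.0  # two empty result pages are identical
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict, threshold: float = 0.8) -> list:
    """Return (url_a, url_b, score) for every pair at or above threshold."""
    flagged = []
    # Simple pairwise comparison; cost grows quadratically with page count.
    for u, v in combinations(sorted(pages), 2):
        score = jaccard(pages[u], pages[v])
        if score >= threshold:
            flagged.append((u, v, score))
    return flagged


# Hypothetical property IDs: 5 results vs. 6, with 5 in common.
pages = {
    "/rome-dayuse/concierge": {101, 102, 103, 104, 105},
    "/rome-dayuse/air-conditioning": {101, 102, 103, 104, 105, 106},
}
for u, v, score in near_duplicates(pages):
    print(f"{u} ~ {v}: {score:.2f}")
# /rome-dayuse/air-conditioning ~ /rome-dayuse/concierge: 0.83
```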
-
Are they duplicates in the sense that there are currently no results? I wouldn't generally use rel=canonical on these, because the search results should (theoretically) be different. These are distinct regions and, I assume, have unique properties.
If they're just returning no results, I'd actually consider a META NOINDEX until there are results available. Otherwise, this is likely to be treated as a soft 404 by Google (not a disaster, honestly). It depends on whether results come and go or if you're just building out the site and there will be data later. If the data isn't ready, I think META NOINDEX is a good way to go. Until results are available, these pages have no search value.
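Tying noindex to the result count keeps this automatic as inventory comes and goes. A minimal Python sketch, assuming the page template knows how many properties it is about to render; the helper names are hypothetical. The same value could also be sent as an `X-Robots-Tag` HTTP header instead of a meta tag.

```python
def robots_for_results(result_count: int) -> str:
    """noindex empty result pages; index them again once results exist."""
    # "follow" still lets crawlers use the page's navigation links.
    return "noindex, follow" if result_count == 0 else "index, follow"


def robots_meta_tag(result_count: int) -> str:
    return f'<meta name="robots" content="{robots_for_results(result_count)}">'


print(robots_meta_tag(0))  # <meta name="robots" content="noindex, follow">
print(robots_meta_tag(6))  # <meta name="robots" content="index, follow">
```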
-
Well, let me give you an example, look at this page: http://www.dayguest.com/milan-city-centre-dayuse?amenities=10
And this page: http://www.dayguest.com/milan-central-station-dayuse?amenities=10
Do you see what I'm talking about? The pages are identical but for the page title/description & a few words on the page.
So, you'd go for canonical?
-
The relation is more hierarchical than next/previous. Judging from the post you mentioned, canonical would be more appropriate...
-
Sorry, I'm not clear on whether these are paginated search results or actual property pages that vary only by a small amount. As @SEO5 said, if these are paginated search results, you could use rel=prev/next. It's a bit tricky to set up with search filters (you need rel=prev/next + rel=canonical).
If these are nearly identical property pages, then it depends on how they differ. If they only differ by one attribute, I'd probably lean toward the canonical tag.
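For paginated, filtered results, one common reading of the rel=prev/next + rel=canonical setup is that each page carries a self-referencing canonical (filters included) plus prev/next links to its neighbours. The Python sketch below assumes a hypothetical `page` query parameter, with page 1 carrying no parameter. Google has since said it no longer uses rel=prev/next as an indexing signal, so treat that part as optional markup.

```python
from html import escape
from urllib.parse import urlencode


def page_url(base: str, filters: dict, n: int) -> str:
    """URL for page n (1-based) of a filtered list; page 1 has no page param."""
    params = dict(sorted(filters.items()))
    if n > 1:
        params["page"] = str(n)  # hypothetical parameter name
    query = urlencode(params)
    return f"{base}?{query}" if query else base


def pagination_links(base: str, filters: dict, page: int, total: int) -> list:
    """<link> tags for one page: self canonical plus prev/next neighbours."""
    # escape() turns "&" into "&amp;" so the href is valid inside HTML.
    tags = [f'<link rel="canonical" href="{escape(page_url(base, filters, page))}">']
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(page_url(base, filters, page - 1))}">')
    if page < total:
        tags.append(f'<link rel="next" href="{escape(page_url(base, filters, page + 1))}">')
    return tags


for tag in pagination_links(
    "http://www.dayguest.com/milan-city-centre-dayuse", {"amenities": "10"}, 2, 3
):
    print(tag)
```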