Similar pages: noindex, rel=canonical, or disregard parameters?!
-
Hey all!
We have a hotel booking website that has search results pages per destination (e.g. hotels in NYC is dayguest.com/nyc). Pages are also generated for each destination based on various parameters, such as star rating, amenities, style of the properties, etc. (e.g. dayguest.com/nyc/4stars, dayguest.com/nyc/luggagestorage, dayguest.com/nyc/luxury, etc.).
In general, all of these pages are very similar. For example, there might be 10 hotels in NYC and all of them offer luggage storage, so the pages can be nearly identical. Hence the problems of duplicate content and diluted link juice.
I was wondering what the best practice is in such a situation: should I just set all pages except the most important ones (e.g. dayguest.com/nyc) to noindex? Or set the main page as the canonical for all variations? Or ask Google, in Google Webmaster Tools, to disregard the URLs for the various parameters? Or do something else altogether?!
Thanks for the help!
-
Sorry, I don't think I explained (1) very well. What I mean is that you may want to gradually change the site architecture so that not all of the search options are crawlable pages. This could mean putting some filters in form variables, for example (instead of links). It could also mean making sure that certain paths always converge. There's no easy solution. This is a problem all big sites face, and it's very dependent on the platform/CMS.
With (2), a "level" could be anything. Maybe there are major cities you need to cover but everything else could stay out of the index. This really depends on your information architecture, but there's always something that's high priority and something that's low priority. If you can focus Google on the high-priority pages, it can definitely work in your favor. The trick is figuring out how to build the logic such that you can code that dynamically. I've found there's almost always an answer, but it can take some creative thinking. I definitely don't encourage doing it manually.
If the results are easy to group by city and you can code that logic, the canonical may be fine. Since the search results could be different in some cases, canonical isn't technically the best choice, but it does often work. It really depends on how different they can be, so it's a bit tricky.
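If you do go the "group by city" route, the logic can be quite small. Here is a minimal sketch, assuming the first path segment is always the city slug (as in dayguest.com/nyc/4stars); the function names are made up for illustration and this is not how the site is actually built.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def city_canonical(url: str) -> str:
    """Return the city-level URL for a filter page such as /nyc/4stars."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: the first path segment is always the city slug.
    city_path = "/" + segments[0] if segments else "/"
    # Drop the query string and fragment so every variation converges.
    return urlunsplit((parts.scheme, parts.netloc, city_path, "", ""))


def canonical_tag(url: str) -> str:
    """Render the <link> element to place in the page <head>."""
    return f'<link rel="canonical" href="{escape(city_canonical(url), quote=True)}">'


print(canonical_tag("http://www.dayguest.com/nyc/luggagestorage"))
```

Keep in mind the caveat above: this points every filter page at the city page whether or not the results really match, so it is only reasonable while the result sets are close to identical.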
-
Honestly, option 1 would be a nightmare. Imagine that we add one property in a city not yet covered. There are about 50 amenities, and most hotels offer most of them, so nearly as many new pages get generated. That would quickly become unmanageable to handle manually.
Not sure I understand your second option. There are not several "levels", only one under the "city" in which the property is located. But multiplied by several cities, the pages quickly run into the hundreds, if not thousands.
Why would it not be possible/desirable to point all such pages to their city page with a canonical tag?
-
Ugh - that's what I was afraid you'd say. Unfortunately, the coincidental problem can't really be easily solved with code, which makes it hard to use canonical tags. There's no good way to tell the site when to use them.
So, a couple of options:
(1) Try to gradually rework the structure so that there are fewer of these paths.
(2) Consider using META NOINDEX on some lower-value paths. Internal search results don't have great value for Google, so you could let the major categories/options be indexed, but then cut off at a certain level (index nothing "below" it). That may be more feasible from a code standpoint.
(3) Use rel=prev/next, use unique TITLEs if possible (based on the query) and just clean things up the best you can, but leave everything indexed.
It depends a lot on your scope, structure, and your future plans. I'm not sure there's one "right" answer.
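To make option (2) a bit more concrete, here is a minimal sketch of a "cut off below a certain level" rule. It assumes URLs shaped like /city/filter, and the set of high-value filters is hypothetical; which filters deserve indexing is a judgment call for the site owner. Using "noindex, follow" rather than plain "noindex" is also an assumption, chosen so crawlers can still follow links on the page.

```python
# Hypothetical allowlist: filter pages considered worth indexing.
HIGH_VALUE_FILTERS = frozenset({"luxury", "4stars"})

INDEX = '<meta name="robots" content="index, follow">'
NOINDEX = '<meta name="robots" content="noindex, follow">'


def robots_meta(path: str, indexed_filters=HIGH_VALUE_FILTERS) -> str:
    """Pick the robots meta tag for a search results path."""
    segments = [s for s in path.split("/") if s]
    # City pages (one segment) and the home page are always indexed.
    if len(segments) <= 1:
        return INDEX
    # One filter below the city: index only if it is on the allowlist.
    if len(segments) == 2 and segments[1] in indexed_filters:
        return INDEX
    # Everything else sits "below" the cut-off.
    return NOINDEX


for p in ("/nyc", "/nyc/luxury", "/nyc/luggagestorage"):
    print(p, robots_meta(p))
```

The point of driving it from a rule like this is that new cities and amenities are handled automatically, with nothing to maintain by hand.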
-
These pages return the same results coincidentally, that's the issue... The more properties we get on board, the less likely it is that these pages will be similar. But it might take a long time to build that up, and we may never achieve it.
-
Ah, got it - yeah, I think rel=canonical would be fine there, but I'd want to understand your architecture better. Are these pages returning the same results coincidentally, or are these two URLs that basically land on the same combination of search options/filters? If it's the former, it's a lot tougher, because that's just a coincidence happening at large scale. If it's the latter, a solid canonical scheme could help a lot, but I'd also explore whether these paths are useful (or should be indexed at all). In other words, in the long term, it might be better to use one URL consistently, even if people navigate by different paths to reach it.
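For the second case (different URLs landing on the same combination of filters), "one URL consistently" can be enforced by normalizing the URL before linking to it or using it as the canonical. A minimal sketch, assuming filters live in the query string; `amenities` appears in the URLs in this thread, while `stars` is a made-up parameter for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_filter_url(url: str) -> str:
    """Map every ordering of the same filters to a single URL."""
    parts = urlsplit(url)
    # Sort parameters and drop exact repeats so that the same filter
    # combination always produces the same query string.
    params = sorted(set(parse_qsl(parts.query)))
    # Treat /nyc and /nyc/ as the same page.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(params), "")
    )


a = normalize_filter_url("http://www.dayguest.com/nyc?stars=4&amenities=10")
b = normalize_filter_url("http://www.dayguest.com/nyc/?amenities=10&stars=4")
print(a == b)
```

This does nothing for the coincidental case, where two genuinely different filters happen to return the same hotels.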
-
That's odd, they were supposed to be the same. And yeah, results come and go as properties are added/removed from our inventory.
The following is what I wanted to highlight:
http://www.dayguest.com/rome-dayuse/concierge
http://www.dayguest.com/rome-dayuse/air-conditioning
As you can see, the pages are identical, except that one has 5 properties and the other one has 6. Most overlap. There are so many property "features" or "categories" that some lists come out exactly the same. In fact, SEOmoz finds that I have over 1,700 pages with duplicate content, most of them search results pages with closely similar content such as these.
Hence my issue...
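One way to put a number on "most overlap" is to compare the sets of properties two pages list. This is only a rough sketch: the property IDs and the 0.8 threshold are invented for illustration, and it measures the overlap without deciding what to do about it.

```python
def overlap(a, b) -> float:
    """Jaccard similarity between two lists of property IDs (0.0 to 1.0)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Two empty result pages are identical to each other.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_near_duplicate(a, b, threshold: float = 0.8) -> bool:
    """Flag two result pages whose property lists mostly coincide."""
    return overlap(a, b) >= threshold


# Hypothetical IDs: one page lists 5 properties, the other the same 5 plus one.
concierge = ["h1", "h2", "h3", "h4", "h5"]
air_conditioning = ["h1", "h2", "h3", "h4", "h5", "h6"]
print(round(overlap(concierge, air_conditioning), 2))
```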
-
Are they duplicates in the sense that there are currently no results? I wouldn't generally use rel=canonical on these, because the search results should (theoretically) be different. These are distinct regions and, I assume, have unique properties.
If they're just returning no results, I'd actually consider a META NOINDEX until there are results available. Otherwise, this is likely to be treated as a soft 404 by Google (not a disaster, honestly). It depends on whether results come and go or if you're just building out the site and there will be data later. If the data isn't ready, I think META NOINDEX is a good way to go. Until results are available, these pages have no search value.
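The "noindex until there are results" rule is easy to express in code. A minimal sketch, assuming the page template knows how many properties the search returned; `result_count` and the function name are made up for illustration.

```python
def robots_meta_for_results(result_count: int) -> str:
    """Keep empty search result pages out of the index."""
    if result_count <= 0:
        # No properties yet: the page has no search value for now.
        return '<meta name="robots" content="noindex, follow">'
    # Once inventory exists, the tag flips back automatically.
    return '<meta name="robots" content="index, follow">'


print(robots_meta_for_results(0))
print(robots_meta_for_results(6))
```

Because the tag is computed on every request, pages become indexable again as soon as properties are added, with no manual cleanup.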
-
Well, let me give you an example, look at this page: http://www.dayguest.com/milan-city-centre-dayuse?amenities=10
And this page: http://www.dayguest.com/milan-central-station-dayuse?amenities=10
Do you see what I'm talking about? The pages are identical but for the page title/description & a few words on the page.
So, you'd go for canonical?
-
The relation is more hierarchical than next/previous. Judging from the post you mentioned, canonical would be more appropriate...
-
Sorry, I'm not clear on whether these are paginated search results or actual property pages that vary only by a small amount. As @SEO5 said, if these are paginated search results, you could use rel=prev/next. It's a bit tricky to set up with search filters (you need rel=prev/next + rel=canonical).
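Here is one way the rel=prev/next + rel=canonical combination could be wired up. It is a sketch under several assumptions: pagination uses a `page` query parameter, page 1 has no `page` parameter, and `sortby` is a hypothetical parameter that reorders results without changing them. In this sketch, prev/next keep every parameter, while the canonical drops only the parameters that don't change the content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _build(parts, params, page: int) -> str:
    """Rebuild the URL with the given page number (page 1 has no parameter)."""
    query = [(k, v) for k, v in params if k != "page"]
    if page > 1:
        query.append(("page", str(page)))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def pagination_links(url: str, last_page: int, ignored=("sortby",)) -> dict:
    """Return the canonical, prev and next URLs for a paginated results page."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    page = int(dict(params).get("page", "1"))
    # The canonical keeps filters and page, but not the ignored parameters.
    content_params = [(k, v) for k, v in params if k not in ignored]
    return {
        "canonical": _build(parts, content_params, page),
        "prev": _build(parts, params, page - 1) if page > 1 else None,
        "next": _build(parts, params, page + 1) if page < last_page else None,
    }


links = pagination_links(
    "http://www.dayguest.com/nyc?amenities=10&sortby=price&page=2", last_page=3
)
for rel, href in links.items():
    print(rel, href)
```

Each non-empty value would then be rendered as a `<link rel="..." href="...">` element in the page head.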
If these are nearly identical property pages, then it depends on how they differ. If they only differ by one attribute, I'd probably lean toward the canonical tag.