What could cause Google to not honor canonical URLs?
-
I have a strange situation on a website, when I do a Google query of site:example.com all the top indexed results appear to be queries that users can perform on the website. So any random term the user searches for on the website for some reason is causing the search result page to get indexed - like example.com/search/query/random-keywords
However, the search results page has a canonical tag on it that points to example.com/search, but that doesn't seem to be doing anything. Any thoughts or ideas why this could be happening?
-
Hi there,
First of all, its a mistake to think that when searching with _site: _operator, the first results are the most important nor the more relevant. Google has said a few times that we shouldn't rely that much on what that search in terms of what's being shown.
Blocking search results with robots.txt wont be of help, as it will not remove already indexed pages and cant prevent for new pages to be indexed (if there's an external link to a robots.txt blocked page, google can still index it) it'll only prevent Googlebot from discovering new ones FROM YOUR SITE.
Again, i'd try to dig deeper to understand where are the links to internal searches that google is finding. Googlebot will not do any search in your site.
The thing with GSC, might be related to quite a few reasons. I cant say much because I don't know any more specifics, but from what you are telling me it looks like you are getting impressions in searches that you don't relate to your site and that land on pages that google is noindexing. Yeah im repeating the obvious, hehe.
In my experience, Google can have these strange behaviours. You know, there are cases when a page is canonicalized, but it can still be shown in SERPS. Dont ask me why, but it happens. It takes a little time to google fully replace it with the correct one.
I'd wait a little longer to see how Google is handling them.I don't know if im helping you.
it kinda took me a few minutes to understand/process what you wrote and come up with an answer.Please, feel free ask again or comment on my reply if I misunderstood something.
Best luck,
Gaston -
Hi here's some more background info on this situation that makes it even stranger. I can perform some pretty specific searches on Google where these indexed search result pages show up. And I can look in Google Search Console under the performance section and see that those pages receive impressions and clicks. However, if I inspect the URL, Search Console says it is not included in Google's index, and the reason it gives under indexing is because it says it is honoring the canonical URL. So search console is saying it isn't indexed because of the canonical, but I can do searches and find that exact URL in the index. Any ideas what this could be from?
-
Hi Gaston,
Thanks for the response. I can confirm that the example, /search and /search?q=foo are pretty much identical. However that may not always be the case, only when a user searches for something that would return no results. So, a website that sells widgets, /search and /search?q=widgets would not be identical, and in that case it would make sense that Google would not honor the canonical link. What's really strange is if I search google for the site: operator of the domain, the top pages are not user queries for things that make sense. The top indexed pages are random, non-relevant user searches.
I do not have a way with this system to control noindex tags on these search result pages. The only thing I could do is take the nuclear option and just block it all with robots.txt using wildcards. But that means no search result pages would get indexed, relevant or not.
-
Hi there,
in my experience, when google doesn't honor Canonicals, is because pages arent similar.
In its definition, canonical are there for two or more pages that have the same content.If you are finding it problematic, i'd suggest to use noindex tags for that search pages.
I'd investigate If there are links pointing to those internal search pages, as its not common for google to discover search pages.Hope it helps,
Best luck.
Gaston
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate, submitted URL not selected as canonical
Hi all, A number of our pages have dropped out of search rankings. It seems they are being marked as "Duplicate, submitted URL not selected as canonical" However, the page Google is choosing as the canonical is totally different - different headings, titles, metadata, content on the page. We are completely mystified as to why this is happening. If anyone can shed any light, it would be hugely appreciated! Example URL is this one:
Technical SEO | | Eric_S
https://www.vouchedfor.co.uk/IFA-financial-advisor-mortgage/london Which Google seems to think is a duplicate of this: https://www.vouchedfor.co.uk/solicitor/london0 -
URL has caps, but canonical does not. Now what?
Hi, Just started working with a site that has the occasional url with a capital, but then the url in the canonical as lower case. Neither, when entered in a browser, resolves to the other. It's a Shopify site. What do you think I should do?
Technical SEO | | 945010 -
URL Structure
Hi, Hope you are all well. On our website we have a 'blog' and a 'news' section. The blog is located on "/blog" - but when you click on a post the url structure changes to /name-of-article and the blog subdomain isn't included. Would it be better to have "blog/name-of-article as this would then make the blog perform better in search results? Also, if our news page is under /news - but when you click on an article it changes to /news-article/name-of-article Wouldn't it be better to have /news/name-of-article Thanks a lot!! 🙂
Technical SEO | | National-Homebuyers0 -
Canonical Expert question!
Hello, I am looking for some help here with an estate agent property web site. I recently finished the MoZ crawling report and noticed that MoZ sees some pages as duplicate, mainly from pages which list properties as page 1,2,3 etc. Here is an example: http://www.xxxxxxxxx.com/property-for-rent/london/houses?page=2
Technical SEO | | artdivision
http://www.xxxxxxxxx.com/property-for-rent/london/houses?page=3 etc etc Now I know that the best practise says I should set a canonical url to this page:
http://www.xxxxxxxxx.com/property-for-rent/london/houses?page=all but here is where my problem is. http://www.xxxxxxxxx.com/property-for-rent/london/houses?page=1 contains good written content (around 750 words) before the listed properties are displayed while the "page=all" page do not have that content, only the properties listed. Also http://www.xxxxxxxxx.com/property-for-rent/london/houses?page=1 is similar with the originally designed landing page http://www.xxxxxxxxx.com/property-for-rent/london/houses I would like yoru advise as to what is the best way to can url this and sort the problem. My original thoughts were to can=url to this page http://www.xxxxxxxxx.com/property-for-rent/london/houses instead of the "page=all" version but your opinion will be highly appreciated.0 -
Single URL not indexed
Hi everyone! Some days ago, I noticed that one of our URLs (http://www.access.de/karriereplanung/webinare) is no longer in the Google index. We never had any form of penalty, link warning etc. Our traffic by Google is constantly growing every month. This single page does not have an external link pointing to it - only internal links. The page has been indexed all the time. The HTTP status code is 200, there is no noindex or something in the code. I submitted the URL on GWMT to let Google send it to the index. It was crawled successfully by Google, sent to the index 5 days ago - nothing happened, still not indexed. Do you have any suggestions why this page is no longer indexed? It is well linked internally and one click away from the home page. There is still the PR of 5 showing, I always thought that pages with PR are indexed.......
Technical SEO | | accessKellyOCG0 -
Google webmaster tool doestn allow me to send 'URL and all linked pages"
Hello! I made a lot of optimization changes in my site ( seo urls, and a lot more ) , I always use Google Webmaster tools, fetch as Google Bot to refresh my site but now it doesnt allow me to 'Send URL and all linked pages' check the attachment Thank you
Technical SEO | | matiw0 -
Do I have a canonical problem?
Does this site www.davidclick.com have a canonical problem because the home page can be requested via 3 different urls http://www.davidclick.com/
Technical SEO | | Nightwing
http://davidclick.com/
http://www.davidclick.com/index.htm but I'm confused in terms of applying a fix for example all advice here http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066#1 says i need to identify the duplicate files and add So my question is please if I do have a canonical problem how can i fix it when I only have one file for my home page, there are no duplicates 😞 Any insights welcome 🙂0 -
Is the full URL necessary for successful Canonical Links?
Hi, my first question and hopefully an easy enough one to answer. Currently in the head element of our pages we have canonical references such as: (Yes, untidy URL...we are working on it!) I am just trying to find out whether this snippet of the full URL is adequete for canonicalization or if the full domain is needed aswell. My reason for asking is that the SEOmoz On-Page Optimization grading tool is 'failing' all our pages on the "Appropriate Use of Rel Canonical" value. I have been unable to find a definitive answer on this, although admittedly most examples do use the full URL. (I am not the site developer so cannot simply change this myself, but rather have to advise him in a weekly meeting). So in short, presumably using the full URL is best practise, but is it essential to its effectiveness when being read by the search engines? Or could there be another reason why the "Appropriate Use of Rel Canonical" value is not being green ticked? Thank you very much, I appreciate any advice you can give.
Technical SEO | | rmkjersey0