What could cause Google to not honor canonical URLs?
-
I have a strange situation on a website, when I do a Google query of site:example.com all the top indexed results appear to be queries that users can perform on the website. So any random term the user searches for on the website for some reason is causing the search result page to get indexed - like example.com/search/query/random-keywords
However, the search results page has a canonical tag on it that points to example.com/search, but that doesn't seem to be doing anything. Any thoughts or ideas why this could be happening?
-
Hi there,
First of all, its a mistake to think that when searching with _site: _operator, the first results are the most important nor the more relevant. Google has said a few times that we shouldn't rely that much on what that search in terms of what's being shown.
Blocking search results with robots.txt wont be of help, as it will not remove already indexed pages and cant prevent for new pages to be indexed (if there's an external link to a robots.txt blocked page, google can still index it) it'll only prevent Googlebot from discovering new ones FROM YOUR SITE.
Again, i'd try to dig deeper to understand where are the links to internal searches that google is finding. Googlebot will not do any search in your site.
The thing with GSC, might be related to quite a few reasons. I cant say much because I don't know any more specifics, but from what you are telling me it looks like you are getting impressions in searches that you don't relate to your site and that land on pages that google is noindexing. Yeah im repeating the obvious, hehe.
In my experience, Google can have these strange behaviours. You know, there are cases when a page is canonicalized, but it can still be shown in SERPS. Dont ask me why, but it happens. It takes a little time to google fully replace it with the correct one.
I'd wait a little longer to see how Google is handling them.I don't know if im helping you.
it kinda took me a few minutes to understand/process what you wrote and come up with an answer.Please, feel free ask again or comment on my reply if I misunderstood something.
Best luck,
Gaston -
Hi here's some more background info on this situation that makes it even stranger. I can perform some pretty specific searches on Google where these indexed search result pages show up. And I can look in Google Search Console under the performance section and see that those pages receive impressions and clicks. However, if I inspect the URL, Search Console says it is not included in Google's index, and the reason it gives under indexing is because it says it is honoring the canonical URL. So search console is saying it isn't indexed because of the canonical, but I can do searches and find that exact URL in the index. Any ideas what this could be from?
-
Hi Gaston,
Thanks for the response. I can confirm that the example, /search and /search?q=foo are pretty much identical. However that may not always be the case, only when a user searches for something that would return no results. So, a website that sells widgets, /search and /search?q=widgets would not be identical, and in that case it would make sense that Google would not honor the canonical link. What's really strange is if I search google for the site: operator of the domain, the top pages are not user queries for things that make sense. The top indexed pages are random, non-relevant user searches.
I do not have a way with this system to control noindex tags on these search result pages. The only thing I could do is take the nuclear option and just block it all with robots.txt using wildcards. But that means no search result pages would get indexed, relevant or not.
-
Hi there,
in my experience, when google doesn't honor Canonicals, is because pages arent similar.
In its definition, canonical are there for two or more pages that have the same content.If you are finding it problematic, i'd suggest to use noindex tags for that search pages.
I'd investigate If there are links pointing to those internal search pages, as its not common for google to discover search pages.Hope it helps,
Best luck.
Gaston
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google webmaster tools says access denied for 77 urls
Hi i am looking in google webmaster tools and i have seen a major problem which i hope people can help me sort out. The problem is, i am being told that 77 urls are being denied access. The message when i look for more information says the below Googlebot couldn't crawl your URL because your server either requires login to access the page, or is blocking Googlebot from accessing your site. the responce code is 403 here is a couple of examples http://www.in2town.co.uk/Entertainment-Magazine http://www.in2town.co.uk/Weight-Loss-Hypnotherapy-helped-woman-lose-3-stone i think the problem could be that i have sent them to another url in my httaccess file using the 403 re-direct but why would it bring up that google bot could not crawl them any help would be great
Technical SEO | | ClaireH-1848860 -
How to keep a URL social equity during a URL structure/name change?
We are in the process of making significant URL name/structure change to one of our property and we want to keep the social equity (likes, share, +1, tweets) from the old to the new URL. We have been trying many different option without success. We are running our social "button" in an iframe. Thanks
Technical SEO | | OlivierChateau0 -
How to find original URLS after Hosting Company added canonical URLs, URL rewrites and duplicate content.
We recently changed hosting companies for our ecommerce website. The hosting company added some functionality such that duplicate content and/or mirrored pages appear in the search engines. To fix this problem, the hosting company created both canonical URLs and URL rewrites. Now, we have page A (which is the original page with all the link juice) and page B (which is the new page with no link juice or SEO value). Both pages have the same content, with different URLs. I understand that a canonical URL is the way to tell the search engines which page is the preferred page in cases of duplicate content and mirrored pages. I also understand that canonical URLs tell the search engine that page B is a copy of page A, but page A is the preferred page to index. The problem we now face is that the hosting company made page A a copy of page B, rather than the other way around. But page A is the original page with the seo value and link juice, while page B is the new page with no value. As a result, the search engines are now prioritizing the newly created page over the original one. I believe the solution is to reverse this and make it so that page B (the new page) is a copy of page A (the original page). Now, I would simply need to put the original URL as the canonical URL for the duplicate pages. The problem is, with all the rewrites and changes in functionality, I no longer know which URLs have the backlinks that are creating this SEO value. I figure if I can find the back links to the original page, then I can find out the original web address of the original pages. My question is, how can I search for back links on the web in such a way that I can figure out the URL that all of these back links are pointing to in order to make that URL the canonical URL for all the new, duplicate pages.
Technical SEO | | CABLES0 -
Google +1 not recognizing rel-canonical
So I have a few pages with the same content just with a different URL. http://nadelectronics.com/products/made-for-ipod/VISO-1-iPod-Music-System http://nadelectronics.com/products/speakers/VISO-1-iPod-Music-System http://nadelectronics.com/products/digital-music/VISO-1-iPod-Music-System All pages rel-canonical to:
Technical SEO | | kevin4803
http://nadelectronics.com/products/made-for-ipod/VISO-1-iPod-Music-System My question is... why can't google + (or facebook and twitter for that matter) consolidate all these pages +1. So if the first two had 5 +1 and the rel-canonical page had 5 +1's. It would be nice for all pages to display 15 +1's not 5 on each. It's my understanding that Google +1 will gives the juice to the correct page. So why not display all the +1's at the same time. Hope that makes sense.0 -
Canonical efficiency
Hi, I'm creating recommendations for one of my client's site. It's a news site highly based on a regional aspect. One of the main features would be that you can navigate on a high level, we call it inter-regional (with all the regions news) and on the regional level (with only news related to the region) which act as a filter which means that most of my content will be duplicate. To allow the user to navigate the site on the two levels means that all the news pages will be duplicated, one with the inter-regional URL and one with the regional URL. Example: http://www.sitename.com/category/2011/11/07/name-of-the-article http://www.sitename.com/region-name/category/2011/11/07/name-of-the-article The regional URL is the official one, since it has all the keywords I want, and I'm planning to have a canonical on both version with the regional URL. Is there a risk that this would affect my ranking? Any alternatives? I read that I could prevent SE to crawl inter-regional articles using my robot.txt but I'm not fond of that. Thanks!
Technical SEO | | Pherogab0 -
URL query strings and canonical tag
Hi, I have recently been getting my comparison website redesigned and developed onto wordpress and the site is now 90% complete. Part of the redesign has meant that there are now dynamic urls in the format: http://www.mywebsite.com/10-pounds-productss/?display=cost&value=10 I have other pages similar to this but with different content for the different price ranges and these are linked to from the menus: http://www.mywebsite.com/20-pounds-products/?display=cost&value=20 Now my questions are: 1. I am using Joost's All-in-one SEO plugin and this adds a canonical tag to the page that is pointing to http://www.mywebsite.com/10-pounds-products/ which is the permalink. Is this OK as it is or should i change this to http://www.mywebsite.com/10-pounds-products/?display=cost&value=10 2. Which URL will get indexed, what gets shown as the display URL in the SERPs and what page will users land on? I'm a bit confused so apologies if these seem like silly questions. Thanks
Technical SEO | | bizarro10000 -
How to disallow google and roger?
Hey Guys and girls, i have a question, i want to disallow all robots from accessing a certain root link: Get rid of bots User-agent: * Disallow: /index.php?_a=login&redir=/index.php?_a=tellafriend%26productId=* Will this make the bots not to access any web link that has the prefix you see before the asterisk? And at least google and roger will get away by reading "user-agent: *"? I know this isn't the standard proceedure but if it works for google and seomoz bot we are good.
Technical SEO | | iFix0 -
Duplicate Content from Google URL Builder
Hello to the SEOmoz community! I am new to SEOmoz, SEO implementation, and the community and recently set up a campaign on one of the sites I managed. I was surprised at the amount of duplicate content that showed up as errors and when I took a look in deeper, the majority of errors were caused by pages on the root domain I put through Google Analytics URL Builder. After this, I went into webmaster tools and changed the parameter handling to ignore all of the tags the URL Builder adds to the end of the domain. SEOmoz recently recrawled my site and the errors being caused by the URL Builder are still being shown as duplicates. Any suggestions on what to do?
Technical SEO | | joshuaopinion0