Pages excluded from Google's index due to "different canonicalization than user"
-
Hi MOZ community,
A few weeks ago we noticed a complete collapse in traffic on some of our pages (7 out of around 150 blog posts in question). We were able to confirm that those pages disappeared for good from Google's index at the end of January '18, they were still findable via all other major search engines.
Using Google's Search Console (previously Webmastertools) we found the unindexed URLs in the list of pages being excluded because "Google chose different canonical than user". Content-wise, the page that Google falsely determines as canonical instead has little to no similarity to the pages it thereby excludes from the index.
About our setup:
We are a SPA, delivering our pages pre-rendered, each with an (empty) rel=canonical tag in the HTTP header that's then dynamically filled with a self-referential link to the pages own URL via Javascript. This seemed and seems to work fine for 99% of our pages but happens to fail for one of our top performing ones (which is why the hassle
).
What we tried so far:
- going through every step of this handy guide: https://moz.com/blog/panic-stations-how-to-handle-an-important-page-disappearing-from-google-case-study --> inconclusive (healthy pages, no penalties etc.)
- manually requesting re-indexation via Search Console --> immediately brought back some pages, others shortly re-appeared in the index then got kicked again for the aforementioned reasons
- checking other search engines --> pages are only gone from Google, can still be found via Bing, DuckDuckGo and other search engines
Questions to you:
- How does the Googlebot operate with Javascript and does anybody know if their setup has changed in that respect around the end of January?
- Could you think of any other reason to cause the behavior described above?
Eternally thankful for any help!
-
Hi SvenRi, that's an interesting one! The message you're getting from Google suggests that, rather than not finding the canonical tag, the system has reason to believe that the canonical is not representative of the best content.
One thing I'd bear in mind is that Google doesn't take canonical tags as gospel, but rather guidance, so it can just ignore them without there necessarily being a problem in how you've implemented that tag. Another is that while Google says that their crawlers can parse JavaScript, there's evidence that it doesn't parse the page content perfectly.
What happens when you fetch and render the pages in question using Search Console (both the page you want to rank and the page Google is selecting)? Can you see all of the content? Google uses the same JavaScript rendering as Chrome 41 (see here) have you tried accessing with that? You could also try a tool like Screaming Frog with JavaScript rendering switched on to see what kind of page content comes back. It could be worth making sure the canonical is generated properly but I'd also be checking that the page content is being rendered properly to make sure Google is seeing the pages as different as you describe. I'd also check to make sure there isn't a second, conflicting, canonical tag on the page. I know some SPA frameworks can have issues with double-opening HTML tags when one page is accessed after another, that could be something that would confuse a crawler so you could double-check that.
As ever, there are the rumours that Google will start giving much more weight to mobile in terms of indexing. Given your question about things changing recently - does your site have desktop and mobile parity?
If it looks as though everything is kosher, is it possible that the page Google is suggesting is much more heavily linked to internally or externally? If internally you could consider reviewing your internal linking (Will wrote a post about ways to think about internal linking here). You could use a tool like Majestic to look at who is linking to these pages externally, it may be worth double checking that all the links are genuine.
TL;DR I would start with the whole page content, not just the search directives, to make sure that's always being understood properly, then I would look in to linking. These are mainly areas of investigation and next debug steps, hopefully they'll help narrow down the search for you!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Best to Fix Duplicate Content Issues on Blog If URLs are Set to "No-Index"
Greetings Moz Community: I purchased a SEMrush subscription recently and used it to run a site audit. The audit detected 168 duplicate content issues mostly relating to blog posts tags. I suspect these issues may be due to canonical tags not being set up correctly. My developer claims that since these blog URLs are set to "no-index" these issues do not need to be corrected. My instinct would be to avoid any risk with potential duplicate content. To set up canonicalization correctly. In addition, even if these pages are set to "no-index" they are passing page rank. Further more I don't know why a reputable company like SEMrush would consider these errors if in fact they are not errors. So my question is, do we need to do anything with the error pages if they are already set to "no-index"? Incidentally the site URL is www.nyc-officespace-leader.com. I am attaching a copy of the SEMrush audit. Thanks, Alan BarjWaO SqVXYMy
Intermediate & Advanced SEO | | Kingalan10 -
Help my site it's not being indexed
Hello... We have a client, that had arround 17K visits a month... Last september he hired a company to do a redesign of his website....They needed to create a copy of the site on a different subdomain on another root domain... so I told them to block that content in order to not affect my production site, cause it was going to be an exact replica of the content but different design.... The developmet team did it wrong and blocked the production site (using robots.txt), so my site lost all it's organica traffic, which was 85-90% of the total traffic and now only get a couple of hundreds visits a month... First I thought we had been somehow penalized, however when I the other site recieving new traffic and being indexed i realized so I switched the robots.txt and created 301 redirect from the subdomain to the production site. After resending sitemaps, links to google+ and many things I can't get google to reindex my site.... when i do a site:domain.com search in google I only get 3 results. Its been now almost 2 month and honestly dont know what to do.... Any help would be greatly appreciated Thanks Dan
Intermediate & Advanced SEO | | daniel.alvarez0 -
Does Google make continued attempts to crawl an old page one it has followed a 301 to the new page?
I am curious about this for a couple of reasons. We have all dealt with a site who switched platforms and didn't plan properly and now have 1,000's of crawl errors. Many of the developers I have talked to have stated very clearly that the HTacccess file should not be used for 1,000's of singe redirects. I figured If I only needed them in their temporarily it wouldn't be an issue. I am curious if once Google follows a 301 from an old page to a new page, will they stop crawling the old page?
Intermediate & Advanced SEO | | RossFruin0 -
Link Building for "State" informational pages
I have a webpage for all 50 states for specific info relating to relocation and was wondering if there are any recommended links to work at getting for these pages. I would like to do "state" specific and possibly health related links for each page to help in the SEO rankings. I can see that if I just wanted to get 10 links on each page that is going to be 500 links I have to build and it is going to be very time consuming but I feel it is necessary. Thank you, Poo
Intermediate & Advanced SEO | | Boodreaux0 -
Getting Google in index but display "parent" pages..
Greetings esteemed SEO experts - I'm hunting for advice: We operate an accommodation listings website. We monetize by listing position in search results, i.e. you pay more to get higher placing in the page. Because of this, while we want individual detailed listing pages to be indexed to get the value of the content, we don't really want them appearing in Google search results. We ideally want the "content value" to be attributed to the parent page - and google to display this as the link in the search results instead of the individual listing. Any ideas on how to achieve this?
Intermediate & Advanced SEO | | AABAB0 -
Is it allowed to have different alt on same image on different pages?
Hi, I have images that match several different keywords and I wondered if I can give them different alts based on the page that they are displayed or will Google be angry with me? Thanks
Intermediate & Advanced SEO | | BeytzNet0 -
Is there a way to stop my product pages with the "show all" catagory/attribute from duplicating content?
If there were less pages with the "show all" attribute it would be a simple fix by adding the canonical URL tag. But seeing that there are about 1,000 of them I was wondering if their was a broader fix that I could apply.
Intermediate & Advanced SEO | | cscoville0 -
Should "View All Products" be the canonical page?
We currently have "view 12" as the default setting when someone arrives to www.mysite.com/subcategory-page.aspx. We have been advised to change the default to "view all products" and make that the canonical page to ensure all of our products get indexed. My concern is that doing this will increase the page load time and possibly hurt rankings. Does it make sense to change all our our subcategory pages to show all the products when someone visits the page? Most sites seem to have a smaller number of products as the default.
Intermediate & Advanced SEO | | pbhatt0