Should we use the rel-canonical tag?

annieplaskett

We have a secure version of our site, as we often gather sensitive business information from our clients.

Our https pages have been indexed as well as our http version.

Could it still be a problem to have an http and an https version of our site indexed by Google? Is this seen as being a duplicate site?
If so, can this be resolved with a rel=canonical tag pointing to the http version?

Thanks

Dr-Pete

Agreed - this is generally an issue with relative paths, and job one is to fix it. In most cases, you really don't want the https pages crawled at all. I do think rel=canonical is a good bet here - 301 redirects can get really tricky with http/https, and you can end up creating loops. It can be done right, but it's also easy to screw up, in my experience.
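To make the redirect-loop risk concrete, here is a minimal Python sketch. It does not model any real server configuration: the rule format (source scheme, path prefix, target scheme), the function names, and the `www.example.com` URLs are all assumptions for illustration. It follows the first matching rule from a starting URL and reports a loop if a URL comes up twice.

```python
from urllib.parse import urlsplit, urlunsplit


def next_url(url, rules):
    """Return the URL the first matching rule redirects to, or None.

    Each rule is a hypothetical (source_scheme, path_prefix, target_scheme)
    tuple standing in for a 301 redirect rule.
    """
    parts = urlsplit(url)
    for scheme, prefix, target_scheme in rules:
        if parts.scheme == scheme and parts.path.startswith(prefix):
            # Keep host, path, query and fragment; swap only the scheme.
            return urlunsplit((target_scheme,) + tuple(parts[1:]))
    return None


def find_redirect_loop(start, rules, max_hops=10):
    """Follow redirects from start; return the looping URLs, or None."""
    seen = []
    url = start
    while url is not None and len(seen) < max_hops:
        if url in seen:
            return seen[seen.index(url):]
        seen.append(url)
        url = next_url(url, rules)
    return None


# Two rules that each look reasonable alone but conflict together:
# "send the contact form to https" and "send everything on https to http".
conflicting_rules = [("http", "/contact", "https"), ("https", "/", "http")]
print(find_redirect_loop("http://www.example.com/contact", conflicting_rules))
```

With the two conflicting rules, the contact URL bounces between its http and https forms, which is the kind of loop that is easy to create by accident.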

TakeshiYoung

Yes, having 2 versions of the same content can be seen as duplicate content and could cause issues.
Yes, include a canonical tag in the page's <head> (assuming both http & https pages are close to identical). This will help Google's crawler figure out which version of the page to show in the search results.
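As a rough illustration of what that tag looks like, the Python sketch below builds a canonical link element that points at the http version of a page whichever version was requested. The function name `canonical_tag`, its `preferred_scheme` parameter, and the example URLs are assumptions, not part of any CMS or library.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_tag(page_url, preferred_scheme="http"):
    """Build a rel=canonical link element pointing at the preferred scheme."""
    parts = urlsplit(page_url)
    # Keep host, path and query; force the scheme and drop any fragment.
    canonical = urlunsplit(
        (preferred_scheme, parts.netloc, parts.path or "/", parts.query, "")
    )
    return '<link rel="canonical" href="{}">'.format(escape(canonical, quote=True))


# Both versions of the page emit the same tag.
print(canonical_tag("https://www.example.com/services?id=2"))
print(canonical_tag("http://www.example.com/services?id=2"))
```

The same absolute http URL is emitted on both versions of the page, so whichever one gets crawled carries the same hint about the preferred version.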

Jinx14678

Yes, I would suggest a canonical tag as the easiest resolution.

And Irving is right: PDFs are most definitely indexed. I am not sure how they are interpreted or whether they would specifically count as duplicate content, but I don't think this idea would EVER be something I would suggest, as it seems to have lots of negative repercussions.

I would most definitely agree that relative links are probably your issue. If you add the canonical tag and replace the inline relative links with absolute http links, this should resolve itself in a month or so.
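A minimal sketch of that link cleanup in Python, assuming the links are plain double-quoted `href` attributes. The regex approach, the function name `absolutize_links`, and the example URLs are assumptions for illustration; a real site would more likely change its templates or use a proper HTML parser.

```python
import re
from urllib.parse import urljoin

# Matches double-quoted href attributes only; a deliberate simplification.
HREF = re.compile(r'href="([^"]*)"')


def absolutize_links(html, page_url):
    """Rewrite relative hrefs as absolute URLs resolved against page_url.

    page_url should be the http address of the page, so links stop
    inheriting the https scheme from the secure page they sit on.
    """
    def fix(match):
        return 'href="{}"'.format(urljoin(page_url, match.group(1)))

    return HREF.sub(fix, html)


snippet = '<a href="/about.html">About</a> <a href="pricing.html">Pricing</a>'
print(absolutize_links(snippet, "http://www.example.com/secure/form.html"))
```

Links that are already absolute pass through unchanged, because `urljoin` keeps a URL that carries its own scheme.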

irvingw

I disagree

a) PDFs are both indexed AND read by crawlers.

b) Even if you don't have navigation to the file, Google can sometimes find it if it's in a folder that you are not blocking in robots.txt.

c) If someone links to it even once on the web, it will get crawled and indexed.
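For point b), one way to check whether a file's folder is actually blocked is Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt content, the helper name `is_blocked`, and the URLs are made up for illustration, and the standard-library parser may not match Google's own robots.txt handling in every detail.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks only the /private/ folder.
robots_txt = "User-agent: *\nDisallow: /private/\n"

print(is_blocked(robots_txt, "http://www.example.com/private/report.pdf"))
print(is_blocked(robots_txt, "http://www.example.com/docs/report.pdf"))
```

A PDF under `/docs/` is not covered by the rule, so a crawler that discovers the URL is free to fetch it.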

If you have an https section, that content should be behind a login and not accessible to the engines. Your problem sounds like your https pages have relative links on them: Google crawls an https page, then follows the relative links and stays on https. Fix that, and you will stop your http pages from also being indexed as duplicate https pages.
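The way a relative link keeps a crawler on https can be shown with a few lines of Python. The helper `resolve_links` and the URLs are hypothetical; `urljoin` resolves a link against the page it appears on, following the standard rules for relative URLs.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href against the URL of the page it appears on."""
    return [urljoin(page_url, href) for href in hrefs]


secure_page = "https://www.example.com/secure/form.html"
links = ["/about.html", "http://www.example.com/about.html"]

# The relative link inherits https from the secure page;
# the absolute link keeps the scheme it was written with.
for resolved in resolve_links(secure_page, links):
    print(resolved)
```

The first link resolves to an https copy of the About page, which is how an ordinary page ends up indexed under both schemes.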

Absolute http canonical tags will help, but they are not the solution. You need to fix the https leaking on your secure pages.

Chenzo

You can "noindex" them within the HTML. But if you really want a fun trick, for when you are not able to get around a mass of duplicate content and it isn't there for the sake of rankings (MLS listings, for example):

Change the content into a PDF, or another file format, so that it can't be crawled.

Once again, it will NOT be crawled, so don't go doing this to an entire site.

But maybe your clients' confidential data can be submitted this way, and it will not get indexed, except for the subpage, and you can then noindex that subpage.

Hope this helps.

Your pal

Chenzo
