Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Duplicate content and http and https
-
Within my Moz crawl report, I have a ton of duplicate content caused by identical pages being available at both http and https URLs.
For example:
http://www.bigcompany.com/accomodations
https://www.bigcompany.com/accomodations
The strange thing is that 99% of these URLs are not sensitive in nature and do not require any security features. No credit card information, booking, or carts. The web developer cannot explain where these extra URLs came from or provide any further information.
Advice or suggestions are welcome! How do I solve this issue?
THANKS MOZZERS
-
Hard to tell without knowing the site, but it's possible there are external links to "https" versions of the pages. At this point, Google is going to increase the pressure to secure sites, and later this year Chrome will start warning users about all non-secure pages, so it may be worth making the move.
-
I'm reading this response and this is happening on my site as well. How did this happen in the first place? I have duplicate content because of https and http copies of all my web pages. If I type https://www.mywebsite.com I can't get to my site. Could this be coming from my hosting company? I've set up my site to simply be http://www.mywebsite.com. I'm a little worried to change my robots.txt and I would love to know how this happened in the first place.
-
If Google detects both http: and https: versions, they've started to automatically pick the https: version, but that's not consistent yet. In general, I think it's still important to set strong canonicalization signals. Google still separates your http: and https: sites in Google Search Console, too, so even they haven't quite made up their minds.
In general, Google is pushing sites toward https:, but that's a somewhat complex decision that depends on more than just SEO. If you're using https: and the https: URLs are indexed, then you should treat those as canonical and suppress the http: URLs, in most cases.
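To see which pages a crawl actually picked up under both protocols, something like the following might help. It is only a rough sketch: the function name `find_protocol_duplicates` and the sample crawl list are made up for illustration, and it assumes the https: version is the one you want to treat as canonical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def find_protocol_duplicates(urls):
    """Map each assumed-canonical https URL to the variants crawled under both schemes."""
    seen = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue
        # Key on host, path and query; https is the assumed canonical scheme.
        canonical = urlunsplit(
            ("https", parts.netloc.lower(), parts.path, parts.query, "")
        )
        seen[canonical].add(url)
    # Keep only pages that were seen under both http and https.
    return {
        canonical: sorted(variants)
        for canonical, variants in seen.items()
        if {urlsplit(u).scheme for u in variants} == {"http", "https"}
    }


# Hypothetical crawl export, reusing the example URLs from the question.
crawl = [
    "http://www.bigcompany.com/accomodations",
    "https://www.bigcompany.com/accomodations",
    "http://www.bigcompany.com/contact",
]
duplicates = find_protocol_duplicates(crawl)
for canonical, variants in duplicates.items():
    # The canonical tag that both variants could carry.
    print(f'<link rel="canonical" href="{canonical}">')
```

If you decide http: should be canonical instead, swap the scheme used to build the key; the grouping logic stays the same.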
-
Hate to respond to a 3-year-old thread, but does this solution need to be updated?
Is there any change in the response now that Google is favoring https for most pages? Does Google still consider http and https as two different sites? If so, which one should be suppressed: http or https?
Aji
-
Hi,
I'm still having problems with redirecting. I only have one duplicate page across https and http that I want to redirect, but it's the homepage.
I want to redirect: https://www.domain.com to http://www.domain.com
But keep the rest of the pages the same (half http and the other half https).
How do I do this?
-
Anytime Rand! I only have two simple rules:
1. Talking business on ski days is not allowed
2. Entry into Vermont requires a pound of Seattle's best french roast coffee. In return, you receive some fantastic Vermont maple syrup.
Simple rules to live by LOL
Thanks again for all of your help...
Peter
-
Thanks dude! If I make it to Vermont, I might look you up
-
Thanks James..
Sorry, I was using Big Company as an example and just being generic.
The real URL if interested is www.hawkresort.com
-
I would personally like to thank everyone that responded with an answer. Man O Man, the best part of belonging to SEOMOZ is the community forum. It's incredibly valuable, being able to ask a question and reach out to such talent as all of you.
If anyone ever gets up to Killington or Okemo skiing, the beer is on me! I live right between both ski areas, about 8 miles to either mountain..
Thanks again.
-
I think Harald and James covered the bases here, but a couple of comments on Harald's reply:
(1) Definitely check this. A common cause of indexed https: pages is that a secure section of your site is being crawled (like a shopping cart), and you're using relative navigation links. When a crawler or visitor follows a nav link from a secure page, the relative link inherits the https: protocol. In most cases, you may want to NOINDEX secure pages. Shopping carts and checkout pages have no business in the search index, IMO.
(2)-(5) I believe this does work, but it's very tricky, so please be careful. If anyone has linked to the https: pages, you'll lose the link-juice this way (you'll just cut those pages off). I honestly don't think it's a good choice for most sites.
(8) I actually believe the 301-redirect is simpler in most cases.
As James said, sitewide canonical tags (or on the affected pages, if they're isolated) will also work.
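The relative-link behaviour described in (1) is easy to reproduce. This is a small sketch using Python's standard `urljoin`; the page URL and the helper name `resolve_nav_link` are hypothetical, chosen only to illustrate the point.

```python
from urllib.parse import urljoin


def resolve_nav_link(page_url, href):
    """Resolve a nav link the way a crawler would, relative to the current page."""
    return urljoin(page_url, href)


# Hypothetical secure page that shares the site-wide navigation.
secure_page = "https://www.example.com/cart/checkout"

# A relative link inherits the scheme of the page it appears on.
print(resolve_nav_link(secure_page, "/accomodations"))

# An absolute link keeps whatever scheme was written into it.
print(resolve_nav_link(secure_page, "http://www.example.com/accomodations"))
```

So one crawlable https: page with relative navigation can be enough for a spider to discover an https: copy of every page the navigation links to.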
-
Hi Serge, I came to know about the "robots_ssl.txt" from the website http://www.seoworkers.com/seo-articles-tutorials/robots-and-https.html
-
I would check your server for an https folder.
Add a robots.txt file in the root of the https folder:
User-agent: *
Disallow: /
My guess is that the spider is following a link somewhere within your site that points to an https:// URL. The spider is then re-indexing the entire site using https://
My 2 cents, for what it's worth.
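Before uploading a robots.txt like the one above, it may be worth checking what it actually blocks. Here is a minimal sketch with Python's standard `urllib.robotparser`; the helper name `is_crawlable` and the test URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# The two-line robots.txt suggested for the https folder.
https_robots = ["User-agent: *", "Disallow: /"]

allowed = is_crawlable(https_robots, "https://www.example.com/accomodations")
print(allowed)
```

Keep in mind that robots.txt only controls crawling. URLs that are already indexed, or that are linked from elsewhere, can still show up in results, which is one reason the canonical tag or a 301 redirect is often suggested instead.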
-
Harald, " robots_ssl.txt " where did you get that?
-
Hello Hawkvt1, First of all, I want to tell you that the protocols (http/https) are different, so the two versions are considered two separate sites, and there's a good chance of being penalized for duplicate content. If the search engine discovers two identical pages, it will generally keep the page it saw first and ignore the others. The solutions are described below:
Solutions:
- Be smart about the site structure: to keep the engines from crawling and indexing HTTPS pages, structure the website so that HTTPS pages are only accessible through a form submission (log-in, sign-up, or payment pages). The common mistake is making these pages available via a standard link (this happens when you are not aware that the secure version of the site is being crawled and indexed).
- Use a robots.txt file to control which pages will be crawled and indexed
- Use an .htaccess file. Here's how to do this:
- Create a file named robots_ssl.txt in your root.
- Add the following code to your .htaccess:
# Serve robots_ssl.txt in place of robots.txt for requests on the SSL port
RewriteEngine On
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
- Remove yourdomain.com:443 from the webmaster tools if the pages have already been crawled
- For dynamic pages like php, try:
<?php
// Tell robots not to index or follow pages served on the SSL port
if ($_SERVER["SERVER_PORT"] == 443) {
    echo '<meta name="robots" content="noindex,nofollow">';
}
?>
- Dramatic solution (may not always be possible): 301 redirect the HTTPS pages to the HTTP pages – with hopes that the link juice will transfer over.
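The .htaccess rule and the PHP check in the list above both hinge on the same test: is the request arriving on port 443? That decision can be sketched as follows. The function names are hypothetical, and the sketch assumes HTTPS is served on the default port 443, which may not hold behind a proxy or load balancer.

```python
HTTPS_PORT = 443  # assumption: HTTPS is served on the default port


def robots_file_for_port(port):
    """Mirror the rewrite rule: serve robots_ssl.txt for requests on the SSL port."""
    return "robots_ssl.txt" if port == HTTPS_PORT else "robots.txt"


def robots_meta_for_port(port):
    """Mirror the PHP check: emit a noindex,nofollow tag only on the SSL port."""
    if port == HTTPS_PORT:
        return '<meta name="robots" content="noindex,nofollow">'
    return ""


for port in (80, 443):
    print(port, robots_file_for_port(port), robots_meta_for_port(port))
```

Both approaches suppress the https: copies. If https: is the version you want to keep, the same check would have to be inverted.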
For more information please refer to this link :
http://www.seomoz.org/ugc/solving-duplicate-content-issues-with-http-and-https
I'm sure that your problem is solved.
-
You could implement the canonical tag onto the HTTP version of the website.
Another problem I noticed from a quick look at this website is that all your title tags are the same, with the brand term at the front. This is not advisable: you want to put your generic terms first and the brand term at the end of the title.
I would look at getting an SEO audit done to fix the issues with the website.