How to fix duplicate content for homepage and index.html
-
Hello,
I know this probably gets asked quite a lot but I haven't found a recent post about this in 2018 on Moz Q&A, so I thought I would check in and see what the best route/solution for this issue might be. I'm always really worried about making any (potentially bad/wrong) changes to the site, as it's my livelihood, so I'm hoping someone can point me in the right direction.
Moz, SEMRush and several other SEO tools are all reporting that I have duplicate content for my homepage and index.html (same identical page).
According to Moz, my homepage (without index.html) has PA 29 and index.html has PA 15. They are both showing Status 200. I read that you can either do a 301 redirect or add rel=canonical
I currently have a 301 setup for my http to https page and don't have any rel=canonical added to the site/page. What is the best and safest way to get rid of duplicate content and merge the my non index and index.html homepages together these days? I read that both 301 and canonical pass on link juice but I don't know what the best route for me is given what I said above.
Thank you for reading, any input is greatly appreciated!
-
OK, Paul, I hear what you are saying. It's a very open and obvious diss.
I'm not sure what you are saying makes any difference to the argument that the canonical way here is not the way to go. I was explaining in the simplest way, I would not want, and I'm sure you would not want either, a live page like this - the home page, live and canonicalised.
(It's a given that the canonical works like a 301, passing link juice to the preferred version.)
So thanks but it makes no difference - delete & 301 every time.
Google is heightening its distrust of canonicals - the new Seach Console tool reveals which pages are the preferred canonical and it's something of a surprise to SEOs!
If you feel like playing top trumps again then why not PM me? - it's so much better and the uninitiated do not need to see it!
Cheers Nigel
-
A proper canonical tag does a lot more than "just be telling Google not to rank it" When used properly (i.e. pages that truly do contain the same content), the canonicalised page passes its ranking signals back to the canonical source.
I agree with Kristina - while a 301 would be preferable (it's a directive, while canonical tags are taken as suggestions), a canonical tag would be vastly better than not doing anything about the issue. At least until the dev can get the problem with the 301-redirect properly resolved.
Paul
-
It's best practice to redirect, but if that's not an option, the canonical route should help the problem a lot! You'll probably lose some link equity with this route, but it should clear up duplicate content issues from Google's side.
-
Hi Dre
If you just do a canonical then the page will still be live, you will just be telling Google not to rank it. Best practice is to remove it all together and 301. It is bad practice having more than one version of your home page, (any page) live!
Regards Nigel
-
Thank you so much for all the responses. So it sounds like 301 redirect through htaccess is the way to go. What is the difference between using the 301 through htaccess vs using rel=canonical in my case? Does the 301 provide better link juice vs rel=canonical or is canonical just not applicable in this case? Thanks for all the replies and helpful suggestions again!
EDIT: I spoke to my developer (who is hosting and maintaining my site now).. he said he tried to do 301 through htaccess but it seems to be crashing the site (and trust me he is very good at what he does). Part of the problem is that my site is VERY old (originally build about 10 years ago and NOT updated once since).. he has been slowly updating and cleaning up the site slowly and he will try to figure out why the 301 is crashing the site and not working but in the mean time how safe is it to use rel=canonical instead of a 301?
Thanks again!
-
Hi dre
Your site really shouldn't be generating an index.html in the first place but if it is you must make sure that there is a 301 in the htaccess file sending all traffic to the single homepage URL as Lynn correctly points out this will be a permanent redirect.
It is very simple to do. Both versions are treated as separate pages (as http and https) so you are essentially showing a duplicate site to Google so your rankings will be terrible until you change.
Regards Nigel
-
Hello there,
You can use .htaccess URL rewrite to remove all the .html from your URL, here's the rewrite rules.
RewriteEngine On
RewriteRule ^index.html$ / [R=301,L]
RewriteRule ^(.*)/index.html$ /$1/ [R=301,L]Once you added this rules you should also fix all your internal links make sure they link to the URL without .html
Hope this helps,
Joseph Yap
-
"I currently have a 301 setup for my http to https page" - great! Also, you should check if your inner pages redirecting from HTTP-versions to HTTPS too.
index.html should redirect to the homepage main version with 301 Permanent Redirect.
-
Google consider HTTP and HTTPS as two separate protocols. Since the contents are same on both versions, google bots consider it as duplicate content. Adding a canonical URL will solve this problem. If you have any doubts, feel free to ask.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate page content
Hi Crawl errors is showing 2 pages of duplicate content for my clients WordPress site: /news/ & /category/featured/ Yoast is installed so how best to resolve this ? i see that both pages are canonicalised to themselves so presume just need to change the canonical tag on /category/featured/ to reference /news/ ?(since news is the page with higher authority and the main page for showing this info) or is there other way in Yoast or WP to deal with this & prevent from happening again ? Cheers Dan
On-Page Optimization | | Dan-Lawrence0 -
Is tracking code added to the end of a URL considered duplicate content
I have two URLs one with a tracking coded and one without. http://www.towermarketing.net/lets-talk-ux-baby and http://www.towermarketing.net/lets-talk-ux-baby/**#.U6ghgLEz64I ** My question is will this be considered as two separate URLs, will Google consider this as two pages with duplicate content. Any recommendations would be much appreciated.
On-Page Optimization | | TowerMarketing0 -
301 redirected Duplicate Content, still showing up as duplicate after new crawl.
We launched a site where key landing pages were not showing up in google. After running the seomoz crawl it returned a lot of duplicate pages which may expalin this. The actual url of the page is /design and it was telling me the following were dupes: /design/family-garden-design
On-Page Optimization | | iterate
/design/small-garden-design
/design/large-rural-garden-design
/Design All of these URL's were in fact pointing to the /design landing page. I 301 redirected all of the pages so they all now resolve to /design After running another crawl the day after doing this it's still showing up as duplicate content on seomoz. Does seomoz evaluate the new changes right away?0 -
Duplicated Content with joomla multi language website
Dear Seomoz Community I am running a multi language joomla website (www.siam2nite.com) with 2 active languages. The first and primary language is english. the second language is thai. Most of the content (articles, event descriptions ...) is in english only. What we did is a thai translation for the navigation bars, headers, titles etc (translation of all joomla language files) those texts are static and only help the user navigate / understand our site in their thai language. Now I facing a problem with duplicated content. Lets take our Q&A component as example. the url structure looks like this: english - www.siam2nite.com/en/questions/ thai - www.siam2nite.com/th/questions/ Every question asked will create two URL, one for each language. The content itself (user questions & answers) is identical on both URL's. Only the GUI language is different. If you take a look at this question you will understand what i mean: ENGLISH VERSION: http://www.siam2nite.com/en/questions/where-to-celebrate-halloween-in-bangkok THAI VERSION: http://www.siam2nite.com/th/questions/where-to-celebrate-halloween-in-bangkok As you can see each page has a unique title (H1) and introduction text in the correct language (same for menu, buttons, etc.) but the questions and answers are only available in one language. Now my question 😉 I guess Google will see this pages as duplicated content. How should I proceed with this problem: put all thai links /th/questions/ in the robots.txt and block them or make a canonical tag for the english versions? Not sure if I set a canonical tag google will still index the thai title and introduction texts (they have important thai keywords in them) Would really appreciate your help on this 😉 Regards, Menelik
On-Page Optimization | | menelik0 -
Duplicating content on multiple domains
Hey guys, I've started working with a new client recently called Resource Investing News. I'm more a Social Media person, though I do have SEO experience. RIN has about 40 URLs all of which have original news content published on them. One SEO-related issue that I can see here though is that the primary domain re-publishes all of the original content that the other URLs do. In other words: resourceinvestingnews.com will have an article on it that is also published on goldinvestingnews.com with the same date stamp and a link out to the original article. E.g. http://resourceinvestingnews.com/42539-molybdenum-goes-far-beyond-steelmaking.html http://molyinvestingnews.com/5301-molybdenum-steelmaking-vehicle-demand-electronics-lubricant.html Does anyone have an idea if this is something that should be reviewed and/or whether the content is being negatively affected in search? Many thanks!
On-Page Optimization | | blahblahblah20150 -
Duplicate Content- Best Practise Usage of the canonical url
Canonical urls stop self competition - from duplicate content. So instead of a 2 pages with a rank of 5 out of 10, it is one page with a rank of 7 out of 10.
On-Page Optimization | | WMA
However what disadvantages come from using canonical urls. For example am I excluding some products like green widet, blue widget. I have a customer with 2 e-commerce websites(selling different manufacturers of a type jewellery). Both websites have massive duplicate content issues.
It is a hosted CMS system with very little SEO functionality, no plugins etc. The crawling report- comes back with 1000 of pages that are duplicates. It seems that almost every page on the website has a duplicate partner or more. The problem starts in that they have 2 categorys for each product type, instead of one category for each product type.
A wholesale category and a small pack category. So I have considered using a canonical url or de-optimizing the small pack category as I believe it receives less traffic than the whole category. On the original website I tried de- optimizing one of the pages that gets less traffic. I did this by changing the order of the meta title(keyword at the back, not front- by using small to start of with). I also removed content from the page. This helped a bit. Or I was thinking about just using a canonical url on the page that gets less traffic.
However what are the implications of this? What happens if some one searches for "small packs" of the product- will this no longer be indexed as a page. The next problem I have is the other 1000s of pages that are showing as duplicates. These are all the different products within the categories. The CMS does not have a front office that allows for canonical urls to be inserted. Instead it would have to be done going into the html of the pages. This would take ages. Another issue is that these product pages are not actually duplicate, but I think it is because they have such little content- that the rodger(seo moz crawler, and probably googles one too) cant tell the difference.
Also even if I did use the canonical url - what happened if people searched for the product by attributes(the variations of each product type)- like blue widget, black widget, brown widget. Would these all be excluded from Googles index.
On the one hand I want to get rid of the duplicate content, but I also want to have these pages included in the search. Perhaps I am taking too idealistic approach- trying to optimize a website for too many keywords. Should I just focus on the category keywords, and forget about product variations. Perhaps I look into Google Analytics, to determine the top landing pages, and which ones should be applied with a canonical. Also this website(hosted CMS) seems to have more duplicate content issues than I have seen with other e-commerce sites that I have applied SEO MOZ to On final related question. The first website has 2 landing pages- I think this is a techical issue. For example www.test.com and www.test.com/index. I realise I should use a canonical url on the page that gets less traffic. How do I determine this? (or should I just use the SEO MOZ Page rank tool?)0 -
Is is it true that Google will not penalize duplicated content found in UL and LI tags?
I've read in a few places now that if you absolutely have to use a key term several times in a piece of copy, then it is preferable to use li and ul tags, as google will not penalise excessive density of keywords found in these tags. Does anyone know if there is any truth in this?
On-Page Optimization | | jdjamie0 -
What is html text?
I am using the "page attributes" tool for a guideline for what I am filling in on each page of my website. I feel like I've done my homework on this, but I can't figure out what "html text" is. URL Page Title Meta Description Meta Keywords H1 H2 **HTML Text ** ... not sure what to put here, how long it should be, what it's purpose is, etc. Any SEOMoz links helping me figure that out? I'm not finding it in the "Basics of Search Engine Design" article posted here on SEOMoz.
On-Page Optimization | | amandahx20