Duplicate Content for index.html
-
In the Crawl Diagnostics Summary, it says that I have two pages with duplicate content which are:
I read in a Dream Weaver tutorial that you should name your home page "index.html" and then you can let www.mywebsite.com automatically direct the user to index.html. Is this a bug in SEOMoz's crawler or is it a real problem with my site?
Thank you,
Dan
-
The code should definitely go into the websites root directory's .htaccess, however .htaccess can be weird, a few days ago I ran into a similar issue with a client's website, and I was able to remedy the issue with a variation of the code.
index Redirect RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)index.(php|html|htm|asp)\ HTTP/ RewriteRule ^(([^/]+/))index.(php|html|htm|asp)$ http://yoursite.com/$1 [R=301,L]
If you give me the URL for the site I will take a look at it and let you know what would be feasible.
-
Hi Daniel, can you share with us the URL of your site? We can take a look at it and give you a more precise answer that way. Thanks!
-
I eventually figured out that your method was a 301 redirect and I definitely broke my site trying to use the code you posted. .. haha. Its ok though. I just removed the code and it went back to normal. At first, I was editing the .htaccess file in the public_html folder which wasnt working. Then I tried the root folder for the site (I created the .htaccess file since it did not exist.) Neither of those worked. (I am using Bluehost so I do not think that I have root access and I am not sure if it is a Linux server or not.)
If there is an easy way to explain what I am doing wrong, please do so. Otherwise, I will use canonical.
Thanks for everything!
-
@Dan
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutt's blog where he discusses his preference for 301 redirects over rel canonical tag.
Where would you say your solution fits in?
sorry about the delay of this response, i didn't realize the that you were asking me a question right away. When placing the code I provided in my previous answer this will cause a 301 perminant redirect to the original URL. That's actually what the
[R=301,L]
portion of the code is stating (R) redirect (301) status is referring to. After reviewing the Matt Cutts video, I realize that I should have asked you if you were operating on a Linux server that you had root access to. We actually utilize both redirects and canonical tags since it was recommended by the on-page optimization reports. Heck Google uses them, I would assume because it's easier for the user to be referred to a single page URL. Obviously though if you don't have server header access, and are not familiar with .htaccess (you can accidentally break your site) then the canonical solution is appropriate
-
Josh,
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutt's blog where he discusses his preference for 301 redirects over rel canonical tag.
Where would you say your solution fits in?
Thanks,
Dan -
use the link rel tag for all my homepages for the http://www.yoursite.com
-
Odd enough I just recently answered this question. The SEOmoz crawler is correct, because without a redirect you will be able to access both versions of the page in your browser.
To resolve this issue simply rewrite the index.html to the root url by placing the following code into your .htaccess file into your root directory.
Options +FollowSymlinks RewriteEngine on
Index Rewrite RewriteRule ^index.(htm|html|php) http://www.yoursite.com/ [R=301,L] RewriteRule ^(.*)/index.(htm|html|php) http://www.yoursite.com/$1/ [R=301,L]
You can also do the same with the index file in any subdirectories that you might create, by simply placing a .htaccess into those sub directories and using variations of the above code. This is how you create nice tight URLs without the duplicate content issue that look like - http://www.semclix.com/design/business/
-
It is a problem which you need to fix. You need to canonicalize your pages.
Those are all various URLs which most likely lead to the same web page. I say "most likely" because these URLs can actually lead to different pages.
You need to tell crawlers and search engines how you organize your site. There are several ways to achieve canonicalization. The method I prefer is to add the following line of code to each page:
The URL provided should be the preferred URL for your page.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Redirects Not Working / Issue with Duplicate Page Titles
Hi all We are being penalised on Webmaster Tools and Crawl Diagnostics for duplicate page titles and I'm not sure how to fix it.We recently switched from HTTP to HTTPS, but when we first switched over, we accidentally set a permanent redirect from HTTPS to HTTP for a week or so(!).We now have a permanent redirect going the other way, HTTP to HTTPS, and we also have canonical tags in place to redirect to HTTPS.Unfortunately, it seems that because of this short time with the permanent redirect the wrong way round, Google is confused as sees our http and https sites as duplicate content.Is there any way to get Google to recognise this new (correct) permanent redirect and completely forget the old (incorrect) one?Any ideas welcome!
Web Design | | HireSpace0 -
Fixing my sites problem with duplicate page content
My site has a problem with duplicate page content. SEO MOZ is telling me 725 pages worth. I have looked a lot into the 301 Re direct and the Rel=canonical Tag and I have a few questions: First of all, I'm not sure which on I should use in this case. I have read that the 301 Redirect is the most popular path to take. If I take this path do I need to go in and change the URL of each of these pages or does it automatically change with in the redirect when I plug in the old URL and the new one? Also, do I need to just go to each page that SEO MOZ is telling me is a duplicate and make a redirect of that page? One thing that I am very confused about is the fact that some of these duplicates listed out are actually different pages on my site. So does this just mean the URL's are too similar to each other, and there fore need the redirect to fix them? Then on the other hand I have a log in page that says it has 50 duplicates. Would this be a case in which I would use the Canonical Tag and would put it into each duplicate so that the SE knew to go to the original file? Sorry for all of the questions in this. Thank you for any responses.
Web Design | | JoshMaxAmps0 -
How much does on-site duplicated content affect SERPs?
Hi, We've recently gotten into Moz, with our E-commerce websites, and discovered that it's crawler takes note of about 2500 pages which it thinks are the same (duplicated). We've now begun to completely rewrite every description of every product (including Meta Title/Description) so that this number may be reduced. Since this is the biggest issue Moz spots I'm wondering what the effect of fixing it will be on our position in the SERP (mainly Google). Does anybody have some stories or experience about this topic? Thanks in Advance! 🙂 Alexander
Web Design | | WebmasterAlex0 -
Increasing content, adding rich snippets... and losing tremendous amounts of organic traffic. Help!
I know dramatic losses in organic traffic is a common occurrence, but having looked through the archives I'm not sure that there's a recent case that replicates my situation. I've been working to increase the content on my company's website and to advise it on online marketing practices. To that end, in the past four months, I've created about 20% more pages — most of which are very high quality blog posts; adopted some rich snippets (though not all that I would like to see at this point); improved and increased internal links within the site; removed some "suspicious" pages as id'd by Moz that had a lot of links on it (although the content was actually genuine navigation); and I've also begun to guest blog. All of the blog content I've written has been connected to my G+ account, including most of the guest blogging. And... our organic traffic is preciptiously declining. Across the board. I'm befuddled. I can see no warnings (redirects &c) that would explain this. We haven't changed the site structure much — I think the most invasive thing we did was optimize our title tags! So no URL changes, nothing. Obviously, we're all questioning all the work I've done. It just seems like we've sunk SO much energy into "doing the right thing" to no effect (this site was slammed before for its shady backlink buying — though not from any direct penalty, just as a result of the Penguin update). We noticed traffic taking a particular plunge at the beginning of June. Can anyone offer insights? Very much appreciated.
Web Design | | Novos_Jay0 -
How to change the entire contents and design in my site without getting troubles with google?
Hello everyone This is my first post over here. In the next few weeks we going to change the entire content and design in our site. The site has 240 pages with poor contents and design. Except 301 redirects for all the old url’s I wanted to consult with you what is the right way to do it without harm my organic traffic that come from google? How google refers to this kind of changes? Which steps should I need to take to do it properly? Hope to get your help in the issue. Tahnks in advance.
Web Design | | JonsonSwartz0 -
Question #1: Does Google index https:// pages? I thought they didn't because....
generally the difference between https:// and http:// is that the s (stands for secure I think) is usually reserved for payment pages, and other similar types of pages that search engines aren't supposed to index. (like any page where private data is stored) My site that all of my questions are revolving around is built with Volusion (i'm used to wordpress) and I keep finding problems like this one. The site was hardcoded to have all MENU internal links (which was 90% of our internal links) lead to **https://**www.example.com/example-page/ instead of **http://**www.example.com/example-page/ To double check that this was causing a loss in Link Juice. I jumped over to OSE. Sure enough, the internal links were not being indexed, only the links that were manually created and set to NOT include the httpS:// were being indexed. So if OSE wasn't counting the links, and based on the general ideology behind secure http access, that would infer that no link juice is being passed... Right?? Thanks for your time. Screens are available if necessary, but the OSE has already been updated since then and the new internal links ARE STILL NOT being indexed. The problem is.. is this a volusion problem? Should I switch to Wordpress? here's the site URL (please excuse the design, it's pretty ugly considering how basic volusion is compared to wordpress) http://www.uncommonthread.com/
Web Design | | TylerAbernethy0 -
How can i write content rich descriptions?
we have recently started using seomoz. how can i make descriptions more content rich?
Web Design | | WCGAdmin0 -
SEO tricks for a one page site with commented html content
Hi, I am building a website that is very similar to madebysofa.com : means it is one page site with entire content loaded (however are commented in html) and by clicking on sections it modify the DOM to make specific section visible. It is very interesting from UX point of view but as far as I know, since this way most of my content is always commented and hidden from crawlers, I will loose points regarding SEO. Is there any workaround you can recommend or you think sites like madebysofa.com are doomed to loose SEO points by nature? Best regards,
Web Design | | Ashkan10