Duplicate Content for index.html
-
In the Crawl Diagnostics Summary, it says that I have two pages with duplicate content which are:
I read in a Dream Weaver tutorial that you should name your home page "index.html" and then you can let www.mywebsite.com automatically direct the user to index.html. Is this a bug in SEOMoz's crawler or is it a real problem with my site?
Thank you,
Dan
-
The code should definitely go into the websites root directory's .htaccess, however .htaccess can be weird, a few days ago I ran into a similar issue with a client's website, and I was able to remedy the issue with a variation of the code.
index Redirect RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)index.(php|html|htm|asp)\ HTTP/ RewriteRule ^(([^/]+/))index.(php|html|htm|asp)$ http://yoursite.com/$1 [R=301,L]
If you give me the URL for the site I will take a look at it and let you know what would be feasible.
-
Hi Daniel, can you share with us the URL of your site? We can take a look at it and give you a more precise answer that way. Thanks!
-
I eventually figured out that your method was a 301 redirect and I definitely broke my site trying to use the code you posted. .. haha. Its ok though. I just removed the code and it went back to normal. At first, I was editing the .htaccess file in the public_html folder which wasnt working. Then I tried the root folder for the site (I created the .htaccess file since it did not exist.) Neither of those worked. (I am using Bluehost so I do not think that I have root access and I am not sure if it is a Linux server or not.)
If there is an easy way to explain what I am doing wrong, please do so. Otherwise, I will use canonical.
Thanks for everything!
-
@Dan
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutt's blog where he discusses his preference for 301 redirects over rel canonical tag.
Where would you say your solution fits in?
sorry about the delay of this response, i didn't realize the that you were asking me a question right away. When placing the code I provided in my previous answer this will cause a 301 perminant redirect to the original URL. That's actually what the
[R=301,L]
portion of the code is stating (R) redirect (301) status is referring to. After reviewing the Matt Cutts video, I realize that I should have asked you if you were operating on a Linux server that you had root access to. We actually utilize both redirects and canonical tags since it was recommended by the on-page optimization reports. Heck Google uses them, I would assume because it's easier for the user to be referred to a single page URL. Obviously though if you don't have server header access, and are not familiar with .htaccess (you can accidentally break your site) then the canonical solution is appropriate
-
Josh,
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutt's blog where he discusses his preference for 301 redirects over rel canonical tag.
Where would you say your solution fits in?
Thanks,
Dan -
use the link rel tag for all my homepages for the http://www.yoursite.com
-
Odd enough I just recently answered this question. The SEOmoz crawler is correct, because without a redirect you will be able to access both versions of the page in your browser.
To resolve this issue simply rewrite the index.html to the root url by placing the following code into your .htaccess file into your root directory.
Options +FollowSymlinks RewriteEngine on
Index Rewrite RewriteRule ^index.(htm|html|php) http://www.yoursite.com/ [R=301,L] RewriteRule ^(.*)/index.(htm|html|php) http://www.yoursite.com/$1/ [R=301,L]
You can also do the same with the index file in any subdirectories that you might create, by simply placing a .htaccess into those sub directories and using variations of the above code. This is how you create nice tight URLs without the duplicate content issue that look like - http://www.semclix.com/design/business/
-
It is a problem which you need to fix. You need to canonicalize your pages.
Those are all various URLs which most likely lead to the same web page. I say "most likely" because these URLs can actually lead to different pages.
You need to tell crawlers and search engines how you organize your site. There are several ways to achieve canonicalization. The method I prefer is to add the following line of code to each page:
The URL provided should be the preferred URL for your page.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Title Issues using # anchor tags
Our homepage navigation uses anchor tags (?TabNumb=1#, ?TabNumb=1# etc) rather than directly linking to different pages to decrease load time (and simplify the build process I owuld imagine). These anchor links are showing up as duplicate titles in Moz. I am pretty sure if I were to use noindex or rel tags, that could have a negative affect on my search results. Any way to tackle this outside of a complete redesign of the structure? http://www.dedoose.com/about-us/?TabNum=2# as an example
Web Design | | sbnjl0 -
Content thin for new home page been told to change it? any suggestions?
Hi guys, I'm newbie.... I have been told that my home page is content thin, and if I want to rank really well in the search i need to have more relevant content on my homepage - the site is only new 2months and I can see we are now at 39th place in the search, if i make changes to the home page design and add more content will this effect this current ranking?
Web Design | | edward-may0 -
Should Blog Category Archive URLs be Set to "No-Index" in Wordpress?
It appears that Google Webmaster Tools is listing about 120 blog archives URLs in Google Index>Index Status that should not be listed. Our site map contains 650 pages, but Google shows 860. Pages like: <colgroup><col width="464"></colgroup>
Web Design | | Kingalan1
| http://www.nyc-officespace-leader.com/blog/category/manhattan-office-space | With Titles Like: <colgroup><col width="454"></colgroup>
| Manhattan Office Space Archives - Metro Manhattan Office Space | Are listed when in the Rogerbot crawl report for the site. How can we remove such pages from Google Webmaster Tools, Index Status? Our site map shows about 650 pages, yet Google show these extra pages. We would prefer that they not be indexed. Note that these pages do not appear when we run a site:www.nyc-officespace-leader.com search. The site has suffered a drop in ranking since May and we feel it prudent to keep Google from indexing useless URLs. Before May 650 pages showed on the Webmaster Tools Index status, and suddenly in early June when we upgraded the site the index grew by about 175 pages. I suspect the 120 blog archives URLs may have something to do with it. How can we get them removed? Can we set them to "No-Index", or should the robot text be used to remove them? Or can some type of removal request be made to Google? My developers have been struggling with this issue since early June. The bloat on the site is about 175 URLs not on the site map. Is there any go to authority on this issue (it is apparently rather complicated) that can provide a definitive answer? Thanks!!
Alan0 -
Why is Google Webmaster suddenly started showing hundreds of HTML Improvements
Why is Google Webmaster suddenly started showing hundreds of HTML Improvements I mean to ask, my hundreds pages are been shown as duplicate - despite canonical marked correctly Below are sample url - which are been crawled in own way. I have rechecked canonical tag - which is correct as URL - 1, in all 3 url Do i need to worry about anything or shall i presume its a flaw from search engine to report this as an issue (This only pertain to Forum section) http://www.mycarhelpline.com/index.php?option=com_easydiscuss&view=post&id=1683&Itemid=78 http://www.mycarhelpline.com/?id=1683&Itemid=78&option=com_easydiscuss&view=post http://www.mycarhelpline.com/index.php?option=com_easydiscuss&view=post&id=1683 ps - i know these are dynamic url and not sef friendly url, but its been 3 yrs and , due to our ignorance and site builder took advantage of this. now - nothing can be done much to make them sef friendly as site has several thousand pages and touchwood - these dynamic url are not impacting much
Web Design | | Modi0 -
Is it bad to have /index.php at the end of a uri?
Is it bad for SEO if traffic is directed to "http://www.example.com/someuri/index.php" instead of "http://www.example.com/someuri/" and would it be works setting up a redirect rule at htaccess level?
Web Design | | NoisyLittleMonkey1 -
Question #1: Does Google index https:// pages? I thought they didn't because....
generally the difference between https:// and http:// is that the s (stands for secure I think) is usually reserved for payment pages, and other similar types of pages that search engines aren't supposed to index. (like any page where private data is stored) My site that all of my questions are revolving around is built with Volusion (i'm used to wordpress) and I keep finding problems like this one. The site was hardcoded to have all MENU internal links (which was 90% of our internal links) lead to **https://**www.example.com/example-page/ instead of **http://**www.example.com/example-page/ To double check that this was causing a loss in Link Juice. I jumped over to OSE. Sure enough, the internal links were not being indexed, only the links that were manually created and set to NOT include the httpS:// were being indexed. So if OSE wasn't counting the links, and based on the general ideology behind secure http access, that would infer that no link juice is being passed... Right?? Thanks for your time. Screens are available if necessary, but the OSE has already been updated since then and the new internal links ARE STILL NOT being indexed. The problem is.. is this a volusion problem? Should I switch to Wordpress? here's the site URL (please excuse the design, it's pretty ugly considering how basic volusion is compared to wordpress) http://www.uncommonthread.com/
Web Design | | TylerAbernethy0 -
Content Stacking - CSS positioning
I was curious to know what everyone thinks about CSS positioning so that the spiders will read a optimal bulk of content first - before it reads the others. Say I have some Tab's set up for navigational purposes, where the content in the last tab is actually what I want the bots to see first. What would be the best practices for accomplishing something like this?
Web Design | | imageworks-2612900 -
How much content is too much? Best Pages For Content?
To my understanding content has a lot to do with organic rankings if written correctly. My question is, how much content is too much and what pages are best to place content. Our company sells very costly products. Our customers call to purchase, we do not have an eCommerce site. Write now we have on average 350 words per page. We have about 200+ pages. Each page is written for that general category and each product has its own unique content. It seems to me that the pages with less content, tend to rank a bit better. As we are in the process of redoing our website, is there any recommendations on writing content, or adjusting the amount of text. I am thinking a lot of our text is informative only to a certain extent. Would writing content just for the main category page be better, and then on the actual product page, have only about 250 words as a description? Are there any other recommendations for SEO that are fairly new? Besides the Title, Description, Heading Tags, Image Alts, URLS etc.
Web Design | | hfranz0