How to resolve Duplicate Page Content issue for root domain & index.html?
-
SEOMoz returns a Duplicate Page Content error for a website's index page, with both domain.com and domain.com/index.html isted seperately. We had a rewrite in the htacess file, but for some reason this has not had an impact and we have since removed it. What's the best way (in an HTML website) to ensure all index.html links are automatically redirected to the root domain and these aren't seen as two separate pages?
-
great code Josh...but , after i saved it on .htaccess , a "?" appeared on the link..
http://www.domain.com/?/example/file.html
Is this ok ? pls advice/
Thank you,
-
You touched on a good point here "We set up our site to utilize a index redirect for all of our sub directories as well, so with this method you simply name your sub directories to match the url path that you desire. Each sub directory has it's own index which you redirect with a variation of the above code. By doing this you can have nice clean url paths like http://www.semclix.com/design/ecommerce/ - and mitigate the duplicate content issue. We hope that this helps."
Too often I see sites where they get the home page right but miss the re-write on the directories.
-
Here's the .htaccess rewrite command that you can use for the index.html redirect -
Options +FollowSymlinks RewriteEngine on
Index Rewrite RewriteRule ^index.(htm|html|php) http://www.amarasoftware.com/ [R=301,L] RewriteRule ^(.*)/index.(htm|html|php) http://www.amarasoftware.com/$1/ [R=301,L]
We set up our site to utilize a index redirect for all of our sub directories as well, so with this method you simply name your sub directories to match the url path that you desire. Each sub directory has it's own index which you redirect with a variation of the above code. By doing this you can have nice clean url paths like http://www.semclix.com/design/ecommerce/ - and mitigate the duplicate content issue. We hope that this helps.
-
I'd check it with some other software too... i.e. Raven Tools free trial or something, that will tell you if there's canonicalization problems... of course I'm not advocating Raven Tools over SEOmoz tools (I'm a member here and not there for good reasons), I just think best to try a few different tests before deciding if it's a problem. There might just be an issue with the SEOmoz campaign tool for the moment, which I'm sure they'll fix as soon as they realise.
Hey, aren't you the tutor I had in my SEC usability course?
-
Unfortunately I can't speak for how SEOmoz handles rewrites like this if it's already crawled the page.
The rewrite rule you're using looks like it's only rewriting the www portion of the URL, not index.html. So alone it wouldn't do anything to solve dupe content issues. (someone please correct me if I'm misreading the rewrite rule)
Here's a link to what I used to write a redirect for index.html on another site.
http://www.webmasterworld.com/forum92/6375.htm
I think it is a fairly safe assumption to make that SEOmoz is smart enough to realize if you're got a redirect in there (providing that its working). I'd still recommend taking a look to see if Google has cached or indexed an index.html version, though.
Edit: my personal, highly technical, acid-test for an index.html redirect is just going there and manually entering the url with index.html on the end, rather than waiting for a recrawl to see if you're heading in the right direction.
-
RewriteEngine on RewriteCond %{HTTP_HOST} ^([a-z.]+)?amarasoftware.com$ [NC] RewriteCond %{HTTP_HOST} !^www. [NC] RewriteRule .? http://www.%1amarasoftware.com%{REQUEST_URI} [R=301,L] Is what I use. In Seomoz this leads to www.amarasoftware.com and index.html so 2 different URL's, both with different incoming links, and a different authority, which has an impact on my ranking if correct. in SEomoz this a returns a duplicate title and meta tags errors. If SEOmoz finds 2 pages instead of one I may assume that Google agrees with this.
-
As you did, I'd normally handle this with a 301 from index.html to the root domain. When you say that it's "not had an impact" do you mean that the SEOmoz dashboard continues to show an error after it re-crawls, or that the search engines are not picking up the redirect?
SEOmoz dashboard does a great job, but I'd check to see how the search engines are actually indexing yourdomain.com/index.html vs. yourdomain.com also. If the search engines are indexing it as you want them to, then I'd be inclined to ignore the dashboard error.
I apologize if this is a stupid question, but I assume you manually checked that the redirect worked?
-
You wish to canonicalize the pages. That is the SEO word which describes exactly what you are trying to achieve.
Above are 5 URLs which can possibly lead to the exact same page. If you add the following HTML in the code then the pages will be canonicalized.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Penalties for duplicate content
Hello!We have a website with various city tours and activities listed on a single page (http://vaiduokliai.lt/). The list changes accordingly depending on filtering (birthday in Vilnius, bachelor party in Kaunas, etc.). The URL doesn't change. Content changes dynamically. We need to make URL visible for each category, then optimize it for different keywords (for example city tours in Vilnius for a list of tours and activities in Vilnius with appropriate URL /tours-in-Vilnius).The problem is that activities overlap very often in different categories, so there will be a lot of duplicate content on different pages. In such case, how severe penalty could be for duplicate content?
Intermediate & Advanced SEO | | jpuzakov0 -
Date of page first indexed or age of a page?
Hi does anyone know any ways, tools to find when a page was first indexed/cached by Google? I remember a while back, around 2009 i had a firefox plugin which could check this, and gave you a exact date. Maybe this has changed since. I don't remember the plugin. Or any recommendations on finding the age of a page (not domain) for a website? This is for competitor research not my own website. Cheers, Paul
Intermediate & Advanced SEO | | MBASydney0 -
Duplicate page content errors stemming from CMS
Hello! We've recently relaunched (and completely restructured) our website. All looks well except for some duplicate content issues. Our internal CMS (custom) adds a /content/ to each page. Our development team has also set-up URLs to work without /content/. Is there a way I can tell Google that these are the same pages. I looked into the parameters tool, but that seemed more in-line with ecommerce and the like. Am I missing anything else?
Intermediate & Advanced SEO | | taylor.craig0 -
Duplicate content for hotel websites - the usual nightmare? is there any solution other than producing unique content?
Hiya Mozzers I often work for hotels. A common scenario is the hotel / resort has worked with their Property Management System to distribute their booking availability around the web... to third party booking sites - with the inventory goes duplicate page descriptions sent to these "partner" websites. I was just checking duplication on a room description - 20 loads of duplicate descriptions for that page alone - there are 200 rooms - so I'm probably looking at 4,000 loads of duplicate content that need rewriting to prevent duplicate content penalties, which will cost a huge amount of money. Is there any other solution? Perhaps ask booking sites to block relevant pages from search engines?
Intermediate & Advanced SEO | | McTaggart0 -
All Thin Content removed and duplicate content replaced. But still no success?
Good morning, Over the last three months i have gone about replacing and removing all the duplicate content (1000+ page) from our site top4office.co.uk. Now it been just under 2 months since we made all the changes and we still are not showing any improvements in the SERPS. Can anyone tell me why we aren't making any progress or spot something we are not doing correctly? Another problem is that although we have removed 3000+ pages using the removal tool searching site:top4office.co.uk still shows 2800 pages indexed (before there was 3500). Look forward to your responses!
Intermediate & Advanced SEO | | apogeecorp0 -
Can I Use Cross Domain Canonical For Duplicate Categories & Product Pages?
I want to fix issue regarding duplicate categories & product pages on my multiple eCommerce websites. http://www.vistastores.com/patio-umbrellas-fiberbuilt-umbrellas-llc-7gcrw-teal.html - Want to rank with this... http://www.vistapatioumbrellas.com/patio-umbrellas-fiberbuilt-umbrellas-llc-7gcrw-teal.html - Duplicate one! http://www.vistastores.com/patio-umbrellas - Want to rank with this... http://www.vistapatioumbrellas.com/patio-umbrellas - Duplicate one!
Intermediate & Advanced SEO | | CommercePundit0 -
BEING PROACTIVE ABOUT CONTENT DUPLICATION...
So we all know that duplicate content is bad for SEO. I was just thinking... Whenever I post new content to a blog, website page etc...there should be something I should be able to do to tell Google (in fact all search engines) that I just created and posted this content to the web... that I am the original source .... so if anyone else copies it they get penalised and not me... Would appreciate your answers... 🙂 regards,
Intermediate & Advanced SEO | | TopGearMedia0 -
Duplicate Content, Campaign Explorer & Rel Canonical
Google Advises to use Rel Canonical URL's to advise them which page with similiar information is more relevant. You are supposed to put a rel canonical on the non-preferred pages to point back to the desired page. How do you handle this with a product catalog using ajax, where the additional pages do not exist? An example would be: <colgroup><col width="470"></colgroup>
Intermediate & Advanced SEO | | eric_since1910.com
| .com/productcategory.aspx?page=1 /productcategory.aspx?page=2 /productcategory.aspx?page=3 /productcategory.aspx?page=4 The page=1,2,3 and 4 do not physically exist, they are simply referencing additional products I have rel canonical urls' on the main page www.examplesite.com/productcategory.aspx, but I am not 100% sure this is correct or how else it could be handled. Any Ideas Pro mozzers? |0