Duplicate Content From Indexing of a Page Without Its File Extension
-
Google has somehow indexed one of my pages without the .html extension. It indexed www.samplepage.com/page, so I am showing duplicate content because Google also sees www.samplepage.com/page.html. How can I force Google or Bing or whoever to index and see only the page including the .html extension? I know people are saying not to use the file extension on pages, but I want to, so please, anybody... HELP!!!
-
Yeah, I looked further into the URL removal, but I guess technically I did not meet the criteria... and honestly I am fearful of other potential implications of removal... I guess I will just have to wait for the 301 to kick in. I just can't believe there is not a simple .htaccess code to cause all URLs to show the .html extension. I mean, it is a simple thing to implement the reverse and have the extension dropped... I mean... good lord...
Thanks for all your help though Mike, I truly appreciate the efforts!
-
LAME! You may just want to let the 301 redirect you have in place take its course or remove the URL from Google's index since it was added by mistake anyway.
Mike
-
Nope. .....good lord....
-
Nope.
-
If that does not work, give this a whirl:
RewriteCond %{REQUEST_URI} !\.[a-zA-Z0-9]{3,4}
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ $1.html
-
Try:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]*[^./])$ /$1.html [R=301,L]
-
That caused the same "500 Internal Server Error" .......
-
Try my code without all the other redirects, and see if it works. If it does, then add back the other redirects one by one until everything works.
-
Oh, and my site auditor is seeing it as a directory with a file in it??? Ugghhh....
-
Nope. Didn't work. I am seriously about to lose my mind with this....
-
Maybe give this a whirl:
# If URL does not contain a period or end with a slash
RewriteCond %{REQUEST_URI} !(\.|/$)
# append .html to requested URL
RewriteRule (.*) /$1.html [L]
-
I get a server error when I do this. Sooo confused... Here are the .htaccess changes I made. FYI, I have temporarily removed the code you told me to put in there so the site's not down. I attached the server error screenshot too...
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.html
RewriteCond %{HTTP_HOST} ^hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www.hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www.hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://www.hanneganremodeling.com/ [R=301,L]
RewriteBase /
RewriteCond %{HTTP_HOST} ^hanneganremodeling.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]
-
You repeat this code a few times, maybe that's the problem? Pretty sure you only need it once:
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
The line:
RewriteEngine On
Also only needs to be included once in an htaccess file. You may want to remove all the other instances.
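For what it's worth, here is a rough sketch of how the pasted file might look with each of those directives listed only once. It is untested against this site. The domain names and redirect targets are copied from the pasted .htaccess, and folding the www and non-www host checks into one pattern with an [OR] flag is my own assumption about what the four separate blocks were meant to do:

```apache
# Directives that only need to appear once, at the top of the file
Options +FollowSymLinks
RewriteEngine On
RewriteBase /

# Old domains (with or without www) redirect to the main domain
RewriteCond %{HTTP_HOST} ^(www\.)?hanneganconstructionllc\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^(www\.)?hremodeling\.com$ [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]

# Direct requests for /index.html redirect to the site root
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.hanneganremodeling.com/ [R=301,L]

# Non-www requests redirect to the www host
RewriteCond %{HTTP_HOST} ^hanneganremodeling\.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]
```

Each group of RewriteCond lines applies only to the RewriteRule directly below it, so the groups can be tested one at a time.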
Try adding this code at the very top, after the first "RewriteEngine On":
RewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.html
-
Thanks Mike, you are awesome! I actually was thinking to do that, but I was concerned that it might have some larger implications?
I also just resubmitted a sitemap so hopefully that "might" speed up the crawl process...
Thanks again!
-
"I accidentally manually submitted the url to google and manually in submitted it to index and that when this issue began...."
It sounds like you accidentally added this URL to the index. You can follow the procedure outlined below to request that Google remove the specific URL from the index:
https://support.google.com/webmasters/bin/answer.py?hl=en&answer=59819
I checked your site's structure using Screaming Frog and it does not appear that you are linking to any non-.html versions. If I perform a scan using one of your non-.html pages, it appears that it only links to itself.
Since you have the 301 redirect in place, you can choose to wait it out and Google should correct things eventually; otherwise, requesting Google remove the URL is a faster... PERMANENT process.
Good luck.
Mike
-
No, it's not WordPress; it was created with Dreamweaver. I didn't make sample and sample.html the same page, but Google is treating it that way... I have implemented the 301, so I guess I just have to wait for a crawl.
-
Thank you very much for your input! When I implement what you suggested in my .htaccess, I get a "500 Internal Server Error". Maybe it would help if I list what I currently have in my .htaccess. I had to redirect some old domains, and I did canonical redirects and default non-index redirects... I hope this helps, I am at my wit's end... I also attached a screenshot of the webmaster warning... THANKS!!!
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www.hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www.hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://www.hanneganremodeling.com/ [R=301,L]
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_HOST} ^hanneganremodeling.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
-
Is this a WordPress-based site? What CMS are you using? How were you able to get domain.com/sample and domain.com/sample.html to be the same page? Either way, the canonical tag is the correct solution in this case. There's no need for a 301, and if you do 301 redirects, you are not really fixing the issue caused by your CMS.
I would therefore strongly advise to use the canonical tag. That's the intended use of that tag.
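As a side note, the same canonical hint can also be sent as an HTTP Link header from .htaccess rather than as a tag in the page's head. This is a swapped-in variant, not what anyone in this thread describes doing. The sketch assumes mod_headers is enabled on the host and reuses the sample URL from the question. Whether the header is also sent for the extensionless URL depends on how the server maps that URL to the file:

```apache
<IfModule mod_headers.c>
  # Apply only to the one page in question (file name taken from the sample URL)
  <Files "page.html">
    # Tell crawlers that the .html URL is the preferred version
    Header set Link "<http://www.samplepage.com/page.html>; rel=\"canonical\""
  </Files>
</IfModule>
```

Like the tag, this only informs search engines; visitors who type the URL without .html are not redirected.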
-
A canonical tag won't physically redirect you when you visit the page, it just lets the search engines know which is the right page to index.
If you want to actually redirect using .htaccess, try using this code
RewriteEngine On
RewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.html
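A possible cleaned-up variant of the rule above, offered as a sketch rather than a tested fix. It assumes the space between "!" and the pattern is unintended: Apache splits directive arguments on whitespace, so "! .html$" is read as a pattern of "!" followed by a stray third argument, which is a plausible source of a 500 error. It also assumes mod_rewrite is enabled and that the .htaccess file sits in the site root. The [R=301] flag is added because, without it, the rule only rewrites internally and the address the visitor or crawler sees never changes:

```apache
RewriteEngine On

# Leave real directories alone (for example /images)
RewriteCond %{REQUEST_FILENAME} !-d
# Skip URLs that end in a slash
RewriteCond %{REQUEST_URI} !/$
# Skip URLs that already contain a dot (page.html, style.css, logo.png);
# note there is no space between "!" and the pattern
RewriteCond %{REQUEST_URI} !\.
# Externally redirect /page to /page.html with a permanent (301) status
RewriteRule ^(.+)$ /$1.html [R=301,L]
```

Checking for any dot, instead of only a trailing .html, keeps the rule from appending .html to stylesheets, scripts, and images.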
-
I tried the canonical, and when I enter the URL without the .html, it doesn't resolve to the URL with the .html extension. I tried an .htaccess redirect... I am stumped, I can't get it to redirect automatically to the .html version. I accidentally submitted the URL to Google manually and manually submitted it to the index, and that's when this issue began....
-
Add a canonical tag to your header so that Google/Bing knows which version of your page they should be indexing.
You can also try looking into where the link to the non-html page is coming from. If it's an internal link, just change it so that Google doesn't continue to crawl it.