Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Duplicate Content From Indexing of non- File Extension Page
-
Google has somehow indexed one of my pages without the .html extension. It indexed www.samplepage.com/page, so I am showing duplicate content because Google also sees www.samplepage.com/page.html. How can I force Google, Bing, or whoever to index and see only the version with the .html extension? I know people say not to use the file extension on pages, but I want to, so please, anybody...HELP!!!
-
Yeah, I looked further into the URL removal, but I guess technically I did not meet the criteria....and honestly I am fearful of other potential implications of removal....I guess I will just have to wait for the 301 to kick in. I just can't believe there is not a simple .htaccess code to make all URLs show the .html extension. I mean, it is a simple thing to implement the reverse and have the extension dropped...I mean....good lord...
Thanks for all your help though Mike, I truly appreciate the efforts!
-
LAME! You may just want to let the 301 redirect you have in place take its course or remove the URL from Google's index since it was added by mistake anyway.
Mike
-
Nope. .....good lord....
-
Nope.
-
If that does not work, give this a whirl:
RewriteCond %{REQUEST_URI} !\.[a-zA-Z0-9]{3,4}
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ $1.html
-
Try:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]*[^./])$ /$1.html [R=301,L]
-
That caused the same "500 Internal Server Error" .......
-
Try my code without all the other redirects, and see if it works. If it does, then add back the other redirects one by one until everything works.
-
Oh, and my site auditor is seeing it as a directory with a file in it??? Ugghhh....
-
Nope. Didn't work. I am seriously about to lose my mind with this....
-
Maybe give this a whirl:
# If URL does not contain a period or end with a slash
RewriteCond %{REQUEST_URI} !(\.|/$)
# append .html to requested URL
RewriteRule (.*) /$1.html [L]
-
I get a server error when I do this. Sooo confused... Here are the .htaccess changes I made. FYI...I have temporarily removed the code you told me to put in there so the site's not down. I attached the server error screenshot too...
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.html
RewriteCond %{HTTP_HOST} ^hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www.hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www.hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://www.hanneganremodeling.com/ [R=301,L]
RewriteBase /
RewriteCond %{HTTP_HOST} ^hanneganremodeling.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]
-
You repeat this code a few times; maybe that's the problem? Pretty sure you only need it once:
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
The line:
RewriteEngine On
also only needs to be included once in an .htaccess file. You may want to remove all the other instances.
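For illustration, this is roughly what the file quoted in the question could look like with the repeated directives collapsed. It is a sketch rather than a tested drop-in: it assumes mod_rewrite is enabled and that the file sits in the document root, and it sends the old domains straight to the www host in one hop, which is an assumption and not what the original rules did (they went to the bare domain first).

```apache
Options +FollowSymLinks
RewriteEngine On
RewriteBase /

# Old domains, with or without www, go to the main site.
# Assumption: redirecting directly to the www host is acceptable.
RewriteCond %{HTTP_HOST} ^(www\.)?(hanneganconstructionllc|hremodeling)\.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]

# Requests that literally ask for /index.html go to the site root.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.hanneganremodeling.com/ [R=301,L]

# Bare domain goes to the www host.
RewriteCond %{HTTP_HOST} ^hanneganremodeling\.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]
```

Keeping each RewriteCond directly above the RewriteRule it belongs to matters, since a condition only applies to the next rule that follows it.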
Try adding this code at the very top, after the first "RewriteEngine On":
RewriteCond %{REQUEST_URI} !\.html$
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ $1.html
-
Thanks Mike, you are awesome! I actually was thinking to do that, but I was concerned that it might have some larger implications?
I also just resubmitted a sitemap so hopefully that "might" speed up the crawl process...
Thanks again!
-
"I accidentally manually submitted the url to google and manually in submitted it to index and that when this issue began...."
It sounds like you accidentally added this URL to the index. You can follow the procedure outlined below to ask Google to remove the specific URL from the index:
https://support.google.com/webmasters/bin/answer.py?hl=en&answer=59819
I checked your site's structure using Screaming Frog and it does not appear that you are linking to any non-.html versions. If I perform a scan using one of your non-.html pages, it appears that it only links to itself.
Since you have the 301 redirect in place, you can choose to wait it out and Google should correct things eventually; otherwise, requesting Google remove the URL is a faster... PERMANENT process.
Good luck.
Mike
-
No, it's not WordPress; it was created with Dreamweaver. I didn't make sample and sample.html the same page, but Google is treating it that way.... I have implemented the 301, so I guess I just have to wait for a crawl.
-
Thank you very much for your input! When I add what you suggested to my .htaccess, I get a "500 Internal Server Error". Maybe it would help if I list what I currently have in my .htaccess. I had to redirect some old domains, and I did canonical redirects and a default non-index.html redirect....I hope this helps, I am at my wit's end... I also attached a screenshot of the webmaster warning... THANKS!!!
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www.hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www.hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]
RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://www.hanneganremodeling.com/ [R=301,L]
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_HOST} ^hanneganremodeling.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
-
Is this a WordPress-based site? What CMS are you using? How were you able to get domain.com/sample and domain.com/sample.html to be the same page? Either way, a canonical tag is the correct solution in this case. There's no need for a 301, and if you do 301 redirects, you are not really fixing the issue caused by your CMS.
I would therefore strongly advise using the canonical tag. That's the intended use of that tag.
-
A canonical tag won't physically redirect you when you visit the page, it just lets the search engines know which is the right page to index.
If you want to actually redirect using .htaccess, try using this code:
RewriteEngine On
RewriteCond %{REQUEST_URI} !\.html$
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ $1.html
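Two details tend to trip up rules like this. First, mod_rewrite expects the `!` negation to sit directly against the pattern; written with a space between them, Apache typically rejects the line as a syntax error, and a syntax error in .htaccess shows up as a 500. Second, a RewriteRule without the `R` flag only rewrites internally, so the address bar (and the URL a crawler sees) does not change. A more cautious sketch, assuming the file lives in the document root, that mod_rewrite is enabled, and that the host allows `Options` to be set in .htaccess:

```apache
# Assumption: MultiViews (content negotiation) may be what lets /page
# resolve to page.html in the first place. If it is on, Apache may map
# the request to page.html before these rules run, so it is turned off here.
Options -MultiViews

RewriteEngine On
RewriteBase /

# Skip real directories such as /images/
RewriteCond %{REQUEST_FILENAME} !-d
# Only act when a matching .html file actually exists on disk
RewriteCond %{REQUEST_FILENAME}.html -f
# Path with no dot in it: send an external 301 to the .html version
RewriteRule ^([^.]+)$ /$1.html [R=301,L]
```

Because the pattern only matches paths with no dot in them, images, stylesheets, and pages that already end in .html are left alone, and the file-exists check keeps the rule from inventing .html URLs for paths that have no page behind them.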
-
I tried the canonical tag, and when I enter the URL without the .html, it doesn't resolve to the URL with the .html extension. I tried an .htaccess redirect...I am stumped, I can't get it to redirect automatically to the .html. I accidentally manually submitted the url to google and manually in submitted it to index and that when this issue began....
-
Add a canonical tag to your header so that Google/Bing knows which version of your page they should be indexing.
You can also try looking into where the link to the non-html page is coming from. If it's an internal link, just change it so that Google doesn't continue to crawl it.