Duplicate Content From Indexing of non- File Extension Page
-
Google somehow has indexed a page of mine without the .html extension. so they indexed www.samplepage.com/page, so I am showing duplicate content because Google also see's www.samplepage.com/page.html How can I force google or bing or whoever to only index and see the page including the .html extension? I know people are saying not to use the file extension on pages, but I want to, so please anybody...HELP!!!
-
Yeah I looked further into the URL removal, but I guess technically I did not meet the criteria....and honestly I am fearful other potential implications of removal....I guess I will just have to wait for the 301 to ick in. I just cant believe there is not a simple .htaccess code to cause all URL's to show the .html extension. I mean it is a simple thing to implement the reverse and have the extension dropped...I mean....good lord...
Thanks for all your help though Mike, I truly appreciate the efforts!
-
LAME! You may just want to let the 301 redirect you have in place take its course or remove the URL from Google's index since it was added by mistake anyway.
Mike
-
Nope. .....good lord....
-
Nope.
-
If that does not work, give this a whirl:
RewriteCond %{REQUEST_URI} !\.[a-zA-Z0-9]{3,4}
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ $1.html
-
Try:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]*[^./])$ /$1.html [R=301,L] -
That caused the same "500 Internal Server Error" .......
-
Try my code without all the other redirects, and see if it works. If it does, then add back the other redirects one by one until everything works.
-
Oh, and my site auditor is seeing it as a directory with a file in it??? Ugghhh....
-
Nope. Didn't work. I am seriously about to lose my mind with this....
-
Maybe give this a whirl:
If URL does not contain a period or end with a slash
RewriteCond %{REQUEST_URI} !(.|/$)
append .html to requested URL
RewriteRule (.*) /$1.html [L]
-
I get a server error when I do this? Sooo confused... Here is the htaccess changes I made. FYI...I have removed the code you told me to put in there temporarily so the site's not down. I attached the server error screenshot too...
Options +FollowSymlinks
RewriteEngine OnRewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.htmlRewriteCond %{HTTP_HOST} ^hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^www.hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^www.hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://www.hanneganremodeling.com/ [R=301,L]RewriteBase /
RewriteCond %{HTTP_HOST} ^hanneganremodeling.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L] -
You repeat this code a few times, maybe that's the problem? Pretty sure you only need it once:
RewriteEngine On
Options +FollowSymlinks
RewriteBase /The line:
RewriteEngine On
Also only needs to be included once in an htaccess file. You may want to remove all the other instances.
Try adding this code at the very top, after the first "RewriteEngine On":
RewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.html -
Thanks Mike, you are awesome! I actually was thinking to do that, but I was concerned that it might have some larger implications?
I also just resubmitted a sitemap so hopefully that "might" speed up the crawl process...
Thanks again!
-
"I accidentally manually submitted the url to google and manually in submitted it to index and that when this issue began...."
It sounds like you accidently added this URL to the index. You can follow the procedure outlined below to request Google remove the specific URL from the index:
https://support.google.com/webmasters/bin/answer.py?hl=en&answer=59819
I checked your site's structure using Screaming Frog and it does not appear that you are linking to any non-.html versions. If I perform a scan using one of your non-.html pages, it appears that it only links to itself.
Since you have the 301 redirect in place, you can choose to wait it out and Google should correct things eventually; otherwise, requesting Google remove the URL is a faster... PERMANENT process.
Good luck.
Mike
-
No it's not a wordpress, it was created with Dreamweaver. I didn't make sample and sample.html same page, but google is treating it that way.... I have implemented the 301, so I guess I just have to wait for a crawl
-
Thank you very for your input! When I implement into my .htacces what you suggested I get a "Internet 500 Server Error" ? Maybe it would help if I list what I currently have in my .htaccess I had to redirect some old domains and did canonical redirects and default non .index....I hope this help, I am at my wit's end... I also attached a screenshot of the webmaster warning... THANKS!!!
Options +FollowSymlinks
RewriteEngine OnRewriteCond %{HTTP_HOST} ^hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^www.hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^www.hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://www.hanneganremodeling.com/ [R=301,L]RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_HOST} ^hanneganremodeling.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]Options +FollowSymLinks
RewriteEngine On
RewriteBase / -
Is this a wordpress based site ? What CMS are you using ? How were you able to get domain.com/sample and domain.com/sample.html be the same page ? Either way, canonical tag is the correct solution in this case. There's no need for a 301 and if you do 301 redirects, you are not really fixing the issue caused by your CMS System.
I would therefore strongly advise to use the canonical tag. That's the intended use of that tag.
-
A canonical tag won't physically redirect you when you visit the page, it just lets the search engines know which is the right page to index.
If you want to actually redirect using .htaccess, try using this code
RewriteEngine On
RewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.html
-
I tried the canonical and when I enter the url without the .html, it doesn't resolve to the url with the .html extension. I tried an .htaccess reirect...I am stumped, I can't get it to redirect automatically the the .html I accidentally manually submitted the url to google and manually in submitted it to index and that when this issue began....
-
Add a canonical tag to your header so that Google/Bing knows which version of your page they should be indexing.
You can also try looking into where the link to the non-html page is coming from. If it's an internal link, just change it so that Google doesn't continue to crawl it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why Is this page de-indexed?
I have dropped out for all my first page KWDs for this page https://www.key.co.uk/en/key/dollies-load-movers-door-skates Can anyone see an issue? I am trying to find one.... We did just migrate to HTTPS but other areas have no problem
Intermediate & Advanced SEO | | BeckyKey0 -
Duplicate content. Competing for rank.
Scenario: An automotive dealer lists cars for sale on their website. The descriptions are very good and in depth at 1,200 words per car. However chunks of the copy are copied from car review websites and weaved into their original copy. Q1: This is flagged in copyscape - how much of an issue is this for Google? Q2: The same stock with the same copy is fed into a popular car listing website - the dealer's website and the classifieds website often rank in the top two positions (sometimes the dealer on top other times the classifieds site). Is this a good or a bad thing? Are you risking being seen as duplicating/scraping content? Thank you.
Intermediate & Advanced SEO | | Bee1590 -
Mixing up languages on the same page + possible duplicate content
I have a site in English hosted under .com with English info, and then different versions of the site under subdirectories (/de/, /es/, etc.) Due to budget constraints we have only managed to translate the most important info of our product pages for the local domains. We feel however that displaying (on a clearly identified tab) the detailed product info in English may be of use for many users that can actually understand English, and may help us get more conversions to have that info. The problem is that this detailed product info is already used on the equivalent English page as well. This basically means 2 things: We are mixing languages on pages We have around 50% of duplicate content of these pages What do you think that the SEO implications of this are? By the way, proper Meta Titles and Meta Descriptions as well as implementation of href lang tag are in place.
Intermediate & Advanced SEO | | lauraseo0 -
Interlinking from unique content page to limited content page
I have a page (page 1) with a lot of unique content which may rank for "Example for sale". On this page I Interlink to a page (page 2) with very limited unique content, but a page I believe is better for the user with anchor "See all Example for sale". In other words, the 1st page is more like a guide with items for sale mixed, whereas the 2nd page is purely a "for sale" page with almost no unique content, but very engaging for users. Questions: Is it risky that I interlink with "Example for sale" to a page with limited unique content, as I risk not being able to rank for either of these 2 pages Would it make sense to "no index, follow" page 2 as there is limited unique content, and is actually a page that exist across the web on other websites in different formats (it is real estate MLS listings), but I can still keep the "Example for sale" link leading to page 2 without risking losing ranking of page 1 for "Example for sale"keyword phrase I am basically trying to work out best solution to rank for "Keyword for sale" and dilemma is page 2 is best for users, but is not a very unique page and page 2 is very unique and OK for users but mixed up writing, pictures and more with properties for sale.
Intermediate & Advanced SEO | | khi50 -
To index or de-index internal search results pages?
Hi there. My client uses a CMS/E-Commerce platform that is automatically set up to index every single internal search results page on search engines. This was supposedly built as an "SEO Friendly" feature in the sense that it creates hundreds of new indexed pages to send to search engines that reflect various terminology used by existing visitors of the site. In many cases, these pages have proven to outperform our optimized static pages, but there are multiple issues with them: The CMS does not allow us to add any static content to these pages, including titles, headers, metas, or copy on the page The query typed in by the site visitor always becomes part of the Title tag / Meta description on Google. If the customer's internal search query contains any less than ideal terminology that we wouldn't want other users to see, their phrasing is out there for the whole world to see, causing lots and lots of ugly terminology floating around on Google that we can't affect. I am scared to do a blanket de-indexation of all /search/ results pages because we would lose the majority of our rankings and traffic in the short term, while trying to improve the ranks of our optimized static pages. The ideal is to really move up our static pages in Google's index, and when their performance is strong enough, to de-index all of the internal search results pages - but for some reason Google keeps choosing the internal search results page as the "better" page to rank for our targeted keywords. Can anyone advise? Has anyone been in a similar situation? Thanks!
Intermediate & Advanced SEO | | FPD_NYC0 -
Frequent FAQs vs duplicate content
It would be helpful for our visitors if we were to include an expandable list of FAQs on most pages. Each section would have its own list of FAQs specific to that section, but all the pages in that section would have the same text. It occurred to me that Google might view this as a duplicate content issue. Each page _does _have a lot of unique text, but underneath we would have lots of of text repeated throughout the site. Should I be concerned? I guess I could always load these by AJAX after page load if might penalize us.
Intermediate & Advanced SEO | | boxcarpress0 -
Duplicate content from development website
Hi all - I've been trawling for duplicate content and then I stumbled across a development URL, set up by a previous web developer, which nearly mirrors current site (few content and structure changes since then, but otherwise it's all virtually the same). The developer didn't take it down when the site was launched. I'm guessing the best thing to do is tell him to take down the development URL (which is specific to the pizza joint btw, immediately. Is there anything else I should ask him to do? Thanks, Luke
Intermediate & Advanced SEO | | McTaggart0 -
Duplicate Content Question
My client's website is for an organization that is part of a larger organization - which has it's own website. We were given permission to use content from the larger organization's site on my client's redesigned site. The SEs will deem this as duplicate content, right? I can "re-write" the content for the new site, but it will still be closely based on the original content from the larger organization's site, due to the scientific/medical nature of the subject material. Is there a way around this dilemma so I do not get penalized? Thanks!
Intermediate & Advanced SEO | | Mills1