Internal file extension canonicalization
-
Ok no doubt this is straightforward, however seem to be finding to hard to find a simple answer; our websites' internal pages have the extension .html. Trying to the navigate to that internal url without the .html extension results in a 404.
The question is; should a 401 be used to direct to the extension-less url to future proof? and should internal links direct to the extension-less url for the same reason?
Hopefully that makes sense and apologies for what I believe is a straightforward answer;
-
As above
example/abc rewrites to example/abc.html
example/abc.html redirects to example/abc
and all internal links link to example/abc
-
Thankyou for the replies.
I will try and clarify what I am trying to get at; apologies in advance for any naivety.
I understand homepage canonicalization; the confusion revolves around how this applies to internal pages.
Logically; I am struggling to see how internal pages are any different to a homepage in terms of the need to avoid multiple urls....and thus an extension-less url seemed appropriate. Not too mention the benefit or cleaner urls, easier to link to, remember etc.
i.e.
example/abc
example/abc.html
example/abc.index.html
-
As nick said, you dont need to do this, but if you are.
1. REWRITE the new url to the old url, as your webserver needs to know the extention
2. REDIRECT the old url to the new one, incase you already have links to the old urls, you dont want5 duplicate content
3. you need to make surer that all internal links point to the new url, you dont want un-necessary redirects as they leak link juice.
-
I'm about to make a whole lot of assumptions about your website to give this answer, just be aware.
Your website is built static, using HTML. Hence the .html file extension. If you're seeing websites that don't have file extension, it's most likely they are using content management systems (or have some serious /folder/index.html stuff going on).
Having a file extension like .html or .aspx or .php is not a bad thing. On websites like yours, it is required (unless you do the above subfolder thing) because it's an actual file the browser is grabbing rather than something being dynamically generated by a CMS. It has nothing to do with future-proofing.
As for 301'ing non-extension URLs to extention'd ones...well I don't know why you'd need to do that for your type of site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does Google index internal anchors as separate pages?
Hi, Back in September, I added a function that sets an anchor on each subheading (h[2-6]) and creates a Table of content that links to each of those anchors. These anchors did show up in the SERPs as JumpTo Links. Fine. Back then I also changed the canonicals to a slightly different structur and meanwhile there was some massive increase in the number of indexed pages - WAY over the top - which has since been fixed by removing (410) a complete section of the site. However ... there are still ~34.000 pages indexed to what really are more like 4.000 plus (all properly canonicalised). Naturally I am wondering, what google thinks it is indexing. The number is just way of and quite inexplainable. So I was wondering: Does Google save JumpTo links as unique pages? Also, does anybody know any method of actually getting all the pages in the google index? (Not actually existing sites via Screaming Frog etc, but actual pages in the index - all methods I found sadly do not work.) Finally: Does somebody have any other explanation for the incongruency in indexed vs. actual pages? Thanks for your replies! Nico
Technical SEO | | netzkern_AG0 -
Removing CSS & JS Files from Index
Hi, Google has indexed a few .CSS and .JS files that belong to our WordPress plugins and themes. I had them blocked via robots, but realized this doesn't prevent indexation (and can likely hurt us since Google wants to access these files). I've since removed the robots instructions, submitted a removal request via Search Console, but want to make sure they don't come back. Is there a way to put a noindex tag within .CSS and .JS files? Or should I do something with .htaccess instead?
Technical SEO | | kirmeliux1 -
What's Worse - 404 errors or a huge .htaccess file
We have changed our site architecture pretty significantly and now have many fewer pages (albeit with more robust content and focused linking). My question is, what should I do about all the 404 errors (keep in mind, I am only finding these in Bing Webmaster tools, not Moz or GWT)? Is it worse to have all those 404 errors (hundreds), or to have a massive htaccess file for pages that are only getting hits by the Bing crawlbot. Any insight would be great. Thanks
Technical SEO | | CleanEdisonInc0 -
Tool to search relative vs absolute internal links
I'm preparing for a site migration from a .co.uk to a .com and I want to ensure all internal links are updated to point to the new primary domain. What tool can I use to check internal links as some are relative and others are absolute so I need to update them all to relative.
Technical SEO | | Lindsay_D0 -
Canonicalization on my website
I am kind of new to all this but I would like to understand canonicalization. I have a website which when you arrive on it is www.mysite.com but once inside and flicking back to the homepage it reverts to www.mysite.com/index.html. Should I be doing something re canonicalization? If so what? Will the link juice be diluted by having two home page versions? Thanks
Technical SEO | | FCAbroad0 -
Any way to modify SERP site extension links?
I am not sure if this is doable but one of my clients have their website's SERP with sitelinks extension. In example. www.example.com -example1 -example2 -example3 -example 4 Anyway to modify the sitelinks on the SERP?
Technical SEO | | William.Lau0 -
Any value in external links to image files?
Let's say you have www.example.com. On this website, you have www.example.com/example-image.jpg. When someone links externally to this image - like below... { is < {a href="www.example.com/example-image.jpg"} {img src="www.example.com/example-image.jpg"} {/a} The external site would be using the image hosted on your site, but the image is also linked back to the same image file on your site. Does this have any value even though the link is back to the image file and not the website? Also - how much value do you guys feel image links have in relation to tech links? In terms of passing link juice and adding to a natural link profile. Thanks!
Technical SEO | | qlkasdjfw1 -
Duplicate Content Penalties, International Sites
We're in the process of rolling out a new domestic (US) website design. If we copy the same theme/content to our International subsidiaries, would the duplicate content penalty still apply? All International sites would carry the Country specific domain, .co.uk, .eu, etc. This question is for English only content, I'm assuming translated content would not carry a penalty.
Technical SEO | | endlesspools0