Internal file extension canonicalization
-
Ok no doubt this is straightforward, however seem to be finding to hard to find a simple answer; our websites' internal pages have the extension .html. Trying to the navigate to that internal url without the .html extension results in a 404.
The question is; should a 401 be used to direct to the extension-less url to future proof? and should internal links direct to the extension-less url for the same reason?
Hopefully that makes sense and apologies for what I believe is a straightforward answer;
-
As above
example/abc rewrites to example/abc.html
example/abc.html redirects to example/abc
and all internal links link to example/abc
-
Thankyou for the replies.
I will try and clarify what I am trying to get at; apologies in advance for any naivety.
I understand homepage canonicalization; the confusion revolves around how this applies to internal pages.
Logically; I am struggling to see how internal pages are any different to a homepage in terms of the need to avoid multiple urls....and thus an extension-less url seemed appropriate. Not too mention the benefit or cleaner urls, easier to link to, remember etc.
i.e.
example/abc
example/abc.html
example/abc.index.html
-
As nick said, you dont need to do this, but if you are.
1. REWRITE the new url to the old url, as your webserver needs to know the extention
2. REDIRECT the old url to the new one, incase you already have links to the old urls, you dont want5 duplicate content
3. you need to make surer that all internal links point to the new url, you dont want un-necessary redirects as they leak link juice.
-
I'm about to make a whole lot of assumptions about your website to give this answer, just be aware.
Your website is built static, using HTML. Hence the .html file extension. If you're seeing websites that don't have file extension, it's most likely they are using content management systems (or have some serious /folder/index.html stuff going on).
Having a file extension like .html or .aspx or .php is not a bad thing. On websites like yours, it is required (unless you do the above subfolder thing) because it's an actual file the browser is grabbing rather than something being dynamically generated by a CMS. It has nothing to do with future-proofing.
As for 301'ing non-extension URLs to extention'd ones...well I don't know why you'd need to do that for your type of site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Search Console rejecting XML sitemap files as HTML files, despite them being XML
Hi Moz folks, We have launched an international site that uses subdirectories for regions and have had trouble getting pages outside of USA and Canada indexed. Google Search Console accounts have finally been verified, so we can submit the correct regional sitemap to the relevant search console account. However, when submitting non-USA and CA sitemap files (e.g. AU, NZ, UK), we are receiving a submission error that states, "Your Sitemap appears to be an HTML page," despite them being .xml files, e.g. http://www.t2tea.com/en/au/sitemap1_en_AU.xml. Queries on this suggest it's a W3 Cache plugin problem, but we aren't using Wordpress; the site is running on Demandware. Can anyone guide us on why Google Search Console is rejecting these sitemap files? Page indexation is a real issue. Many thanks in advance!
Technical SEO | | SearchDeploy0 -
Log files vs. GWT: major discrepancy in number of pages crawled
Following up on this post, I did a pretty deep dive on our log files using Web Log Explorer. Several things have come to light, but one of the issues I've spotted is the vast difference between the number of pages crawled by the Googlebot according to our log files versus the number of pages indexed in GWT. Consider: Number of pages crawled per log files: 2993 Crawl frequency (i.e. number of times those pages were crawled): 61438 Number of pages indexed by GWT: 17,182,818 (yes, that's right - more than 17 million pages) We have a bunch of XML sitemaps (around 350) that are linked on the main sitemap.xml page; these pages have been crawled fairly frequently, and I think this is where a lot of links have been indexed. Even so, would that explain why we have relatively few pages crawled according to the logs but so many more indexed by Google?
Technical SEO | | ufmedia0 -
Will Google Recrawl an Indexed URL Which is No Longer Internally Linked?
We accidentally introduced Google to our incomplete site. The end result: thousands of pages indexed which return nothing but a "Sorry, no results" page. I know there are many ways to go about this, but the sheer number of pages makes it frustrating. Ideally, in the interim, I'd love to 404 the offending pages and allow Google to recrawl them, realize they're dead, and begin removing them from the index. Unfortunately, we've removed the initial internal links that lead to this premature indexation from our site. So my question is, will Google revisit these pages based on their own records (as in, this page is indexed, let's go check it out again!), or will they only revisit them by following along a current site structure? We are signed up with WMT if that helps.
Technical SEO | | kirmeliux0 -
Cant find internal links on one of my pages.
When I run open site explorer for www.kingremodeling.com/ss.php?pid=5 it says there are 40 links to it on my site. However, I cannot find these links on any of the pages that open site explorer lists such as my homepage www.KingRemodeling.com. Totally confused!
Technical SEO | | allb830 -
HTTP 500 Internal Server Error, Need help
Hi, For a few days know google crawlers have been getting 500 errors from our dedicated server whenever they try to crawl the site. Using the "Fetch as Google" tool under health in webmaster tools, I get "Unreachable page" every time I fetch the homepage. Here is exactly what the google crawler is getting: <code>HTTP/1.1 500 Internal Server Error Date: Fri, 21 Jun 2013 19:52:27 GMT Server: Apache/2.2.15 (CentOS) X-Powered-By: PHP/5.3.3 X-Pingback: [http://www.communityadvocate.com/xmlrpc.php](http://www.communityadvocate.com/xmlrpc.php) Connection: close Transfer-Encoding: chunked Content-Type: text/html; charset=UTF-8 http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> My url is [http://www.communityadvocate.com](http://www.communityadvocate.com/)</code> and here's the screenshot from Goolge webmater http://screencast.com/t/FoWvqRRtmoEQ How can i fix that? Thank you
Technical SEO | | Vmezoz0 -
Creating a CSV file for uploading 301 redirect URL map
Hi if i'm bulk uploading 301 redirects whats needed to create a csv file? is it just a case of creating an excel spreadsheet & have the old urls in column A and new urls in column B and then just convert to csv and upload ? or do i need to put in other details or paremeters etc etc ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Does anchor text penalty apply to internal links?
We already know that over optimsied anchor text for external will cause a penalty. But what about internal links? All of our blog posts include an advertisement linking sales pages. These links all use the exact same anchor text. Is linking to an internal page from so many other pages (blog posts) likely to trigger a penalty? Here is an example: http://www.designquotes.com.au/business-blog/four-ways-to-enhance-your-e-commerce-site-for-busy-shoppers/ This links to http://www.designquotes.com.au/web-design-quotes Many of the posts link to the same page using the anchor text "Compare Web Design Quotes from Local Designers."
Technical SEO | | designquotes0 -
.htaccess file in wordpress blog
I want to redirect non www to www in blog hosted by wordpress. Where can i find .htaccess file ? Shall i have to create a new one ? If yes, where should i upload it ? Thanks
Technical SEO | | seoug_20050