Lately I have noticed Google indexing many files on the site without the .html extension
-
Hello,
Our site, while we convert, remains in HTML 4.0.
Fle names such as http://www.sample.com/samples/index.shtml are being picked up in the SERPS as http://www.sample.com/samples/ even when I use the "rel="canonical" tag and specify the full file name therein as recommended. The link to the truncated URL (http://www.sample.com/samples/) results in what MOZ shows as fewer incoming links than the full file name is shown as having incoming.
I am not sure if this is causing a loss in placement (the MOZ stats are showing a decline of late), which I have seen recently (of course, I am aware of other possible reasons, such as not being in HTML5 yet).
Any help with this would be great.
Thank you in advance
-
Can you clarify what you're concerned about for 301 redirects in terms of link juice?
301 redirects don't carry as much link juice as a direct link, but it doesn't impact correct links, just the links that, otherwise, wouldn't get link juice to your end destination at all. (Though, if your canonical is working correctly, it'll pass the same amount of link juice as a 301 redirect.)
Dr. Pete goes into this a bit more over here: https://moz.com/community/q/do-canonical-tags-pass-all-of-the-link-juice-onto-the-url-they-point-to
-
Many thanks for taking the time to respond Kristina.
-
I don't like to do redirects, as so many have warned of the consequences in terms of link juice
-
No, I don't link to the pages in question using "/" rather than the ".shtml" version of the page indexed.
-
A few external sources use the "/" version (recent linkers) I have found, but they likely only did so as they saw it displayed as such in the SERPs previously. No commercial or other affiliate sites do.
The reason I was really confused is that some pages are indexed using the "/", while others are not -- with no apparent reason I could locate. The "/" version for pages still remains on the first page for keywords, even with far less domain authorities and pages linking to them (for now!). We will be moving to another platform with a different default extension, so I wonder how that will be handled. Endless mysteries.
Thank you again for your time and suggestions,
Greg
-
-
Hmm, that doesn't seem good. It's hard to say whether this is causing the decline in your rankings, but either way, you want to make sure that you're not splitting your link equity between your / and .shtml pages. Here's what I'd do:
- If you can, 301 redirect / pages to .shtml pages. Obviously, it'd be easier if the canonical worked, but it sounds like it doesn't.
- Use ScreamingFrog or DeepCrawl to look through internal pages on your site to see if you're ever linking to the / version of pages rather than the .shtml pages. When Google chooses a different version of a URL over the canonical one, it's often because that's how it sees internal links pointing to the page. Make sure that you only have links to the .shtml version of the page.
- Use a tool like Moz or Ahrefs to find all internal links to your site. For any links that you built or have a partnership with the owners, make sure that they're linking to the .shtml version of the page. I could especially see your ad partners using / because it's a cleaner before parameters than .shtml.
After that, wait and see if Google fixes the problem.
Also worth noting: have you thought about changing your default to /? That's more common today, so you're probably getting a lot of external links with / instead of .shtml, and you'll never be able to fix that problem. If that's a possible solution, you may want to explore it.
Good luck!
Kristina
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Only Indexing Canonical Root URL Instead of Specified URL Parameters
We just launched a website about 1 month ago and noticed that Google was indexing, but not displaying, URLs with "?location=" parameters such as: http://www.castlemap.com/local-house-values/?location=great-falls-virginia and http://www.castlemap.com/local-house-values/?location=mclean-virginia. Instead, Google has only been displaying our root URL http://www.castlemap.com/local-house-values/ in its search results -- which we don't want as the URLs with specific locations are more important and each has its own unique list of houses for sale. We have Yoast setup with all of these ?location values added in our sitemap that has successfully been submitted to Google's Sitemaps: http://www.castlemap.com/buy-location-sitemap.xml I also tried going into the old Google Search Console and setting the "location" URL Parameter to Crawl Every URL with the Specifies Effect enabled... and I even see the two URLs I mentioned above in Google's list of Parameter Samples... but the pages are still not being added to Google. Even after Requesting Indexing again after making all of these changes a few days ago, these URLs are still displaying as Allowing Indexing, but Not On Google in the Search Console and not showing up on Google when I manually search for the entire URL. Why are these pages not showing up on Google and how can we get them to display? Only solution I can think of would be to set our main /local-house-values/ page to noindex in order to have Google favor all of our other URL parameter versions... but I'm guessing that's probably not a good solution for multiple reasons.
Intermediate & Advanced SEO | | Nitruc0 -
Can you index a Google doc?
We have updated and added completely new content to our state pages. Our old state content is sitting in a our Google drive. Can I make these public to get them indexed and provide a link back to our state pages? In theory it sounds like a great link building strategy... TIA!
Intermediate & Advanced SEO | | LindsayE1 -
Staging/Development Site Indexed?
So, my company's site has been pretty tough to try to get moving in the right direction on Google's SERPs. I had believed that it was mainly due to having a shortage of back links and a horrible home page load time. Everything else seems to be set up pretty well. I was messing around and used the site: Google search operator for our staging site. I found stage.site.com and a lot of our other staging pages in the search results. I have to think that this is the problem and causing a duplicate content penalty of the entire site. I guess I now need to 301 redirect the entire site? Has anyone every had this issue before and have fixed it? Thanks for any help.
Intermediate & Advanced SEO | | aua0 -
What happens if one remove the disavow file from a non penalised site
What happens if one remove the disavow file from a site that has not received a manual penalty from Google. Although the site did suffer from a drop in traffic and rankings.
Intermediate & Advanced SEO | | Taiger0 -
URL Parameter Being Improperly Crawled & Indexed by Google
Hi All, We just discovered that Google is indexing a subset of our URL’s embedded with our analytics tracking parameter. For the search “dresses” we are appearing in position 11 (page 2, rank 1) with the following URL: www.anthropologie.com/anthro/category/dresses/clothes-dresses.jsp?cm_mmc=Email--Anthro_12--070612_Dress_Anthro-_-shop You’ll note that “cm_mmc=Email” is appended. This is causing our analytics (CoreMetrics) to mis-attribute this traffic and revenue to Email vs. SEO. A few questions: 1) Why is this happening? This is an email from June 2012 and we don’t have an email specific landing page embedded with this parameter. Somehow Google found and indexed this page with these tracking parameters. Has anyone else seen something similar happening?
Intermediate & Advanced SEO | | kevin_reyes
2) What is the recommended method of “politely” telling Google to index the version without the tracking parameters? Some thoughts on this:
a. Implement a self-referencing canonical on the page.
- This is done, but we have some technical issues with the canonical due to our ecommerce platform (ATG). Even though page source code looks correct, Googlebot is seeing the canonical with a JSession ID.
b. Resubmit both URL’s in WMT Fetch feature hoping that Google recognizes the canonical.
- We did this, but given the canonical issue it won’t be effective until we can fix it.
c. URL handling change in WMT
- We made this change, but it didn’t seem to fix the problem
d. 301 or No Index the version with the email tracking parameters
- This seems drastic and I’m concerned that we’d lose ranking on this very strategic keyword Thoughts? Thanks in advance, Kevin0 -
Our login pages are being indexed by Google - How do you remove them?
Each of our login pages show up under different subdomains of our website. Currently these are accessible by Google which is a huge competitive advantage for our competitors looking for our client list. We've done a few things to try to rectify the problem: - No index/archive to each login page Robot.txt to all subdomains to block search engines gone into webmaster tools and added the subdomain of one of our bigger clients then requested to remove it from Google (This would be great to do for every subdomain but we have a LOT of clients and it would require tons of backend work to make this happen.) Other than the last option, is there something we can do that will remove subdomains from being viewed from search engines? We know the robots.txt are working since the message on search results say: "A description for this result is not available because of this site's robots.txt – learn more." But we'd like the whole link to disappear.. Any suggestions?
Intermediate & Advanced SEO | | desmond.liang1 -
How do I presuade Google to re-consider my site?
A few weeks ago I got an emai from Google that my site is suspected to violating Google guidelines-->suspected links manipulationg Google Page rank. My site dropped to the second page. I have contacted some of the top webmasters who link to me and they have removed the links or added a nofollow. When I asked for re-consideation I got an answear that there are still suspected links. What do I do now? I can't remove all of my links?! BTW this happened before the offical Pinguin Update.
Intermediate & Advanced SEO | | Ofer230 -
Why does google not show my ecommerce category page when I have the same keywords for many products in the product title?
I have found that google removes the google serach listing of a category from my site (ecommerce) when products within the category have the same key words. I sell golf shirts and have a category called "Mens Golf Shirts" Within the category I have added many products but when the too many of the products say mens golf shirt my link on google gets removed. Before i had products named: FUNKTION Mens Short Sleeve Golf Shirt Red / Black but now I have had to change it to: FUNKTION Red / Black I can understand that they may see this a keyword stuffing but how do I get around this to ensure that each product can rank on google for mens golf shirt
Intermediate & Advanced SEO | | funktiongolf0