Removing UpperCase URLs from Indexing
-
This search - site:www.qjamba.com/online-savings/automotix
gives me this result from Google:
Automotix online coupons and shopping - Qjamba
https://www.qjamba.com/online-savings/automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products.and Google tells me there is another one, which is 'very simliar'. When I click to see it I get:
Automotix online coupons and shopping - Qjamba
https://www.qjamba.com/online-savings/Automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products.This is because I recently changed my program to redirect all urls with uppercase in them to lower case, as it appears that all lowercase is strongly recommended.
I assume that having 2 indexed urls for the same content dilutes link juice. Can I safely remove all of my UpperCase indexed pages from Google without it affecting the indexing of the lower case urls? And if, so what is the best way -- there are thousands.
-
Hi AMHC,
It makes sense that without hardly any backlinks built up Google wont find my upper case URLS since all the page links have been changed, however, I am writing out all of the urls that are redirected into email, and from that I can tell that Google is finding them--I guess they may have a list of urls from prior indexing that they crawl independent of what their crawler comes up with.
I'll keep looking to see what they have indexed and if it turns out they just aren't crawling certain pages, will put them in a sitemap to be crawled..It's a good idea for taking care of the problem quickly--so if it progresses too slowly I'll do that.
Thanks very much for your answers!
-
Google needs to crawl the bad pages that you 301d. If there are no live links to those pages, then Google can't find them to 301. In short, if you created new lower case URLs, you just increased your duplicate content problem.
To solve this problem, build an HTML sitemap with all of the bad URLs. Have Google fetch and submit the page and all of the pages it links to. Google will crawl all of your old pages and apply the 301s.
-
Thanks AMHC. In my case, I just don't have many back links so I don't have the urgency that you faced with getting Google to see all the redirects. But, I'm still not understanding--it sounds like you believe that once google sees the redirect it removes the old uppercase from its index. It doesn't look to me like that is what happened in my case because Google is currently indexing BOTH, and so that means it has crawled my new lowercase and I know it isn't crawling any uppercase anymore (it cant--all are redirected). So, that's why I wonder if I have to remove those uppercase urls...does that make sense or am I just not understanding it still?
EDIT: I just discovered I wasn't doing a 301 direct so it wasn't considered a permanent move. That, if I understand it right, will remove the upper case from googles index permanently.
-
Canonicals still drain link juice. Canonicals aren't like a 301. The link juice still stays on the canocalized page. All a canonical does is tell Google, in the case of duplicate content, which page is primary. Canonicals handle the duplicate content issue, they do not handle the link juice issue. If I have 2 pages: /product-name/ and /product-name=?khdfpohfo/ that are duplicates, you can via canonical, tell Google to ignore the page with the variable string and rank the page without the variable string. If the page with the variable string has links, the link juice stays on the page.
The HTML Sitemap is there to tell Google about the 301s. the sitemap would look like this:
After you do the 301 redirect, as well as set up parameters in the .htaccess file (I think - not the developer on this), everything should redirect to the lower case URL. The problem is that if you do a 301 redirect for your entire site, Google may not figure it out too quickly. When it crawls your home page downward, it's only going to see the new URLs, and can't crawl the old 301 URLs because there aren't any internal links pointing at them. The only way Google will see the 301 is via an external backlink. The way we solved this was to create an HTML sitemap of all of the old upper case URLs. We then had Google fetch and index/crawl the sitemap. As it crawls the sitemap, where all of the URLs are 301 redirects, it will likewise point all of the Link Juice at the new URLs.
-
I gotcha. Yeah, different thing going on here..these urls can be really difficult! I have uppercase lowercase, https http, urls that have different content(not just formatting) for mobile as desktop and vice versa, mobile urls that dont even exist for desktop, and desktop urls that dont exist for mobile..all under the same domain. 1000s of internal pages....In the desire to create a good website for users I've created an SEO monster because I didn't realize the many consequences with regard to search indexes.
If you know a true expert in these areas I need him/her. 4 years on this site, its live finally (2 months), and now I'm discovering all of these things have to be fixed, but i can't afford thousands of dollars..I'll do the work, I just need the knowledge!
-
I see where you are coming from, and I do not have a good answer then, when I did a lowercase redirect I started by creating the new lowercase pages then setting canonical to them. After a few months I removed the uppercase versions and redirected them to the new lowercase.
-
Hutch, thanks.
The site is dynamic with thousands of pages that are now being redirected to lower case, so I'm not seeing how using canonical would work because the upper case urls aren't on the site anymore. I guess I think of canonical as being useful when you have ongoing content on the site that duplicates one or more other pages on the same site. In my case none of the upper case urls exist anymore so they don't have 'ongoing' content. I'm still new to this so if it sounds like I have it wrong, please correct me.
-
Another quick fix would be to use a canonical tag on all of your pages pointing to the full lowercase versions.
So for the URLs example.com/UPPER; example.com/Upper; and example.com/upper you would place the following into the head so Google knows that these are just variations of the same page, and if will point search to the desired page example.com/upper
-
AMHC, thank you for your response. I'm in the middle of quite a mess, as this is one of several issues, so really appreciate your help. I must confess to not following everything you wrote exactly:
In your situation, I think i understand the redirect -- it is the same reason I am doing a redirect--it is so that anyone coming from to this site with uppercase in it will end up on the lower case page, and in the case of google will then index the page as a lower case page. BTW, for me that has been easy as I am doing it via php -- if the url doesn't equal its strtolower of the url , then I redirect to strtolower.
I think I get what you are saying about the sitemap -- it speeds up google crawling the site and seeing that all those upper cases should be lowercase from your redirect. In my case, i don't have the concern about Google discovering them as you did because my site is only a couple months old. And, I never have given Google a sitemap so many of my pages aren't crawled yet (I am trying to clean up my entire url structure before i submit a sitemap to them--however they have already crawled perhaps 20% of the site, so I'm now trying to examine what google has crawled and how it has been indexed to figure out what needs to be done).
What I'm not understanding is this: It seems to me that what you described should succeed for going forward to getting both Google and your users to the right ending page, but I don't see how it removes the prior uppercase urls from Google's index. What is it that tells Google your prior upper case urls should no longer be in their index? Is it the fact that they aren't in the sitemap you provide now? Or, do they literally have to be removed using some kind of removal or disavow tool? I discovered this (as you see in the op) because Google appears to never have removed the Uppercase ones even though they are indexing the lower case now.
Ted
-
We had the same issue. Boy, was it an education. I had no idea that URLs were case sensitive for Google, and neither did my SEO buddies. I bet if you asked 100 SEOs if URLs were case sensitive for Google, 95 would answer "No". We discovered the problem in GWT and GA when they had different statistics for the mixed case and all lower case versions of the URL. We believed that we had both a duplicate content issue as well as a link juice splitting issue, with backlinks being pointed at both URLs.
We solved the problem by doing a 301 redirect, but as we are an ecommerce site with thousands of products, it was a messy process. We had to redirect pretty much every page on the site since the mixed case categories contaminated subcategories and products.
The 301 went pretty smoothly, and we saw a minor bump up in some of our Rankings. I would strongly suggest that you create an HTML sitemap for every upper case URL that you are going to 301. Here were our thoughts - we could be wrong on this. If we just 301 a page, and don't tell Google, then Google won't know about it unless it tries to crawl the page. We felt like we needed to show Google that all of the pages are being redirected asap. Create an HTML sitemap with all of your upper case URLs. After you do the 301, have Google fetch and index the sitemap page and all of the pages that it links to. Leave the map up for a few days, and then you can take it down. This will expedite moving the link juice to the correct pages as Google will index the 301 for every page in the sitemap.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why has my website been removed from Bing?
I have a website that has recently been removed from Bing's index, but can't figure out why. The website isn't new, and it is indexed just fine on Google. These are the steps I've tried: The website is verified in Bing Webmaster Tools and successfully submitted the sitemap. I tested the URL to ensure that Bingbot is allowed to crawl the site I submitted URLs to Bing via the URL Submission tool There isn't a "noindex" on the site preventing it from being indexed When I do a URL Inspection, an error message comes up saying "The inspected URL is known to Bing but has some issues which are preventing us from serving it to our users. We recommend you to follow Bing Webmaster Guidelines." I contacted Bing to ask whether the website was removed in error, but received a reply that the website doesn't comply with Bing's quality guidelines, but they wouldn't go into detail as to which guidelines the website isn't meeting. The website URL is https://www.pardeehospital.org. Can anyone offer any advice or insight as to why Bing won't index our site? Thank you!
Intermediate & Advanced SEO | | lindsey.steinkamp0 -
Removing the Trailing Slash in Magento
Hi guys, We have noticed trailing slash vs non-trailing slash duplication on one of our sites. Example:
Intermediate & Advanced SEO | | brandonegroup
Duplicate: https://www.example.com.au/living/
Preferred: https://www.example.com.au/living So, SEO-wise, we suggested placing a canonical tag on all trailing slash pointing to non-trailing slash. However, devs have advised against removing the trailing slash from some URLs with a blanket rule, as this may break functionality in Magento that depends on the trailing slash. The full site would need to be tested after implementing a blanket rewrite rule. Is any other way to address this trailing slash duplication issue without breaking anything in Magento? Keen to hear from you guys. Cheers,0 -
Weird Indexation Issue
On this webpage, we have an interactive graphic that allows users to click a navigational element and learn more about an anatomical part of the knee or a knee malady. For example, a user could click "Articular Cartilage" and they will land on this page: http://www.neocartimplant.com/knee-anatomy-maladies/anatomy/articular-cartilage The weird thing is whether you perform a Google Search for the above URL or for a string of text on that URL (i.e. "Articular cartilage is hyaline cartilage (as opposed to menisci, which consists of fibrocartilage) on the articular surfaces, or the ends, of bones. This thin, smooth tissue lines both joint surfaces where the bones come together to form the knee. ") the following page ranks: http://www.neocartimplant.com/anatmal/knee-anatomy-maladies/anatomy/articular-cartilage.php I have two questions: 1 - Any idea on how the Googlebot is getting to that page?
Intermediate & Advanced SEO | | davidangotti
2 - How should I get the Googlebot to index the correct page (http://www.neocartimplant.com/knee-anatomy-maladies/anatomy/articular-cartilage)? Thanks in advance for your help!0 -
Mixing static.htm urls and dynamic urls on a Windows IIS Server?
Hi all, We've had a website originally built using static html with .htm extensions ranking well in Google hence we want to keep those pages/urls. We are on a dedicated sever (Windows IIS). However our developer has custom made a new DYNAMIC section for the site which shows new added products dynamically and allows them to be booked online via shopping cart. We are having problems displaying them both on the same domain even if we put the dynamic section withing its own subfolder and keep the static htms in the root. Is it possible to have both function on IIS (even if they may have to function a little separately)? Does anyone have previous experience of this kind of issue or a way of making both work? What setup do we need to do on the dedicated server.
Intermediate & Advanced SEO | | emerald0 -
Tracking URLS and Redirects
We have a client with many archived newsletters links that contain tracking code at the end of the URL. These old URLs are pointing to pages that don't exist anymore. Is there a way to set up permanent redirects for these old URLs with tracking code? We have tried and it doesn't seem to work. Thank you!
Intermediate & Advanced SEO | | BopDesign0 -
How Long Does it Take for Rel Canonical to De-Index / Re-Index a Page?
Hi Mozzers, We have 2 e-commerce websites, Website A and Website B, sharing thousands of pages with duplicate product descriptions. Currently only the product pages on Website B are indexing, and we want Website A indexed instead. We added the rel canonical tag on each of Website B's product pages with a link towards the matching product on Page A. How long until Website B gets de-indexed and Website A gets indexed instead? Did we add the rel canonical tag correctly? Thanks!
Intermediate & Advanced SEO | | Travis-W0 -
Removed Site-wide links
Hi there, I have recently removed quite a lot of site-wide links leaving the only link on homepage's of some websites, since doing this I have seen a dramatic drop on my keywords, going from position 2-3 to nowhere. Has anyone else experienced anything like this, should I expect to see a return on these keywords? Thanks
Intermediate & Advanced SEO | | Paul780 -
Spammy? Long URLs
Hi All: Is it true that URLs such as this following one are viewed as "spammy" (besides being too long) and that such URLs will negatively affect ranks for keywords and page ranks: http://www.repairsuniverse.com/ipod-parts-ipod-touch-replacement-repair-parts-ipod-touch-1st-gen-replacement-repair-parts.html My thinking is that the page will perform better once it is 301 redirected to a shorter page name, such as: http://www.repairsuniverse.com/ipod-touch-1G-replacement-parts.html It also appears that these long URLs are also more likely to break, creating unnecessary 404s. <colgroup><col width="301"></colgroup> Thanks for your insight on this issue!
Intermediate & Advanced SEO | | holdtheonion0