Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Removing UpperCase URLs from Indexing
-
This search - site:www.qjamba.com/online-savings/automotix
gives me this result from Google:
Automotix online coupons and shopping - Qjamba
https://www.qjamba.com/online-savings/automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products.and Google tells me there is another one, which is 'very simliar'. When I click to see it I get:
Automotix online coupons and shopping - Qjamba
https://www.qjamba.com/online-savings/Automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products.This is because I recently changed my program to redirect all urls with uppercase in them to lower case, as it appears that all lowercase is strongly recommended.
I assume that having 2 indexed urls for the same content dilutes link juice. Can I safely remove all of my UpperCase indexed pages from Google without it affecting the indexing of the lower case urls? And if, so what is the best way -- there are thousands.
-
Hi AMHC,
It makes sense that without hardly any backlinks built up Google wont find my upper case URLS since all the page links have been changed, however, I am writing out all of the urls that are redirected into email, and from that I can tell that Google is finding them--I guess they may have a list of urls from prior indexing that they crawl independent of what their crawler comes up with.
I'll keep looking to see what they have indexed and if it turns out they just aren't crawling certain pages, will put them in a sitemap to be crawled..It's a good idea for taking care of the problem quickly--so if it progresses too slowly I'll do that.
Thanks very much for your answers!
-
Google needs to crawl the bad pages that you 301d. If there are no live links to those pages, then Google can't find them to 301. In short, if you created new lower case URLs, you just increased your duplicate content problem.
To solve this problem, build an HTML sitemap with all of the bad URLs. Have Google fetch and submit the page and all of the pages it links to. Google will crawl all of your old pages and apply the 301s.
-
Thanks AMHC. In my case, I just don't have many back links so I don't have the urgency that you faced with getting Google to see all the redirects. But, I'm still not understanding--it sounds like you believe that once google sees the redirect it removes the old uppercase from its index. It doesn't look to me like that is what happened in my case because Google is currently indexing BOTH, and so that means it has crawled my new lowercase and I know it isn't crawling any uppercase anymore (it cant--all are redirected). So, that's why I wonder if I have to remove those uppercase urls...does that make sense or am I just not understanding it still?
EDIT: I just discovered I wasn't doing a 301 direct so it wasn't considered a permanent move. That, if I understand it right, will remove the upper case from googles index permanently.
-
Canonicals still drain link juice. Canonicals aren't like a 301. The link juice still stays on the canocalized page. All a canonical does is tell Google, in the case of duplicate content, which page is primary. Canonicals handle the duplicate content issue, they do not handle the link juice issue. If I have 2 pages: /product-name/ and /product-name=?khdfpohfo/ that are duplicates, you can via canonical, tell Google to ignore the page with the variable string and rank the page without the variable string. If the page with the variable string has links, the link juice stays on the page.
The HTML Sitemap is there to tell Google about the 301s. the sitemap would look like this:
After you do the 301 redirect, as well as set up parameters in the .htaccess file (I think - not the developer on this), everything should redirect to the lower case URL. The problem is that if you do a 301 redirect for your entire site, Google may not figure it out too quickly. When it crawls your home page downward, it's only going to see the new URLs, and can't crawl the old 301 URLs because there aren't any internal links pointing at them. The only way Google will see the 301 is via an external backlink. The way we solved this was to create an HTML sitemap of all of the old upper case URLs. We then had Google fetch and index/crawl the sitemap. As it crawls the sitemap, where all of the URLs are 301 redirects, it will likewise point all of the Link Juice at the new URLs.
-
I gotcha. Yeah, different thing going on here..these urls can be really difficult! I have uppercase lowercase, https http, urls that have different content(not just formatting) for mobile as desktop and vice versa, mobile urls that dont even exist for desktop, and desktop urls that dont exist for mobile..all under the same domain. 1000s of internal pages....In the desire to create a good website for users I've created an SEO monster because I didn't realize the many consequences with regard to search indexes.
If you know a true expert in these areas I need him/her. 4 years on this site, its live finally (2 months), and now I'm discovering all of these things have to be fixed, but i can't afford thousands of dollars..I'll do the work, I just need the knowledge!
-
I see where you are coming from, and I do not have a good answer then, when I did a lowercase redirect I started by creating the new lowercase pages then setting canonical to them. After a few months I removed the uppercase versions and redirected them to the new lowercase.
-
Hutch, thanks.
The site is dynamic with thousands of pages that are now being redirected to lower case, so I'm not seeing how using canonical would work because the upper case urls aren't on the site anymore. I guess I think of canonical as being useful when you have ongoing content on the site that duplicates one or more other pages on the same site. In my case none of the upper case urls exist anymore so they don't have 'ongoing' content. I'm still new to this so if it sounds like I have it wrong, please correct me.
-
Another quick fix would be to use a canonical tag on all of your pages pointing to the full lowercase versions.
So for the URLs example.com/UPPER; example.com/Upper; and example.com/upper you would place the following into the head so Google knows that these are just variations of the same page, and if will point search to the desired page example.com/upper
-
AMHC, thank you for your response. I'm in the middle of quite a mess, as this is one of several issues, so really appreciate your help. I must confess to not following everything you wrote exactly:
In your situation, I think i understand the redirect -- it is the same reason I am doing a redirect--it is so that anyone coming from to this site with uppercase in it will end up on the lower case page, and in the case of google will then index the page as a lower case page. BTW, for me that has been easy as I am doing it via php -- if the url doesn't equal its strtolower of the url , then I redirect to strtolower.
I think I get what you are saying about the sitemap -- it speeds up google crawling the site and seeing that all those upper cases should be lowercase from your redirect. In my case, i don't have the concern about Google discovering them as you did because my site is only a couple months old. And, I never have given Google a sitemap so many of my pages aren't crawled yet (I am trying to clean up my entire url structure before i submit a sitemap to them--however they have already crawled perhaps 20% of the site, so I'm now trying to examine what google has crawled and how it has been indexed to figure out what needs to be done).
What I'm not understanding is this: It seems to me that what you described should succeed for going forward to getting both Google and your users to the right ending page, but I don't see how it removes the prior uppercase urls from Google's index. What is it that tells Google your prior upper case urls should no longer be in their index? Is it the fact that they aren't in the sitemap you provide now? Or, do they literally have to be removed using some kind of removal or disavow tool? I discovered this (as you see in the op) because Google appears to never have removed the Uppercase ones even though they are indexing the lower case now.
Ted
-
We had the same issue. Boy, was it an education. I had no idea that URLs were case sensitive for Google, and neither did my SEO buddies. I bet if you asked 100 SEOs if URLs were case sensitive for Google, 95 would answer "No". We discovered the problem in GWT and GA when they had different statistics for the mixed case and all lower case versions of the URL. We believed that we had both a duplicate content issue as well as a link juice splitting issue, with backlinks being pointed at both URLs.
We solved the problem by doing a 301 redirect, but as we are an ecommerce site with thousands of products, it was a messy process. We had to redirect pretty much every page on the site since the mixed case categories contaminated subcategories and products.
The 301 went pretty smoothly, and we saw a minor bump up in some of our Rankings. I would strongly suggest that you create an HTML sitemap for every upper case URL that you are going to 301. Here were our thoughts - we could be wrong on this. If we just 301 a page, and don't tell Google, then Google won't know about it unless it tries to crawl the page. We felt like we needed to show Google that all of the pages are being redirected asap. Create an HTML sitemap with all of your upper case URLs. After you do the 301, have Google fetch and index the sitemap page and all of the pages that it links to. Leave the map up for a few days, and then you can take it down. This will expedite moving the link juice to the correct pages as Google will index the 301 for every page in the sitemap.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Changing Url Removes Backlink
Hello MOZ Community, I have question regarding Bad Backlink Removal. My Site's Post's Image got 4 to 5k backlinks from unknown sites and also their is no contact details on their site so that i can contact them to remove. So, I have an idea for which i want suggestion " If I change the url that receieves backlinks" does this will remove backlinks? For Example: https://example.com/test/ got 5k backlinks if I change this url to https://examplee.com/test-failed/ does this will remove those 5k backlinks? If not then How Can I remove those Backlinks? I Know about disavow but this takes time.
Intermediate & Advanced SEO | | Jackson210 -
Does Google Index URLs that are always 302 redirected
Hello community Due to the architecture of our site, we have a bunch of URLs that are 302 redirected to the same URL plus a query string appended to it. For example: www.example.com/hello.html is 302 redirected to www.example.com/hello.html?___store=abc The www.example.com/hello.html?___store=abc page also has a link canonical tag to www.example.com/hello.html In the above example, can www.example.com/hello.html every be Indexed, by google as I assume the googlebot will always be redirected to www.example.com/hello.html?___store=abc and will never see www.example.com/hello.html ? Thanks in advance for the help!
Intermediate & Advanced SEO | | EcommRulz0 -
Pages are Indexed but not Cached by Google. Why?
Here's an example: I get a 404 error for this: http://webcache.googleusercontent.com/search?q=cache:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all But a search for qjamba restaurant coupons gives a clear result as does this: site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all What is going on? How can this page be indexed but not in the Google cache? I should make clear that the page is not showing up with any kind of error in webmaster tools, and Google has been crawling pages just fine. This particular page was fetched by Google yesterday with no problems, and even crawled again twice today by Google Yet, no cache.
Intermediate & Advanced SEO | | friendoffood2 -
Internal links and URL shortners
Hi guys, what are your thoughts using bit.ly links as internal links on blog posts of a website? Some posts have 4/5 bit.ly links going to other pages of our website (noindexed pages). I have nofollowed them so no seo value is lost, also the links are going to noindexed pages so no need to pass seo value directly. However what are your thoughts on how Google will see internal links which have essential become re-direct links? They are bit.ly links going to result pages basically. Am I also to assume the tracking for internal links would also be better using google analytics functionality? is bit.ly accurate for tracking clicks? Any advice much appreciated, I just wanted to double check this.
Intermediate & Advanced SEO | | pauledwards0 -
Product or Shop in URL
What do you think is better for seo and for sale, I am using woo-ecommerce for health products website. websitename.com/product/keyword OR websitename.com/shop/keyword
Intermediate & Advanced SEO | | MasonBaker0 -
301 vs 410 redirect: What to use when removing a URL from the website
We are in the process of detemining how to handle URLs that are completely removed from our website? Think of these as listings that have an expiration date (i.e. http://www.noodle.org/test-prep/tphU3/sat-group-course). What is the best practice for removing these listings (assuming not many people are linking to them externally). 301 to a general page (i.e. http://www.noodle.org/search/test-prep) Do nothing and leave them up but remove from the site map (as they are no longer useful from a user perspective) return a 404 or 410?
Intermediate & Advanced SEO | | abargmann0 -
Limit on Google Removal Tool?
I'm dealing with thousands of duplicate URL's caused by the CMS... So I am using some automation to get through them - What is the daily limit? weekly? monthly? Any ideas?? thanks, Ben
Intermediate & Advanced SEO | | bjs20100 -
Our login pages are being indexed by Google - How do you remove them?
Each of our login pages show up under different subdomains of our website. Currently these are accessible by Google which is a huge competitive advantage for our competitors looking for our client list. We've done a few things to try to rectify the problem: - No index/archive to each login page Robot.txt to all subdomains to block search engines gone into webmaster tools and added the subdomain of one of our bigger clients then requested to remove it from Google (This would be great to do for every subdomain but we have a LOT of clients and it would require tons of backend work to make this happen.) Other than the last option, is there something we can do that will remove subdomains from being viewed from search engines? We know the robots.txt are working since the message on search results say: "A description for this result is not available because of this site's robots.txt – learn more." But we'd like the whole link to disappear.. Any suggestions?
Intermediate & Advanced SEO | | desmond.liang1