Old URLs that have 301s to 404s not being de-indexed.
-
We have a scenario on a domain that recently moved to enforcing SSL. If a page is requested over plain HTTP, the load balancer automatically redirects to the HTTPS URL using a good old-fashioned 301. This is great except for any page that no longer exists, in which case you get a 301 going to a 404.
Here's what I mean.
Case 1 - Good page:
http://domain.com/goodpage -> 301 -> https://domain.com/goodpage -> 200
Case 2 - Bad page that no longer exists:
http://domain.com/badpage -> 301 -> https://domain.com/badpage -> 404
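To audit the two cases above across hundreds of URLs, something like the following could work. This is only a rough Python sketch, not a tested tool: the function names (fetch_hop, trace, classify) and the labels are made up for illustration, and it assumes the server answers HEAD requests the same way it answers GET, which is not always true.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECTS = (301, 302, 303, 307, 308)

def fetch_hop(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def trace(url, fetch=fetch_hop, max_hops=5):
    """Follow redirects one hop at a time; return [(url, status), ...]."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status in REDIRECTS and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return chain

def classify(chain):
    """Label a chain: Case 1 is 'ok', Case 2 is 'redirect-to-gone'."""
    final_status = chain[-1][1]
    if len(chain) > 1 and final_status in (404, 410):
        return "redirect-to-gone"
    if final_status == 200:
        return "ok"
    return "other"
```

Calling classify(trace("http://domain.com/badpage")) against the live site should flag the Case 2 URLs, which gives you a list to work from instead of finding them one at a time in search results.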
Google is correctly re-indexing all the "good" pages and just displaying search results going directly to the https version.
Google is stubbornly hanging on to all the "bad" pages and serving up the original URL (http://domain.com/badpage) unless we submit a removal request. But there are hundreds of these pages and this is starting to suck. Note: the load balancer does the SSL enforcement, not the CMS. So we can't detect a 404 and serve it up first. The CMS does the 404'ing.
Any ideas on the best way to approach this problem? Or any idea why Google is holding on to all the old "bad" pages that no longer exist, given that we've clearly indicated with 301s that no one is home at the old address?
-
I don't think 404 vs 410 is the answer here. The basis for this thought is the following:
========
"If we see a page and we get a 404, we are gonna protect that page for 24 hours in the crawling system, so we sort of wait and we say maybe that was a transient 404, maybe it really wasn't intended to be a page not found."
"If we see a 410, then the site crawling system says, OK we assume the webmasters knows what they're doing because they went off the beaten path to deliberately say this page is gone," he said. "So they immediately convert that 410 to an error, rather than protecting it for 24 hours."
========
I'm thinking the deeper issue is why the 301s are not being respected. If a link points to http://domain.com/badpage and we use a 301 to point to https://domain.com/badpage - shouldn't the crawler (Google or otherwise) respect the 301? Why still index and serve up a page that responds with the 301? To me, this is baffling. If we serve up a 404 or a 410 - either way we are saying "this page is gone" but we're still seeing the original http://domain.com/badpage in the index?
Does that make sense? Or is there more clarification required?
-
sym_admin is right--you'll want to find the source of those pages, as Google apparently is seeing them from somewhere and still requesting them. If there are links to those pages somewhere, you will need to remove them. Also, if you're able, I would change those URLs so that they serve up a "410 Gone" error, and not a 404.
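If you do go the 410 route, here is one way it might look. This is a sketch only, and it makes a big assumption the thread does not confirm: that the CMS runs as a Python WSGI app you can wrap. The names gone_middleware and RETIRED_PATHS are hypothetical. Since the load balancer still issues the 301 first, the 410 would show up at the end of the chain on the https URL.

```python
# Hypothetical list of retired paths; in practice this would come from
# your own records of removed pages.
RETIRED_PATHS = {"/badpage"}

def gone_middleware(app, retired=RETIRED_PATHS):
    """Wrap a WSGI app so that known-retired paths answer 410 Gone.

    Every other request is passed through to the wrapped app untouched.
    """
    def wrapped(environ, start_response):
        if environ.get("PATH_INFO", "") in retired:
            body = b"410 Gone\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return wrapped
```

On a non-Python stack the same idea applies: match the retired paths before the CMS's normal 404 handling and return 410 for them.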
-
Read these three, then do what you got to do...
https://www.searchcommander.com/how-to-bulk-remove-urls-google/
https://productforums.google.com/forum/#!topic/webmasters/uYFJnsyiH8w
https://moz.com/community/q/404-redirects-to-the-homepage-is-this-good-bad-ugly
For proper removal, please ensure that there are no INTERNAL links anywhere on your website to 404 addresses, from sitemap, buttons, text, or images (the whole 9 yards).
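The sitemap part of that check is easy to script. Here is a minimal Python sketch, assuming a standard sitemaps.org XML sitemap (not a sitemap index file). The function names are made up for illustration, and status_of is a placeholder for whatever you use to look up a URL's final HTTP status.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]

def dead_entries(xml_text, status_of):
    """Return sitemap URLs whose status is 404 or 410.

    status_of is any callable mapping a URL to its final HTTP status code.
    """
    return [u for u in sitemap_urls(xml_text) if status_of(u) in (404, 410)]
```

Anything dead_entries returns should come out of the sitemap. Links in page bodies, buttons, and images need a separate crawl of the site itself.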
Good luck!