Removing Content 301 vs 410 question
-
Hello,
I was hoping to get the SEOmoz community’s advice on how to remove content most effectively from a large website.
I just read a very thought-provoking thread in which Dr. Pete and Kerry22 answered a question about how to cut content in order to recover from Panda. (http://www.seomoz.org/q/panda-recovery-what-is-the-best-way-to-shrink-your-index-and-make-google-aware).
Kerry22 mentioned a process in which 410s would be totally visible to googlebot so that it would easily recognize the removal of content. The conversation implied that it is not just important to remove the content, but also to give google the ability to recrawl that content to indeed confirm the content was removed (as opposed to just recrawling the site and not finding the content anywhere).
This really made lots of sense to me and also struck a personal chord… Our website was hit by a later Panda refresh back in March 2012, and ever since then we have been aggressive about cutting content and doing what we can to improve user experience.
When we cut pages, though, we used a different approach, doing all of the below steps:
1. We cut the pages
2. We set up permanent 301 redirects for all of them immediately.
3. And at the same time, we would always remove from our site all links pointing to these pages (to make sure users didn’t stumble upon the removed pages.When we cut the content pages, we would either delete them or unpublish them, causing them to 404 or 401, but this is probably a moot point since we gave them 301 redirects every time anyway. We thought we could signal to Google that we removed the content while avoiding generating lots of errors that way…
I see that this is basically the exact opposite of Dr. Pete's advice and opposite what Kerry22 used in order to get a recovery, and meanwhile here we are still trying to help our site recover. We've been feeling that our site should no longer be under the shadow of Panda.
So here is what I'm wondering, and I'd be very appreciative of advice or answers for the following questions:
1. Is it possible that Google still thinks we have this content on our site, and we continue to suffer from Panda because of this?
Could there be a residual taint caused by the way we removed it, or is it all water under the bridge at this point because Google would have figured out we removed it (albeit not in a preferred way)?2. If there’s a possibility our former cutting process has caused lasting issues and affected how Google sees us, what can we do now (if anything) to correct the damage we did?
Thank you in advance for your help,
Eric -
Thanks Dr Peter! I agree with you! Just wanted to feel shure about it.
Yes, Gary, you can personalize also a 410 page.
-
You should be able to customize a 410 just like you do a 404. The problem is that most platforms don't do that, by default, so you get the old-school status code page. That should be configurable, though, on almost all modern platforms.
-
From a commerce perspective the biggest problem I have with the 410 is the user experience. If I tag a URL with a 410 when someone request the page they get a white page that says GONE. They never even get the chance to see the store and maybe search for a similar product.
Would it work if I built a landing page that returns a 410 and then used the 301 to redirect the bad URL to the landing page? It would make the customer happy, they would be in the store with a message to search for something else. But would Google really associate the 410 with the redirected URL?
-
Hi Sandra, don't worry about 404s volume because they won't hurt your rankings.
About your issue I understand that you want to be really clear with your users and don't hurt their experience on the site. So create a custom 404 which changes its content depending of what page is returning it. If it's one of your old product you can return a message or an article of why you decided to remove them and propose some alternatives. For all other errors you can just return a search box or related products to the one you lost.
301 IMHO are not the way to go, if an url is gone it has not being redirected anywhere, so a 301 will result in a bad UX 99% of the time.
-
Hello,
I have a related question about 301 vs 410.
I have a client who wants to delete a whole category of product from one site. It's a big amount of product, so a big amount of urls, but this product is not working very well. So the decision is not SEO-related but more as a business decision. It's not for Panda.
If we think about the communication with the user, the best option would be to have a landing page explaining that we decided to remove that product.
Then the question is, do we do a redirect 301 of all those urls to this landing page? I am afraid that a big redirect like this, going from many urls to a single one (even if this is not created to rank on google) can be seen dodgy by Google. Am I right?
Or do I do a 410 for those pages, and I personalize the 410 landing only for these urls in order to communicate with the user (is that even possible?). But I am afraid, because we'll have much 4XX Errors in WMT, and this may have influence to the rankings!
So I don't know what to do! It's a must that we delete this content and that we communicate it well with the users.
Thanks for your help,
-
100% agreed - 403 isn't really an appropriate alternative to 404. I know SEOs who claim that 410s are stronger/faster, but I haven't seen great evidence in the past couple of years. It's harmless to try 410s, but I wouldn't expect miracles.
-
Hi Eric, I'll try to answer your further question even if I'm not an oracle like Pete
First of all thanks Pete to underline that you need to give google just one response since you can't give them both 301 and 404, I was assuming that and I didn't focus on that part of Eric's answer.
Second. Eric, If your purpose is to give google the ability of recrawl the old content to let them see it has disappeared you want to give them a 404 or a 410 which are respectively not found and permanently not found. Before it was a difference but now they've almost the same value under google's eyes (further reading). In that way google can access your page and see that those contents are now gone.
In the case of 403 the access is denied to anyone both google and humans, so in that case google won't be able to access and recrawl it. If your theory is based (and I think you're in the good way) upon the thing that google needs to recrawl your content and see it ahs really gone, 403 is not the response you should give it.
-
Hey there mememax - thank you for the reply! Reading your post and thinking back to our methodology, yes I think in hindsight we were a bit too afraid about generating errors when we removed content - we should have considered the underlying meaning of the different statuses more carefully. I appreciate your advice.
Eric
-
Hello Dr. Pete – thank you for the great info and advice!
I do have one follow-up question if that's ok – as we move forward cutting undesirable content and generate 4xx status for those pages, is there a difference in impact/effectiveness between a 403 and a 404? We use a CMS and un-publishing a page creates a 403 “Access denied” message. Deleting a page will generate a 404. I would love to hear your opinion about any practical differences from a Googlebot standpoint… does a 404 carry more weight when it comes to content removal, or are they the same to Googlebot? If there’s a difference and the 404 is better, we’ll go the 404 route moving forward.
Thanks again for all your help,
Eric
-
Let me jump in and clarify one small detail. If you delete a page, which would naturally result in a 404, but then 301-redirect that page/URL, there is no 404. I understand the confusion, but ultimately you can only have one HTTP status code. So, if the page properly 301s, it will never return a 404, even if it's technically deleted.
If the page 301s to a page that looks like a "not found" sort of page (content-wise), Google could consider that a "soft 404". Typically, though, once the 301 is in place, the 404 is moot.
For any change in status, the removal of crawl paths could slow Google re-processing those pages. Even if you delete a page, Google has to re-crawl it to see the 404. Now, if it's a high-authority page or has inbound (external) links, it could get re-crawled even if you cut the internal links. If it's a deep, low-value page, though, it may take Google a long time to get back and see those new signals. So, sometimes we recommend keeping the paths open.
There are other ways to kick Google to re-crawl, such as having an XML sitemap open with those pages in them (but removing the internal links). These signals aren't as powerful, but they can help the process along.
As to your specific questions:
(1) It's very tricky, in practice, especially at large-scale. I think step 1 is to dig into your index/cache (slice and dice with the site: operator) and see if Google has removed these pages. There are cases where massive 301s, etc. can look fishy to Google, but usually, once a page is gone, it's gone. If Google has redirected/removed these pages, and you're still penalized, then you may be fixing the wrong problem or possibly haven't gone far enough.
(2) It really depends on the issue. If you cut too deep and somehow cut off crawl paths or stranded inbound links, then you may need to re-establish some links/pages. If you 301'ed a lot of low-value content (and possibly bad links), you may actually need to cut some of those 301s and let those pages die off. I agree with @mememax that sometimes a helathy combination of 301s/404s is a better bet - pages go away, and 404s are normal if there's really no good alternative to the page that's gone.
-
Hi Eric, in my experience I've always found 4** better than 301 to solve this kind of issues.
Many people uses this response too much just because they want to show google that their site don't have any 404.
Just think about it a little, a 301 is a permanent redirect, a content which has just moved from one place to another. If you got a content you want to get rid of, do you want to give google the message "hey that low quality content is not where you found it but now it's here", no. You wan't to give google the message that the low quality content has been improved or removed. And a 404 is the right message to give him if you deleted that content.
It's prefectly normal to have 404s in a website, many 404 won't hurt your rankings, only if those pages were ranking already so users will receive a 404 instead and if some external sites were linking there in that case you may consider a 301.
While I think that google has a sort of a black list (and a white list too) I don't think that it has a memory of bad sites he encounters, if you fix your issues you'll start to rank again.
The issue you may have is not that you're site may be tainted but that maybe you still have some issues here and there which you didn't fix. As it seems Googlers said that Panda is now part of the algo so if you fix your issues you won't need any upgrade to start re ranking.
Hope this may have helped!! G luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
SEO implications of using Marketing Automation landing pages vs on-site content
Hi there, I'm hoping someone can help here... I'm new to a company where due to the limitations of their Wordpress instance they've been creating what would ordinarily be considered pages in the standard sitemap as landing pages in their Pardot marketing automation platform. The URL subdomain is slightly different. Just wondering if anybody could quickly outline the SEO implications of doing this externally instead of directly on their site? Hope I'm making some sense... Thanks,
Intermediate & Advanced SEO | | philremington
Phil1 -
Identifying Duplicate Content
Hi looking for tools (beside Copyscape or Grammarly) which can scan a list of URLs (e.g. 100 pages) and find duplicate content quite quickly. Specifically, small batches of duplicate content, see attached image as an example. Does anyone have any suggestions? Cheers. 5v591k.jpg
Intermediate & Advanced SEO | | jayoliverwright0 -
HTTP to HTTPS Question
Hello, I have a question regarding SSL Certificates I think I know the answer to but wanted to make sure. One of our clients’ site uses http for their pages but when they started creating Registration forms they created a full duplicate site on https (so now there are two versions of all of the pages). I know due to duplicate concerns this could be an issue and needs to resolved (as well as the pros and cons of both) but if they are already set up with https does it make sense to just move everything there or in some instances would it pay to keep some pages http (using canonical tags, redirects, htccess…etc)? – Most of the information I found related to making the decision prior to having both or describing the process but I couldn’t find anything that specifically related to if both are already present. I thought that the best approach because everything’s already set up is to just move everything over to the more secure one but was curious if anybody had any insight? Thank you in advance.
Intermediate & Advanced SEO | | Ben-R0 -
301 issues
Hi, I have this site: www.berenjifamilylaw.com. We did a 301 from the old site: www.bestfamilylawattorney.com to the one above. It's been several weeks now and Google has indexed the new site, but still pulls the old one on search terms like: Los Angeles divorce lawyer. I'm curious, does anyone have experience with this? How long does it take for Google to remove the old site and start serving the new one as a search result? Any ideas or tips would be appreciated. Thanks.
Intermediate & Advanced SEO | | mrodriguez14400 -
Advanced SEO question.
Hi, I manage and do the SEO for this site: www.aerlawgroup.com. If you Google "Los Angeles Criminal Defense Attorney", you can see I rank well (1st page). I have managed to achieve similar rankings for interior pages within the site: www.aerlawgroup.com/domestic-violence.html (Google: "Los Angeles Domestic Violence Attorney".) Here is my question. No matter how hard I try, I cannot get to the first page on Google for the search term: "Los Angeles DUI Lawyer", for the following interior page: www.aerlawgroup.com/dui.html. Is there anyway that I can pass the authority/ranking (not sure what to call it) that I have for www.aerlawgroup.com to www.aerlawgroup.com/dui.html so that internal page ranks higher for "Los Angeles DUI Lawyer"? I apologize if my question doesn't make sense. In a nutshell, I'm trying to understand if there is anyway to use the ranking I have for www.aerlawgroup.com to help me rank higher for Los Angeles DUI lawyer for the dui interior page. If not, are there any other suggestions anyone has to achieve a higher ranking? Thanks!
Intermediate & Advanced SEO | | mrodriguez14400 -
How would you structure this content?
We have a site where we write about our son who was born with Down syndrome. I had a question regarding some content I'm trying to create and structure and hoping you guys can point me in the right direction. One of the things we are often asked by new parents is what toys we suggest for people to buy for their child with Down syndrome, or as gifts for a friend who has a child with Down syndrome. So I'd like to write some posts that suggest great toys for each year of a kids life (and continue that as Noah grows.) However, there are some variations of key words that I would like to rank for as well and it gets a little messy, which is where I need the help. For example for each year I could have a post titled: Top Ten (I could also change out top ten for Best, etc..)Toys For A One Year Old with Down Syndr Top Ten Christmas Gift Ideas For A One Year Old With Down Syndrome Top Ten Birthday Gift Ideas For a One Year Old With D.S. Top Ten Learning Toys For A One Year Old With D.S. Top Ten Toys Under 25 Dollars For A One Year Old with DS Top Ten Developmental Toys for a One Year Old With DS Top Ten Fisher Price Toys for a child with ds Best Light Up Toys For a one year old with ds best muscial toys for a one year old with ds I could also think of other variations as well. Also I can make each of these with the various ages. 2 year old, 3 year old, etc... So I'm not sure what the best way to go is. I could easily have a ton of content that is all virtually the same (birthday gifts / christmas gifts..although I could suggest different toys) so I'd have a ton of different toys pages trying to rank for one term each that is good for google searchers but probably not so great for folks coming to my site as I would have toy pages scattered all over the site. I also don't know how landing pages would fit in to all of this. Would I want a "Down Syndrome Toy Guide" landing page, or "Down Syndrome Gift Guide" ... or both...or something else, and then link all of those other pages on that page? I have a few pages on my site now that I wrote before I started to think about all the different combinations I wanted to rank for: http://noahsdad.com/gift-ideas-down-syndrome/ and http://noahsdad.com/best-fisher-price-learning-toys/ I'm open to any feedback you guys may have on this. I'd also like to do some posts on "Down Syndrome Books" and hope to use the same info that you guys give me and apply to books. (Therapy books, touch and feel books, resource books, new parents books, etc..) Hoping some folks chime in as your help would really be appreciated.
Intermediate & Advanced SEO | | NoahsDad0 -
How to Remove Joomla Canonical and Duplicate Page Content
I've attempted to follow advice from the Q&A section. Currently on the site www.cherrycreekspine.com, I've edited the .htaccess file to help with 301s - all pages redirect to www.cherrycreekspine.com. Secondly, I'd added the canonical statement in the header of the web pages. I have cut the Duplicate Page Content in half ... now I have a remaining 40 pages to fix up. This is my practice site to try and understand what SEOmoz can do for me. I've looked at some of your videos on Youtube ... I feel like I'm scrambling around to the Q&A and the internet to understand this product. I'm reading the beginners guide.... any other resources would be helpful.
Intermediate & Advanced SEO | | deskstudio0 -
Removing Duplicate Content Issues in an Ecommerce Store
Hi All OK i have an ecommerce store and there is a load of duplicate content which is pretty much the norm with ecommerce store setups e.g. this is my problem http://www.mystoreexample.com/product1.html
Intermediate & Advanced SEO | | ChriSEOcouk
http://www.mystoreexample.com/brandname/product1.html
http://www.mystoreexample.com/appliancetype/product1.html
http://www.mystoreexample.com/brandname/appliancetype/product1.html
http://www.mystoreexample.com/appliancetype/brandname/product1.html so all the above lead to the same product
I also want to keep the breadcrumb path to the product Here's my plan Add a canonical URL to the product page
e.g. http://www.mystoreexample.com/product1.html
This way i have a short product URL Noindex all duplicate pages but do follow the internal links so the pages are spidered What are the other options available and recommended? Does that make sense?
Is this what most people are doing to remove duplicate content pages? thanks 🙂0