Are 17000+ Not Found (404) Pages OK?
-
Very soon, our website will undergo a rapid change that will result in us removing 95% or more of our old pages (right now, our site has around 18000 pages indexed).
It's changing into something different (from B2C to B2B), so our site design, content, etc. will change.
Even our blog section will have more than 90% of its content removed.
What would the ideal scenario be?
- Remove all pages and let those links be 404 pages
- Remove all pages and 301 redirect them to the home page
- Remove all unwanted pages and 301 redirect them to a separate page explaining the change (although it wouldn't be that relevant, since our audience has completely changed). I doubt this would be ideal, since at some point we'd need to remove this page as well and do another redirection.
-
Mohit,
Tom's advice will help you determine which pages are worth redirecting and which should just go to a 404 page (which should be customized instead of the browser/host default, and should also return a 404 response code in the HTTP header!). My guess is that pages with links only from scraper sites aren't going to pass the tests laid out by Tom and thus would just go to a 404 page. However, any that have decent external links would fit the criteria and would be candidates for a 301 redirect.
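To illustrate the point about response codes, here is a minimal Python sketch of a custom 404 page that still sends a real 404 status rather than a 200. It is a bare WSGI app written for this example only; the `KNOWN_PATHS` set and the page text are made-up placeholders, and your own server or CMS will have its own way of configuring this.

```python
# Minimal WSGI sketch: a custom "not found" page that still returns HTTP 404.
# KNOWN_PATHS and the HTML text are hypothetical placeholders.

KNOWN_PATHS = {"/", "/contact"}

NOT_FOUND_HTML = (
    "<html><body><h1>Page not found</h1>"
    "<p>This page has been removed. Try the <a href='/'>home page</a>.</p>"
    "</body></html>"
)


def app(environ, start_response):
    """Serve known paths with 200 and everything else with a custom 404."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PATHS:
        body = b"<html><body>OK</body></html>"
        status = "200 OK"
    else:
        # Friendly page for the visitor, correct status code for crawlers.
        body = NOT_FOUND_HTML.encode("utf-8")
        status = "404 Not Found"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

To try it locally, you could serve `app` with `wsgiref.simple_server.make_server` from the standard library and request a removed URL; the status line should read 404 even though the body is the customized page.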
-
Just to add a little to this great reply...
Here is how I would determine if it was worth my time to keep some of the old pages.
If the industry is the same but the end user is different, I would make EVERY attempt to keep those old pages. AuthorRank will matter in the future; if you can contribute that information to a particular rel=publisher, then I think it will be totally worth the time.
If, however, the information has nothing to do with the industry, then I wouldn't even consider taking the time to figure all of this out. I would have a kick ass 404 page to help people find your new stuff though.
Remember too that when you 301 redirect, you do in fact lose some "link juice" (I really hate that phrase). So if the incoming links are of little to no value, then a 301 will provide even less.
-
Hi Tom.. Thank you for your advice.
The thing is, we don't want to retain the users. They are not going to serve our cause anymore. (We used to spend thousands of dollars every month on server costs just to keep up with the load. Now we are cutting that down, so unwanted users are not something we want, as they would increase the load.)
I'll surely follow your advice on OSE. The thing is, we have a lot of links to the pages from scraper sites. I am not sure if they're worth keeping, though.
-
Hi there
17,000 is quite a lot. I would look at maybe redirecting some of the URLs and I would do this based on certain criteria.
First of all, it helps to have a complete list of your current URLs. Screaming Frog is a great tool for this and is free.
Once you have your URLs, go into your analytics data and see which pages are attracting users. Take a sample size of about 2-3 months. If you're using Google Analytics, click on Traffic Sources -> Sources -> All Traffic on the left-hand side.
When the dashboard loads, next to "Primary Dimension" click "Other", and from the drop-down menu click Traffic Sources, then Landing Page.
Any page with more than 5 or 10 visitors could be one worth redirecting. If these are pages that visitors might frequently use to get to your site, ensuring they are redirected might help to not interrupt their user journey. A 404 might put them off and send them elsewhere.
Next, I'd look at what pages you might want to save to keep your SEO "strength". Put your URL into OpenSiteExplorer and then, once done, click on "top pages". We're interested in the "Inbound Links" column here. Export the file as a CSV, then sort the URL list in Excel by the Inbound Link total. You can filter out the pages with fewer links here, so for instance you could remove the pages with 3 inbound links or fewer. It's a general way of doing things and isn't foolproof, but you will be left with a list of pages that could be getting decent PageRank/link equity. Manually check those pages and their backlinks and, if you think they're acceptable, make sure you put in a 301 redirect.
Anything that doesn't match either of these criteria I would leave for a 404. You may be left with a lot, but Google knows that 404s are par for the course and won't penalise you for them. Check out this webmasters blog link.
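The two criteria above can be sketched as a small Python script. This is only an illustration: the thresholds (more than 10 visits, more than 3 inbound links) follow the rough numbers in this post, and the dictionaries are invented sample data standing in for whatever your analytics and OpenSiteExplorer exports actually contain.

```python
# Sketch of the triage: redirect pages with traffic or links, 404 the rest.
# The thresholds and the sample data below are assumptions for illustration.

VISIT_THRESHOLD = 10  # landing-page visits over the 2-3 month sample
LINK_THRESHOLD = 3    # pages with this many inbound links or fewer are dropped


def triage(urls, visits, inbound_links):
    """Split URLs into 301 candidates and pages to leave as 404."""
    redirect, not_found = [], []
    for url in urls:
        has_traffic = visits.get(url, 0) > VISIT_THRESHOLD
        has_links = inbound_links.get(url, 0) > LINK_THRESHOLD
        if has_traffic or has_links:
            redirect.append(url)
        else:
            not_found.append(url)
    return redirect, not_found


if __name__ == "__main__":
    # Hypothetical URLs and counts, not real data.
    urls = ["/old-guide", "/old-offer", "/old-news"]
    visits = {"/old-guide": 42, "/old-offer": 2}
    inbound_links = {"/old-offer": 7, "/old-news": 1}
    to_redirect, to_404 = triage(urls, visits, inbound_links)
    print("301 candidates:", to_redirect)
    print("Leave as 404:", to_404)
```

Pages that qualify only through the link count are still just candidates; their backlinks need the manual check described above before a 301 goes in.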
Hope this helps with your decision making!