404 page not found after site migration
-
Hi,
A question from our developer.
We have an issue in Google Webmaster Tools.
A few months ago we killed off one of our e-commerce sites and set up another to replace it. The new site uses different software on a different domain. I set up a mass 301 redirect that would redirect any URLs to the new domain, so domain-one.com/product would redirect to domain-two.com/product. As it turns out, the new site doesn’t use the same URLs for products as the old one did, so I deleted the mass 301 redirect.
We’re getting a lot of URLs showing up as 404 not found in Webmaster Tools. These URLs used to exist on the old site and were linked from the old sitemap. Even URLs that have shown up as 404s recently are reported as being linked from the old sitemap, yet the old sitemap no longer exists and has itself been returning a 404 error for some time now. Normally I would set up a 301 redirect for each one and mark it as fixed, but almost a quarter of a million URLs are returning 404 errors, and the number is rising.
I’m sure there are some genuine problems that need sorting out in that list, but I just can’t see them under the mass of errors for pages that have been redirected from the old site. Because of this, I’m reluctant to set up a robots file that disallows all of the 404 URLs.
The old site is no longer in the index. Searching google for site:domain-one.com returns no results.
Ideally, I’d like anything that was linked from the old sitemap to be removed from webmaster tools and for Google to stop attempting to crawl those pages.
Thanks in advance.
-
I agree that the 301 redirect would be your best option, as you can pass along not only users but also the bots to the right page. You may need to get a developer in to write some regular expressions that parse the incoming request and automatically find the correct new URL. I have worked on sites with a large number of pages, and some sort of automation is the only way to go.
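A rough sketch of what that automation could look like in Python. The URL patterns and new-site paths below are made up for illustration; the real rules depend entirely on how the new platform structures its product URLs:

```python
import re

# Hypothetical rewrite rules: each pair maps a pattern on the old site
# to a URL template on the new site. These patterns are illustrative,
# not the poster's actual site structure.
REDIRECT_RULES = [
    (re.compile(r"^/product/(?P<slug>[\w-]+)$"),
     "https://domain-two.com/shop/{slug}"),
    (re.compile(r"^/category/(?P<name>[\w-]+)$"),
     "https://domain-two.com/collections/{name}"),
]

def resolve_redirect(old_path):
    """Return the new-site URL to 301 to for an old path, or None if
    no rule matches (in which case serve a 404 or 410 instead)."""
    for pattern, template in REDIRECT_RULES:
        match = pattern.match(old_path)
        if match:
            return template.format(**match.groupdict())
    return None
```

The same idea can usually be expressed directly as webserver rewrite rules; the point is that a handful of patterns can cover hundreds of thousands of URLs without listing them one by one.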
That said, if you simply want to kill the old URLs, you can serve 404s or 410s. As you mention, you then end up with a bunch of 404 errors in GWT. I have been there too; it's damned if you do, damned if you don't. We had some tracking URLs from an old site, and now, a year later (we've been serving 410s on those old tracking URLs for over a year), they still show up in GWT as errors.
We are trying a new solution for removing these URLs from the index without generating 404 errors: we serve a 200 and put up a minimal HTML page with the meta robots noindex tag.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. "
So we allow Google to find the page and get a 200 (so no 404 errors), but then use the meta noindex tag to tell Google to remove it from the index and stop crawling the page.
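A minimal sketch of that approach in Python (the handler name and page text are illustrative, not our actual code; any server that returns this markup with a 200 status does the same job):

```python
# A retired-URL page: HTTP 200 in the headers, meta robots noindex in
# the markup, so Google drops the page without logging a 404 error.
RETIRED_PAGE = """<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex">
  <title>Page retired</title>
</head>
<body>
  <p>This page is no longer available.</p>
</body>
</html>
"""

def serve_retired(path):
    """Return (status, headers, body) for a retired URL."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return 200, headers, RETIRED_PAGE
```

The key detail is that the noindex tag only works if the URL is crawlable, so these pages must not also be blocked in robots.txt.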
Remember, this is the "nuclear" option. You only want to do this to remove the pages from the Google index. Someone mentioned using GWT to remove URLs, but if I remember correctly, you can only remove so many pages at a time.
If you list the files in robots.txt, Google will not spider them, but if you later remove them from the robots.txt file, it will start trying to spider them again. I have seen Google come back a year later on URLs after I took them out of robots.txt. That's what happened to us, so we tried just serving the 410/404, but Google still kept crawling. We recently moved to the 200 + meta noindex option, and it seems to be working.
Good luck!
-
You can, but the 404s should stop being crawled on their own. There's also a webmaster tool you can use to make that happen faster:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=64033
-
Yeah, it's a 404: http://www.tester.co.uk/17th-edition-equipment/multifunction-testers/fluke-1651b-multifunction-installation-tester
With over 200,000 404s, it's a lot to go through and 301. For some reason, when the site was migrated, they just pointed each old URL at a new one by swapping the root domain name, without creating matching URLs. Doh.
I was thinking about blocking them all in robots.txt?
-
A 404 should cause Google to de-index the content. Go to one of the bad URLs and view the headers to make sure that your webserver is returning a status 404 and not just a 404 "page".
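A quick way to run that check, sketched in Python with only the standard library (the helper names are illustrative):

```python
import urllib.request
import urllib.error

def fetch_status(url):
    """Return the HTTP status code for a URL. A HEAD request is
    enough, since only the headers matter for this check."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we want.
        return err.code

def is_hard_404(status):
    """True only for a genuine 404/410 in the response headers.
    A 200 whose body merely says "not found" is a soft 404 that
    Google will keep crawling and reporting."""
    return status in (404, 410)
```

Running `fetch_status` against a handful of the bad URLs will quickly show whether the server is sending a real 404 status or a soft-404 page with a 200.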
As hard and time consuming as it might be, I would still pursue a 301 option. It's the cleanest way to resolve the issue. Just start nibbling at it and you can make a dent. Doing nothing just lets the problem grow.