Old pages STILL indexed...
-
Our new website has been live for around 3 months and the URL structure has completely changed. We weren't able to dynamically create 301 redirects for over 5,000 of our products because of how different the URLs were, so we've been redirecting them as and when.
3 months on and we're still getting hundreds of 404 errors daily in our Webmaster Tools account. I've checked the server logs and Bingbot still seems to want to crawl our old /product/ URLs. Also, if I perform a "site:example.co.uk/product" search on Google or Bing, lots of results are still returned, indicating that both still haven't dropped them from their index.
Should I ignore the 404 errors and continue to wait for them to drop off or should I just block /product/ in my robots.txt? After 3 months I'd have thought they'd have naturally dropped off by now!
I'm half-debating this:
User-agent: *
Disallow: /some-directory-for-all/*

User-agent: Bingbot
User-agent: MSNBot
Disallow: /product/

Sitemap: http://www.example.co.uk/sitemap.xml
-
Yeah. If you cannot do it dynamically, it gets to be a real pain, and also, depending on how you set up the 301s, you may end up with an overstuffed .htaccess file that could cause problems.
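Purely as an illustration of what that looks like (all paths and slugs below are made up): in .htaccess you usually end up with one rule per product, which is what bloats the file, whereas if you have access to the main server config, Apache's RewriteMap keeps the whole mapping in a single lookup file. Note that RewriteMap only works in the server or vhost config, not in .htaccess itself. A minimal sketch:

# .htaccess approach: one rule per product, unwieldy past a few thousand
Redirect 301 /product/old-blue-widget /widgets/blue-widget
Redirect 301 /product/old-red-widget /widgets/red-widget

# Server-config alternative: one rule plus a lookup file
# (RewriteMap is not allowed in .htaccess)
RewriteEngine On
RewriteMap productmap txt:/etc/apache2/product-redirects.txt
RewriteCond ${productmap:$1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/product/(.+)$ ${productmap:$1} [R=301,L]

where /etc/apache2/product-redirects.txt is just one "old-slug new-path" pair per line:

old-blue-widget /widgets/blue-widget
old-red-widget /widgets/red-widget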
If these pages were that young and did not have any link equity or rankings to start with, they are probably not worth 301ing.
One tool you may want to consider is URL Profiler (http://urlprofiler.com/). You could take all the old URLs and have URL Profiler pull in GA data (from when they were live on your site) and also pull in OSE data from Moz. You can then filter them and see which pages got traffic and links. Take those select "top pages", make sure they 301 to the correct page in the new URL structure, and go from there. URL Profiler has a free 15-day trial that you could use to get this project done at no charge, but after using the product you may find it handy enough to buy anyway.
Ideally, if you could have dynamically 301ed the old pages to the new ones, that would have been the simplest method, but in your situation I think you are OK. Google is just trying to make sure you did not "mess up" and 404 those old pages by accident; it wants to give you the benefit of the doubt. It is crazy sometimes how long they keep things in the index.
I am monitoring a site that scraped one of my sites. They shut the entire site down after we threatened legal action. The site has been down for weeks and showing 404s, but I can still do a site: search and see them in the index. Meh.
-
Forgot to add this - just some free advice. You have your CSS inlined in your HTML. Ideally, you want that in an external CSS file; that way, once the user has downloaded the external file it is cached, so it does not have to be downloaded again and the experience is faster on subsequent pages.
If you were testing your page with Google's PageSpeed tools and they flagged render-blocking CSS, and that is why you inlined your CSS, the solution is not to inline all your CSS, but to inline just what is above the fold and put the rest in an external file.
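Roughly, the pattern looks like this (just a sketch with made-up file names; the preload/onload trick is the common loadCSS-style approach for pulling in the non-critical file without blocking render):

<head>
  <!-- Critical above-the-fold rules inlined so first paint is not blocked -->
  <style>
    body { margin: 0; font-family: sans-serif; }
    .site-header { height: 60px; background: #fff; }
  </style>
  <!-- The rest in an external, cacheable stylesheet loaded asynchronously -->
  <link rel="preload" href="/css/main.css" as="style" onload="this.onload=null;this.rel='stylesheet'">
  <noscript><link rel="stylesheet" href="/css/main.css"></noscript>
</head>

Once /css/main.css is cached, subsequent page loads skip the download entirely.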
Hope that makes sense.
-
I suppose that's the problem. We've spent hours redirecting hundreds of 404 pages to new/relevant locations - but these pages don't receive organic traffic. It's mostly just Bingbot, MSNBot and Googlebot crawling them because they're still indexed.
I think I'm going to leave them as 404s rather than trying to keep on top of 301 redirecting them, and I'll leave it in Google's hands to eventually drop them off!
Thanks!
Liam
-
General rule of thumb: if a page 404s and it is supposed to 404, don't worry about it. The Search Console 404 report does not mean that you are being penalized, although it can be diagnostic. If you block the 404 pages in robots.txt, then yes, it will take the 404 errors out of the Search Console report, but then Google never "deals" with those 404s. It can take 3 months (maybe longer) to get things out of Search Console, and I have noticed it taking longer lately, but what you need to do first is ask the following questions:
-
Do I still link internally to any of these /product/ URLs? If you do, Google may assume that you are 404ing those pages by mistake and leave them in the report longer, since if you are still linking internally to them, they must be viable pages.
-
Do any of these old URLs have value? Do they have links to them from external sites? Did they used to rank for a keyword? If so, you should probably 301 them to a semantically relevant page rather than 404ing them, and get some use out of them.
If either of the above is true, Google may continue to remind you of the 404 because it thinks the page might be valuable and wants to "help" you out.
You mention 5,000 URLs that were indexed and then 404ed. You cannot assume that Search Console works in real time or that Google checks all 5,000 of these URLs at the same time. Google has a crawl budget for your site that governs how often it crawls a given page: some pages it crawls more often (the home page), some less often. It then has to process those crawls once it gets the data back. What you will see in a situation like this is that if you 404 several thousand pages, you will first see several hundred show up in your Search Console report, then the next day some more, then some more, and so on. Over time the total will build, may peak, and then gradually start to fall off. Google has to find the 404s, process them, and then show them in the report. You may see 500 of your 404 pages today, but 3 months later there may be 500 other 404 pages in the report and those original 500 are gone. This is why you might still be seeing 404 errors after 3 months, in addition to the examples I gave above.
It would be great if the process were faster and the data cleaner. The report has a checkbox for "this is fixed", which is great if you fixed something, but they need a checkbox for "this is supposed to 404" to help clear things out. If I have learned anything about Search Console, it is that it is helpful, but the data in many cases is not real time.
Good luck!
-
Related Questions
-
Changing Canonical Tags on Indexed Pages that are Ranking Well
Hi Guys, I recently rolled out a domain-wide canonical tag change. Previously the website had canonical tags without the www; however, the website was set up to redirect to www on page load. I noticed that the site's competitors were all using www, and as far as I understand www versus non-www, it's based on preference. In order to keep things consistent, I changed the canonical tag to include the www. Will the site drop in rankings, especially if the pages are starting to rank quite well? Any feedback is appreciated. Thanks!
Intermediate & Advanced SEO | QuickToImpress
-
How long should it take for indexed pages to update
Google has crawled and indexed my new site, but my old URLs appear in the search results. Is there a typical amount of time it takes for Google to update the URLs displayed in search results?
Intermediate & Advanced SEO | brianvest
-
If robots.txt has blocked an image (image URL) but another page which can be indexed uses this image, how is the image treated?
Hi Mozzers, This probably is a dumb question, but I have a case where robots.txt has an image URL blocked, and this image is used on a page (let's call it Page A) which can be indexed. If the image on Page A has an alt tag, how is this information digested by crawlers? A) Would Google totally ignore the image and the alt tag information? Or B) would Google consider the alt tag information? I am asking this because all the images on the website are blocked by robots.txt at the moment, but I would really like crawlers to crawl the alt tag information. Chances are that I will ask the webmaster to allow indexing of images too, but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | Malika1
-
Proper 301 in Place but Old Site Still Indexed In Google
So I have stumbled across an interesting issue with a new SEO client. They recently launched a new website and implemented a proper 301 redirect strategy at the page level for the new domain. What is interesting is that the new website is now indexed in Google, BUT the old website domain is also still indexed. I even checked the Google cache date, and it shows the new website with a cache date of today. The redirect strategy has been in place for about 30 days. Any thoughts or suggestions on how to get the old domain de-indexed in Google and get all authority passed to the new website?
Intermediate & Advanced SEO | kchandler
-
Category Pages up - Product Pages down... what would help?
Hi, I mentioned yesterday how one of our sites was losing rank on product pages. What steps do you take to improve the SERPs of product pages? In this case, home/category/product is the tree. There isn't really any internal linking, except one link from the category page to each product. Would setting up a host of internal links, perhaps "similar products" linking them together, be a place to start? How can I improve my ranking of these more deeply internal pages, beyond just internal links?
Intermediate & Advanced SEO | xoffie
-
Ranking with other pages not indexed
The site ranks on pages 4-5 with other pages like the privacy, about us, and terms pages. I have encountered this problem a lot in the last few weeks; it usually occurs after the page sits 1-2 months on page 1 for its terms. I'm thinking too much use of the same anchor is a primary issue. The sites in question are 1-5 page microniche sites. Any suggestions are appreciated. Thank You
Intermediate & Advanced SEO | m3fan
-
404'd pages still in index
I recently launched a site and shortly after performed a URL rewrite (not the greatest idea, I know). The developer 404'd the old pages instead of permanently 301 redirecting them. This caused a mess in the index. I have tried to use Google's removal tool to remove these URLs from the index. The pages were being removed, but now I am finding them in the index as bare URLs pointing to the 404 page (i.e. no title tag or meta description). Should I wait this out, or go back and 301 redirect the old URLs (that are 404'd now) to the new URLs? I am sure this is the reason for my lack of rankings, as the rest of my site is pretty well optimized and I have some quality links.
Intermediate & Advanced SEO | mj775
-
Best way to stop pages being indexed while keeping PageRank
If, for example, on a discussion forum, what would be the best way to stop pages such as the posting page (where a user posts a topic or message) from being indexed AND avoid diluting PageRank? If we added them to the Disallow rules in robots.txt, would PageRank still flow through the links to those blocked pages, or would it stay concentrated on the linking page? Your ideas and suggestions will be greatly appreciated.
Intermediate & Advanced SEO | Peter264