How to identify 404 that get links from external sites (but not search engines)?

lcourse

one of our site had a poor site architecture causing now about 10.000s of 404 being currently reported in google webmaster tools.

Any idea about easily detecting among these thousands of 404, which ones are coming from links from external websites (so filtering out 404 caused by links from our own domain and 404 from search engines)?
crawl bandwidth seems to be an issue on this domain. Anything that can be done to accelerate google removing these 404 pages from their index? Due to number of 404 manual submission in google wbt one by one is not an option.
Or do you believe that google automatically will stop crawling these 404 pages within a month or so and no action needs to be taken?

thanks

lcourse

Hi Robert,

thanks a lot. So I will not take action to get 404s out of google index.

Regarding your first point, I am not sure I understand how screaming frog would help. I did not use screaming frog yet but link sleuth for status code checks. The status check of 404 in google webmaster tools will probably generally also give 404 status in screaming frog. My objective is to identify among these thousands of 404, the few which are caused by inaccurate or outdated links on external websites so that I can create a 301 for these.

Best,

Daniel

RobertFisher

icourse

I would suggest downloading the free version of screaming frog for an easy way to get status codes on any or all links.

As to fixing and "crawl bandwidth" being a problem, I disagree. If you are not being crawled it is because of all the 404's. I do not know the timeline for inaction on this, but I do believe "manual submission is not an option" is a recipe for disaster. Because fully analyzing your issues is outside the scope of Q&A, I would suggest you start manually fixing the issues and if on a CMS, start looking at plugins, etc. as a root cause.

Hope that helps

Robert

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

How to identify 404 that get links from external sites (but not search engines)?

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

What IP Address does Googlebot use to read your site when coming from an external backlink?

Do I eventually 301 a page on our site that "expires," to a page that's related, but never expires, just to utilize the inbound link juice?

Organic search data not representative of site Authority, need advice

What's the best way to check Google search results for all pages NOT linking to a domain?

When crawls occur - when will my links show up in Open Site Explorer

Block search engines from URLs created by internal search engine?

Best way to find broken links on a large site?

Steps you can take to ensure your content is indexed and registered to your site before a scraper gets to it?