Mass 404 Checker?
-
Hi all,
I'm currently looking after a collection of old newspaper sites that have had various developments during their time. The problem is there are so many 404 pages all over the place and the sites are bleeding link juice everywhere so I'm looking for a tool where I can check a lot of URLs at once.
For example from an OSE report I have done a random sampling of the target URLs and some of them 404 (eek!) but there are too many to check manually to know which ones are still live and which ones have 404'd or are redirecting. Is there a tool anyone uses for this or a way one of the SEOMoz tools can do this?
Also I've asked a few people personally how to check this and they've suggested Xenu, Xenu won't work as it only checks current site navigation.
Thanks in advance!
-
Hi,
we are seo agency at turkey, our name clicksus. We can deadlinkchecker.com and it is very easy & good.
-
Glad I was able to help!
It would be great if you could mark the answers you found helpful, and mark the question as answered if you feel you got the information you needed. That will make it even more useful for other users.
Paul
-
Wow nice one mate did not know that in the Top Pages tab that is perfect! I'll remember to click around more often now.
I found this tool on my adventures which was exactly what I was after: http://www.tomanthony.co.uk/tools/bulk-http-header-compare/
Also cheers for your walkthrough, having problems with the site still bleeding 404 pages, first thing first however is fixing these pages getting high quality links to them
Cheers again!
-
Sorry, one additional - since you mentioned using Open Site Explorer...
Go to the Top Pages tab in OSE and filter the results to include only incoming links. One of the columns in that report is HTTP Status. It will tell you if the linked page's status is 404. Again, just download the full CSV, sort the resulting spreadsheet by the Status column and you'll be able to generate a list of URLs that no longer have pages associated with them to start fixing.
Paul
-
Ollie, if I'm understanding your question correctly, the easiest place for you to start is with Google Webmaster Tools. You're looking to discover URLs of pages that used to exist on the sites, but no longer do, yes?
If you click on the Health link in left sidebar, then click Crawl Errors, you get a page showing different kinds of errors the Google crawler has detected. Click on the Not Found error box and you'll get a complete list of all the pages Google is aware of that can no longer be found on your site (i.e. 404s).
You can then download the whole list as a CSV and start cleaning them up from there.
This list will basically include pages that have been linked to at one time or another from other sites on the web, so while not exhaustive, it will show the ones that are most likely to still be getting traffic. For really high-value incoming links, you might even want to contact the linking site and see if you can get them to relink to the correct new page.
Alternatively, if you can access the sites' server logs, they will record all the incoming 404s with their associated URLs as well and you can get a dump from the log files to begin creating your work list. I just find it's usually easier to get access to Webmaster Tools than to get at a clients server log files.
Is that what you're looking for?
Paul
-
To be honest, I don't know anyone who has bad things to say about Screaming Frog - aside from the cost, but as you said, really worth it.
However, it is free for up to 500 page crawl limit, so perhaps give it a go?
Andy
-
Cheers Andy & Kyle
Problem with this tool as it works similar to Xenu which is great for making sure your current navigation isn't causing problems.
My problem is there are over 15k links pointing to all sorts of articles and I have no idea what's live and what's not. Running the site through that tool won't report the pages that aren't linked in the navigation anymore but are still being linked to.
Example is manually checking some of the links I've found that the site has quite a few links from the BBC going to 404 pages. Running the site through Xenu or Screamy Frog doesn't find these pages.
Ideally I'm after a tool I can slap in a load of URLs and it'll do a simple HTTP header check on them. Only tools I can find do 1 or 10 at a time which would take quite a while trying to do 15k!
-
Agree with Screaming Frog. It's more comprehensive than **Xenu's Link Sleuth. **
It costs £99 for a year but totally worth it.
I had a few issues with Xenu taking too long to compile a report or simply crashing.
-
Xenu Liunk Seuth - its free and will go through internal links, external or both, it will also show you where the 404 page is being linked from.
Also can report 302s.
-
Screaming Frog Spider does a pretty good job...
As simple as enter the URL and leave it to report back when completed.
Andy
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What do you do with product pages that are no longer used ? Delete/redirect to category/404 etc
We have a store with thousands of active items and thousands of sold items. Each product is unique so only one of each. All products are pinned and pushed online ... and then they sell and we have a product page for a sold item. All products are keyword researched and often can rank well for longtail keywords Would you :- 1. delete the page and let it 404 (we will get thousands) 2. See if the page has a decent PA, incoming links and traffic and if so redirect to a RELEVANT category page ? ~(again there will be thousands) 3. Re use the page for another product - for example a sold ruby ring gets replaces with ta new ruby ring and we use that same page /url for the new item. Gemma
Technical SEO | | acsilver0 -
404 Errors for Form Generated Pages - No index, no follow or 301 redirect
Hi there I wonder if someone can help me out and provide the best solution for a problem with form generated pages. I have blocked the search results pages from being indexed by using the 'no index' tag, and I wondered if I should take this approach for the following pages. I have seen a huge increase in 404 errors since the new site structure and forms being filled in. This is because every time a form is filled in, this generates a new page, which only Google Search Console is reporting as a 404. Whilst some 404's can be explained and resolved, I wondered what is best to prevent Google from crawling these pages, like this: mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1008&homePage=Y Implement 301 redirect using rules, which will mean that all these pages will redirect to the homepage. Whilst in theory this will protect any linked to pages, it does not resolve this issue of why GSC is recording as 404's in the first place. Also could come across to Google as 100,000+ redirected links, which might look spammy. Place No index tag on these pages too, so they will not get picked up, in the same way the search result pages are not being indexed. Block in robots - this will prevent any 'result' pages being crawled, which will improve the crawl time currently being taken up. However, I'm not entirely sure if the block will be possible? I would need to block anything after the domain/webapp/wcs/stores/servlet/TopCategoriesDisplay?. Hopefully this is possible? The no index tag will take time to set up, as needs to be scheduled in with development team, but the robots.txt will be an quicker fix as this can be done in GSC. I really appreciate any feedback on this one. Many thanks
Technical SEO | | Ric_McHale0 -
Panda Cleanup - Removing Old Blog Posts, Let Them 404 or 301 to Main Blog Page?
tl;dr... Removing old blog posts that may be affected by Panda, should we let them 404 or 301 to the Blog? We have been managing a corporate blog since 2011. The content is OK but we've recently hired a new blogger who is doing an outstanding job, creating content that is very useful to site visitors and is just on a higher level than what we've had previously. The old posts mostly have no comments and don't get much user engagement. I know Google recommends creating great new content rather than removing old content due to Panda concerns but I'm confident we're doing the former and I still want to purge the old stuff that's not doing anyone any good. So let's just pretend we're being dinged by Panda for having a large amount of content that doesn't get much user engagement (not sure if that's actually the case, rankings remain good though we have been passed on a couple key rankings recently). I've gone through Analytics and noted any blog posts that have generated at least 1 lead or had at least 20 unique visits all time. I think that's a pretty low barrier and everything else really can be safely removed. So for the remaining posts (I'm guessing there are hundreds of them but haven't compiled the specific list yet), should we just let them 404 or do we 301 redirect them to the main blog page? The underlying question is, if our primary purpose is cleaning things up for Panda specifically, does placing a 301 make sense or would Google see those "low quality" pages being redirected to a new place and pass on some of that "low quality" signal to the new page? Is it better for that content just to go away completely (404)?
Technical SEO | | eBoost-Consulting0 -
Soft 404 errors
Google webmaster tools is telling me I have 8 "soft 404's". They are all like this page...
Technical SEO | | sdwellers
http://www.seadwellers.com/search/page/8/ All 8 pages are the same except the number at the end...... I just can't figure this....any insight at all is appreciated and do i need to correct somehow?0 -
Why is Google Webmaster Tools showing 404 Page Not Found Errors for web pages that don't have anything to do with my site?
I am currently working on a small site with approx 50 web pages. In the crawl error section in WMT Google has highlighted over 10,000 page not found errors for pages that have nothing to do with my site. Anyone come across this before?
Technical SEO | | Pete40 -
301 on ALL 404 for Pagerank Recovery, bad idea?
Its a bad idea to do a 301 to the home page of the site on all 404 pages as a way of pagerank recovery? If so, why? Thanks,
Technical SEO | | Bligoo0 -
Is this 404 page indexed?
I have a URL that when searched for shows up in the Google index as the first result but does not have any title or description attached to it. When you click on the link it goes to a 404 page. Is it simply that Google is removing it from the index and is in some sort of transitional phase or could there be another reason.
Technical SEO | | bfinternet0