How to find all 404 deadlinks - webmaster only allows 1000 to be downloaded...
-
Hi Guys
I have a question...I am currently working on a website that was hit by a spam attack.
The website was hacked and 1000's of adult censored pages were created on the wordpress site.
The hosting company cleared all of the dubious files - but this has left 1000's of dead 404 pages.
We want to fix the dead pages but Google webmaster only shows and allows you to download 1000.
There are a lot more than 1000....does any know of any Good tools that allows you to identify all 404 pages?
Thanks, Duncan
-
The Moz crawl report will also show 404s. I sometimes find that different spiders may find different things. Between the Search Console report, Screaming Frog (great investment) and Moz, you should have a nice collection of things to fix.
-
I must second Dirk's suggestion of screaming frog, great tool and I use it daily, a license is well worth the cost. Although spider crawl of the site will only point out 404's that have are links from an existing page, so if the hosting company cleaned up the not all of these 404's will surface.
One approach I would suggest is run the current 1000 404's in GWT through Screaming frog as a manually added list, (do it in 2 batches if you have the free version), start a spreadsheet of the resulting 404's and start working through that list. Once you have the 404's mark those as fixed as GWT tools set a reminder to check back in a few days and after a few days export the new list of 1000 404's and run these through screaming frog adding the resulting list to your spreadsheet. Keep doing this until you get the 404's errors in GWT down a manageable level.
I hope that helps, good luck.
-
Probably the easiest solution is to buy a licence from Screaming Frog & to crawl your site locally. The tool can do a lot of useful stuff to audit sites and will show you not only the full list of 4xx errors but also the pages that link to them.
There is also a free version but that allows you to crawl only 500 pages - which in your case is probably not sufficient but it would allow you to see how it works.
Hope this helps,
Dirk
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
301 / 404 & Getting Rid of Keyword Pages
I had a feeling that my keyword focused pages were causing my site not to rank well. I do not have that many keywords. I have 2 main keyword phrases along with 6 city locations. For example (fake) "tea house tampa" "tea house clearwater" "tea house sarasota" and "tea room tampa" "tea room cleawater" "tea house sarasota". So, I don't feel that I need that many pages. I feel like I can optimize my home page and maybe 1 or 2 topic pages. Right now, I have a keyword for each of those phrases. These are all internal pages on 1 domain. Not multiple domains. Sooo... I tested it by 301ing a few of my "tea house" KW pages to the home page. And low and behold... my home page rose BIG TIME! Major improvement! I'm talking like 13th to 2nd! Here is my question... how should I proceed? My SEO has warned me against 301ing too many pages all pointing to the home page. He says that will negatively impact my ratings. Should I 404 some pages? Should I build a "tea room" topic page and 301 that set there? What is worse? 301 or 404? How many is too many? I'm really excited by these results, but I'm scare to move forward and hurt what has happened. Thanks in advance!
Intermediate & Advanced SEO | | CalicoKitty20000 -
Mystery 404's
I have a large number of 404's that all have a similar structure: www.kempruge.com/example/kemprugelaw. kemprugelaw keeps getting stuck on the end of url's. While I created www.kempruge.com/example/ I never created the www.kempruge.com/example/kemprugelaw page or edited permalinks to have kemprugelaw at the end of the url. Any idea how this happens? And what I can do to make it stop? Thanks, Ruben
Intermediate & Advanced SEO | | KempRugeLawGroup0 -
Ecommerce SEO - Indexed product pages are returning 404's due to product database removal. HELP!
Hi all, I recently took over an e-commerce start-up project from one of my co-workers (who left the job last week). This previous project manager had uploaded ~2000 products without setting up a robot.txt file, and as a result, all of the product pages were indexed by Google (verified via Google Webmaster Tool). The problem came about when he deleted the entire product database from our hosting service, godaddy and performed a fresh install of Prestashop on our hosting plan. All of the created product pages are now gone, and I'm left with ~2000 broken URL's returning 404's. Currently, the site does not have any products uploaded. From my knowledge, I have to either: canonicalize the broken URL's to the new corresponding product pages, or request Google to remove the broken URL's (I believe this is only a temporary solution, for Google honors URL removal request for 90 days) What is the best way to approach this situation? If I setup a canonicalization, would I have to recreate the deleted pages (to match the URL address) and have those pages redirect to the new product pages (canonicalization)? Alex
Intermediate & Advanced SEO | | byoung860 -
How much content on PDF download page
Hello, This is about content for an ecommerce site. We have an article page that we also created a PDF out of. We have an HTML page that doesn't have anything commercial on it that is the download page for the PDF page. How much of the article do you recommend we put on the non-commercial HTML download page? Should we put most of the article on there? We're trying to get people to link to the HTML Download page, not the PDF.
Intermediate & Advanced SEO | | BobGW0 -
Status Code: 404 Errors. How to fix them.
Hi, I have a question about the "4xx Staus Code" errors appearing in the Analysis Tool provided by SEOmoz. They are indicated as the worst errors for your site and must be fixed. I get this message from the good people at SEOmoz: "4xx status codes are shown when the client requests a page that cannot be accessed. This is usually the result of a bad or broken link." Ok, my question is the following. How do I fix them? Those pages are shown as "404" pages on my site...isn't that enough? How can fix the "4xx status code" errors indicated by SEOmoz? Thank you very much for your help. Sal
Intermediate & Advanced SEO | | salvyy0 -
403, 301, 302, 404 errors & possible google penalty
William Rock ran a Xenu site scan on nlpca(dot)com and mentioned the following: ...ran a test with Xenu site scan and it found a lot of broken links with 403, 301, 302, 404 Errors. Other items found: Broken page-local links (also named 'anchors', 'fragmentidentifiers'): http://www.nlpca.com/DCweb/Interesting_NLP_Sites.html#null anchor occurs multiple timeshttp://www.nlpca.com/DCweb/Interesting_NLP_Sites.html#US not found Could somone give us an output of that list, and which ones of these errors do we need to clean up for SEO purposes? Thank you.
Intermediate & Advanced SEO | | BobGW0 -
301 vs. 404
If a listing on a website is no longer available to display is it better to resolve to a 301 redirect or use a 404? I know from an SEO point of view a 301 will pass on the link value, but is that as valuable as saying tto the user hey that page is no lonoger available try something else?
Intermediate & Advanced SEO | | AU-SEO0 -
Need Help Finding Directories to Submit To
I am looking for a lot of free "do follow" technology directories to submit to. Does anyone know of a good directory or a list of some sort of technology directories or something similar? Actually, I guess any directory that has a technology category would be helpful.
Intermediate & Advanced SEO | | MyNet0