Salvaging links from WMT “Crawl Errors” list?
-
When someone links to your website but makes a typo in the URL, those broken inbound links show up in Google Webmaster Tools in the Crawl Errors section as "Not Found". Often they are easy to salvage by just adding a 301 redirect to the htaccess file.
But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with.
First, let's look at the weird typo problem. If it is something easy, like they just lost the last part of the URL (such as www.mydomain.com/pagenam), then I fix it in htaccess this way:
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$
RewriteRule ^pagenam$ http://www.mydomain.com/pagename.html [R=301,L]
But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these:
www.mydomain.com/pagename1.htmlsale
www.mydomain.com/pagename2.htmlhttp://
www.mydomain.com/pagename3.html"
www.mydomain.com/pagename4.html/
How should the htaccess RewriteRule be written to send these oddballs to the individual pages they were supposed to reach, without the typo?
Second, is there a quick and easy method or tool to tell us if a linking domain is good or spammy? I have incoming broken links from sites like these:
www.webutation.net
titlesaurus.com
www.webstatsdomain.com
www.ericksontribune.com
www.addondashboard.com
search.wiki.gov.cn
www.mixeet.com
dinasdesignsgraphics.com
Your help is greatly appreciated. Thanks!
Greg
-
Hi Gregory -
Yes, as Federico mentions, you do not have to put the RewriteCond lines before every RewriteRule; since the htaccess file is in your site root, the domain is implied. You might need them if you are creating redirects from www to non-www, etc. (see Federico's snippet below).
Also, Federico is right - this isn't the best way to deal with these links, but I use a different solution. First I get a flat file of my inbound links using other tools as well as WMT, and then I run them through a test to ensure that the linking pages still exist.
Then I go through the list and remove the scraper/stats sites like webstatsdomain, alexa, etc. so that the list is more manageable. Then I decide which links are OK to keep (there's no real quick way to decide, and everyone has their own method). But the only links that are truly "bad" would be ones that may violate Google's Webmaster Guidelines.
Your list should be quite small at this point, unless you had a bunch of links to a page that you subsequently moved or whose URL you changed. In that case, add the rewrite to htaccess, as in the sketch below. For the remaining list, you can simply contact the sites, notify them of the broken link, and ask to have it fixed. This is the best-case scenario (instead of having the link resolve to a 404 or even a 301 redirect). If it's a good link, it's worth the effort.
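For example, a minimal sketch of that moved-page case (old-page.html and new-page.html are hypothetical names - swap in your own URLs):

# send the old URL permanently to its new home
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]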
Hope that helps!
-
Exactly.
Let's do some cleanup.
To redirect everything from domain.com/** to www.domain.com, you need this:
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
That's it for the www and non-www redirection.
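Note that all of these snippets assume mod_rewrite is available on your host. If the rules don't seem to fire, make sure this appears once near the top of the htaccess file:

# enable the rewrite engine (required once per htaccess file)
RewriteEngine On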
Then you only need one line per 301 redirect, without specifying those RewriteConds you had previously:
RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
That will in fact redirect any www/non-www URL like pagename1.htmlhgjdfh to www.domain.com/pagename1.html. The (.+) acts as a wildcard for the junk, and because it requires at least one character, the clean URL itself won't match the rule and trigger a redirect loop.
You also don't need to type the full domain as you did in your examples. Since the target is on your same domain, you can just type the page: pagename1.html
-
Thank you Federico. I did not know about the ability to use a trailing (.+) pattern to deal with any junk stuck to the end of .html.
So when you said "the rewrite conds are not needed," do you mean that instead of creating three lines of code for each 301 redirect, like this...
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$
RewriteRule ^pagenam$ http://www.mydomain.com/pagename.html [R=301,L]
...the first two lines can be removed? So each 301 redirect rule is just one line like this...
RewriteRule ^pagenam$ http://www.mydomain.com/pagename.html [R=301,L]
...without causing problems whether the visitor comes in via the mydomain.com version or the www.mydomain.com version?
If so, that will sure help decrease the size of the file. But I thought that if we are directing everything to the www version, those first two lines were needed.
Thanks again!
-
Well, if you still want to go that way, the RewriteConds there are not needed (given that the htaccess file already lives on your domain). Then a rewrite rule for www.mydomain.com/pagename1.htmlsale would be:
RewriteRule ^pagename1\.htmlsale$ pagename1.html [R=301,L]
Plus, everything that is pagename1.html*** (such as pagename1.html123, pagename1.html%22, etc.) can be redirected with this rule:
RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
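If many pages are affected, a single generalized rule could handle junk after any .html page, so you don't need one rule per page. This is only a sketch, and it assumes all of your real pages sit at the root and end in .html:

# catch /anything.html followed by at least one junk character
# and 301 it to the clean /anything.html
RewriteRule ^([^/]+\.html)(.+)$ /$1 [R=301,L]

That one rule would also cover the pagename2.htmlhttp://, pagename3.html%22, and pagename4.html/ examples from your list.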
-
Thanks Federico, I do have a good custom 404 page set up to help those who click a link with a typo.
But I would still like to know the answers to the questions asked above...
-
Although you can redirect any URL to the one you believe the linker intended, you may end up with hundreds of rules in your htaccess.
I personally wouldn't use this approach. Instead, you can build a really good 404 page, which looks at the typed URL and shows a list of possible pages the user was actually trying to reach, while still returning a 404 status, since the typed URL genuinely doesn't exist.
By using the above method you also avoid worrying about those links, as you mentioned. No link juice is passed, though, but traffic coming from those links will probably still get the content they were looking for, since your 404 page will list the possible URLs they were trying to reach...
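The htaccess side of that setup is a single line; the suggestion logic lives in the error page itself. A minimal sketch, assuming your custom page is at /404.php (a hypothetical path - any script or static page works, as long as the server still returns a 404 status):

# serve the smart error page for anything that isn't found
ErrorDocument 404 /404.php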