Salvaging links from WMT “Crawl Errors” list?
-
When someone links to your website, but makes a typo while doing it, those broken inbound links will show up in Google Webmaster Tools in the Crawl Errors section as “Not Found”. Often they are easy to salvage by just adding a 301 redirect in the htaccess file.
But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with.
First, let's look at the weird typo problem. If it is something easy, like they just lost the last part of the URL (such as www.mydomain.com/pagenam), then I fix it in htaccess this way:
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these:
www.mydomain.com/pagename1.htmlsale
www.mydomain.com/pagename2.htmlhttp://
www.mydomain.com/pagename3.html"
www.mydomain.com/pagename4.html/
How is the htaccess Rewrite Rule typed up to send these oddballs to individual pages they were supposed to go to without the typo?
Second, is there a quick and easy method or tool to tell us if a linking domain is good or spammy? I have incoming broken links from sites like these:
www.webutation.net
titlesaurus.com
www.webstatsdomain.com
www.ericksontribune.com
www.addondashboard.com
search.wiki.gov.cn
www.mixeet.com
dinasdesignsgraphics.com
Your help is greatly appreciated. Thanks!
Greg
-
Hi Gregory -
Yes, as Frederico mentions, you do not have to put the RewriteCond lines before every RewriteRule: since the htaccess is in your site root, the domain is implied. You might need them if you are creating multiple redirects, for www to non-www etc.
Frederico is also right that this isn't the best way to deal with these links, though I use a different solution than he does. First I get a flat file of my inbound links using other tools as well as WMT, and then I run them through a test to ensure that each linking page still exists.
Then I go through the list and remove the scraper / stats sites like webstatsdomain, alexa etc. so that the list is more manageable. Then I decide which links are OK to keep (there's no real quick way to decide, and everyone has their own method). But the only links that are "bad" would be ones that may violate Google's Webmaster Guidelines.
Your list should be quite small at this point, unless you had a bunch of links to a page that you subsequently moved or whose URL you changed. In that case, add the rewrite to htaccess. For the remaining list, you can simply contact the sites, notify them of the broken link, and ask to have it fixed. This is the best-case scenario (instead of having the link go to a 404 or even a 301 redirect). If it's a good link, it's worth the effort.
Hope that helps!
-
Exactly.
Let's do some cleanup
To redirect everything domain.com/** to www.domain.com you need this:
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
That's it for the www and non-www redirection.
Then, you only need one line per 301 redirection you want to do, without the need of specifying those rewrite conds you had previously, doing it like this:
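RewriteRule ^pagename1\.html(.+)$ /pagename1.html [R=301,L]
That will redirect any www/non-www URL like pagename1.htmlhgjdfh to www.domain.com/pagename1.html. The (.+) acts as a wildcard for one or more trailing characters. Using (.*) here would also match the clean pagename1.html and redirect it to itself in a loop.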
You also don't need to type the domain as you did in your examples. You just type the path (as it is on your same domain, you don't need to specify the host): /pagename1.html
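Putting the pieces above together, a root .htaccess might look like the sketch below. It assumes mod_rewrite is enabled and that .htaccess overrides are allowed on the server; www.domain.com and the page names are the placeholders used in this thread. It uses (.+) rather than (.*) so that the clean pagename1.html is not matched and redirected to itself.

```apache
# Sketch only: assumes mod_rewrite is available and this file sits in the site root.
RewriteEngine On

# 1) Canonical host: send any other hostname to www.domain.com, keeping the path.
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

# 2) One line per salvaged link, with no RewriteCond needed.
#    (.+) requires at least one junk character after ".html".
RewriteRule ^pagename1\.html(.+)$ /pagename1.html [R=301,L]

# A truncated URL from the original question.
RewriteRule ^pagenam$ /pagename.html [R=301,L]
```

With this ordering, a request to the non-www host is first redirected to www with its path intact, and the page-level rule then applies on the second request.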
-
Thank you Federico. I did not know about the ability to use a wildcard at the end of the pattern to deal with any junk stuck to the end of .html.
So when you said "the rewrite conds are not needed" do you mean that instead of creating three lines of code for each 301 redirect, like this...
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...that the first two lines can be removed? So each 301 redirect rule is just one line like this...
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...without causing problems if the visitor is coming into the mydomain.com version or the www.mydomain.com version?
If so, that will sure help decrease the size of the file. But I thought that if we are directing everything to the www version, that those first two lines were needed.
Thanks again!
-
Well, if you still want to go that way, the rewrite conds there are not needed (as it is given that the htaccess IS in your domain). Then a rewrite rule for www.mydomain.com/pagename1.htmlsale should be:
RewriteRule ^pagename1\.htmlsale$ /pagename1.html [R=301,L]
Plus, everything of the form pagename1.html*** (such as pagename1.html123, pagename1.html%22, etc.) can be redirected with this rule:
RewriteRule ^pagename1\.html(.+)$ /pagename1.html [R=301,L]
Note the (.+) at the end: with (.*) the rule would also match the clean pagename1.html and loop.
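If many pages pick up trailing junk, one generic rule could stand in for the per-page lines. This is a hedged sketch, not a drop-in fix: it assumes every affected page is a .html file in the site root and that no legitimate URL on the site has characters after ".html".

```apache
# Assumption: pages live in the site root and end in .html; nothing legitimate follows ".html".
# ([^/]+\.html) captures the intended page name.
# .+ requires at least one junk character, so clean URLs such as pagename1.html are left alone.
RewriteRule ^([^/]+\.html).+$ /$1 [R=301,L]
```

This should cover the four examples from the question (pagename1.htmlsale, pagename2.htmlhttp://, pagename3.html" and pagename4.html/). It is still worth testing each one, since the pattern is matched against the decoded URL path (so %22 arrives as a quote character) and the server may normalize things like repeated slashes before the rule sees them.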
-
Thanks Federico, I do have a good custom 404 page set up to help those who click a link with a typo.
But I still would like to know how to solve the questions asked above...
-
Although you can redirect any URL to the one you think they meant to link to, you may end up with hundreds of rules in your htaccess.
I personally wouldn't use this approach. Instead, you can build a really good 404 page, which looks at the typed URL and shows a list of possible pages that the user was actually trying to reach, while still returning a 404, as the typed URL doesn't actually exist.
By using the above method you also avoid worrying about those links you mentioned. No link juice is passed, though, but traffic coming from those links will probably still get the content it was looking for, as your 404 page will list the possible URLs they were trying to reach...
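On Apache, a 404 page like the one described above is typically wired up with the ErrorDocument directive. A minimal sketch, assuming a hypothetical script at /404.php that builds the list of suggested pages:

```apache
# Serve a custom 404 page while keeping the 404 status code.
# /404.php is a hypothetical script name; it would read the requested URL
# and list the pages the visitor most likely wanted.
ErrorDocument 404 /404.php
```

Pointing ErrorDocument at a local path keeps the 404 status. Giving it a full http:// URL makes Apache send a redirect instead, so the client never sees the 404. The script can usually read the originally requested path from the REDIRECT_URL environment variable.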