Salvaging links from WMT “Crawl Errors” list?
-
When someone links to your website, but makes a typo while doing it, those broken inbound links will show up in Google Webmaster Tools in the Crawl Errors section as “Not Found”. Often they are easy to salvage by just adding a 301 redirect in the htaccess file.
But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with.
First, let's look at the weird typo problem. If it is something easy, like they just lost the last part of the URL (such as www.mydomain.com/pagenam), then I fix it in htaccess this way:
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these:
www.mydomain.com/pagename1.htmlsale
www.mydomain.com/pagename2.htmlhttp://
www.mydomain.com/pagename3.html"
www.mydomain.com/pagename4.html/
How is the htaccess Rewrite Rule typed up to send these oddballs to individual pages they were supposed to go to without the typo?
Second, is there a quick and easy method or tool to tell us if a linking domain is good or spammy? I have incoming broken links from sites like these:
www.webutation.net
titlesaurus.com
www.webstatsdomain.com
www.ericksontribune.com
www.addondashboard.com
search.wiki.gov.cn
www.mixeet.com
dinasdesignsgraphics.com
Your help is greatly appreciated. Thanks!
Greg
-
Hi Gregory -
Yes, as Federico mentions, you do not have to put the RewriteCond lines before every RewriteRule: since the htaccess is in your site root, the domain is implied. You might need them if you are creating multiple redirects, for www to non-www and so on.
Federico is also right that this isn't the best way to deal with these links, but I use a different solution. First I get a flat file of my inbound links using other tools as well as WMT, and then I run them through a test to ensure that the linking page still exists.
Then I go through the list and remove the scraper and stats sites (webstatsdomain, alexa, etc.) so that the list is more manageable. Then I decide which links are OK to keep (there's no real quick way to decide, and everyone has their own method). The only links that are "bad" would be ones that may violate Google's Webmaster Guidelines.
Your list should be quite small at this point, unless you had a bunch of links to a page that you subsequently moved or whose URL you changed. In that case, add the rewrite to htaccess. For the remaining list, you can simply contact the sites, notify them of the broken link, and ask to have it fixed. This is the best-case scenario (better than having the link go to a 404 or even a 301 redirect). If it's a good link, it's worth the effort.
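For the moved-page case, a minimal sketch of such a rewrite might look like the lines below. The names old-page.html and new-page.html are hypothetical placeholders, and the sketch assumes mod_rewrite is enabled and the htaccess file sits in the site root.

```apache
RewriteEngine On

# Hypothetical example: old-page.html was renamed to new-page.html.
# The pattern has no leading slash because htaccess rules in the site
# root are matched against the path with that slash already removed.
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```

Every inbound link to the old address then lands on the new one with a permanent redirect, however many sites link to it.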
Hope that helps!
-
Exactly.
Let's do some cleanup
To redirect everything domain.com/** to www.domain.com you need this:
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
That's it for the www and non-www redirection.
Then, you only need one line per 301 redirection you want to do, without the need of specifying those rewrite conds you had previously, doing it like this:
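RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
That will redirect any www or non-www URL like pagename1.htmlhgjdfh to www.domain.com/pagename1.html. The (.+) acts as a wildcard that requires at least one extra character, so the clean pagename1.html is not matched and redirected to itself in a loop.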
You also don't need to type the domain as you did in your examples. You just type the page (as it is in your same domain, you don't need to specify it): pagename1.html
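Put together, the whole file could look roughly like this. It is only a sketch under a few assumptions: mod_rewrite is enabled, the htaccess file is in the site root, and the page names are the placeholder ones from this thread. The targets are written root-relative (with a leading slash), which is a slightly more defensive choice than a bare file name because it does not depend on how the server resolves relative paths.

```apache
RewriteEngine On

# 1. Send every host other than www.domain.com to the www host.
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

# 2. One line per salvaged link, with no RewriteCond lines.
#    Truncated URL:
RewriteRule ^pagenam$ /pagename.html [R=301,L]
#    Junk stuck to the end; ".+" needs at least one extra character,
#    so the clean pagename1.html is left alone.
RewriteRule ^pagename1\.html.+$ /pagename1.html [R=301,L]
```

A request for domain.com/pagenam is first redirected to the www host by block 1 and then to the corrected page by block 2, so it takes two hops.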
-
Thank you Federico. I did not know about the ability to use (.*)$ to deal with any junk stuck to the end of html.
So when you said "the rewrite conds are not needed" do you mean that instead of creating three lines of code for each 301 redirect, like this...
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...that the first two lines can be removed? So each 301 redirect rule is just one line like this...
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...without causing problems if the visitor is coming into the mydomain.com version or the www.mydomain.com version?
If so, that will sure help decrease the size of the file. But I thought that if we are directing everything to the www version, that those first two lines were needed.
Thanks again!
-
Well, if you still want to go that way, the rewrite conds there are not needed (as it is given that the htaccess IS in your domain). Then a rewrite rule for www.mydomain.com/pagename1.htmlsale should be:
RewriteRule ^pagename1\.htmlsale$ pagename1.html [R=301,L]
A single rule can also cover everything of the form pagename1.html***, such as pagename1.html123, pagename1.html%22, etc. (the .+ needs at least one extra character, so the clean URL itself is left alone):
RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
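If many pages attract this kind of junk, one generic rule might replace the per-page lines. The sketch below is an assumption-laden variant, not something tested on this site: it assumes the pages live in the site root, all end in .html, and that no real URL on the site starts with a page name followed by extra characters (a file such as pagename1.html.bak would be redirected too).

```apache
RewriteEngine On

# Capture "<name>.html" at the top level, require at least one junk
# character after it, and redirect to the captured clean name.
# [^/.]+ keeps the match to a single path segment with no earlier dot.
RewriteRule ^([^/.]+\.html).+$ /$1 [R=301,L]
```

With this in place, pagename1.htmlsale and pagename3.html" would both be sent to their clean .html address, while a request for the clean address itself does not match. If the cleaned-up name does not exist either, the visitor simply gets the normal 404 after the redirect.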
-
Thanks Federico, I do have a good custom 404 page set up to help those who click a link with a typo.
But I still would like to know how to solve the questions asked above...
-
Although you can redirect any URL to the page you think the linker meant, you may end up with hundreds of rules in your htaccess.
I personally wouldn't use this approach. Instead, you can build a really good 404 page that looks at the requested URL and shows a list of possible pages the user was actually trying to reach, while still returning a 404 because the requested URL doesn't actually exist.
With that method you also don't have to worry about the questionable links you mentioned. No link juice is passed, though, but traffic coming from those links will probably still find the content it was looking for, since your 404 page lists the likely URLs.
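Wiring up such a page in htaccess is a one-line job. In the sketch below, /not-found.php is a hypothetical script name; the suggestion logic itself would live in that script and is not shown here.

```apache
# Serve a custom page for missing URLs while keeping the 404 status.
# Use a local path: pointing ErrorDocument at a full http:// URL makes
# Apache redirect instead, and the visitor no longer receives the 404.
ErrorDocument 404 /not-found.php
```

Apache passes the originally requested path to the error document in the REDIRECT_URL environment variable, which is what the script would compare against the site's real page names to build its list of suggestions.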