High Number of Crawl Errors for Blog
-
Hello All,
We have been seeing a very high number of crawl errors on websites that contain blogs. Here is a screenshot of one of the sites we are dealing with: http://cl.ly/image/0i2Q2O100p2v .
Looking through the links that are turning up in the crawl errors, the majority of them (roughly 90%) are auto-generated by the blog's system. This includes category/tag links, archive links, etc. A few examples:
http://www.mysite.com/2004/10/
http://www.mysite.com/2004/10/17/
As far as I know (please correct me if I'm wrong!), search engines will not penalize you for things like this that appear on auto-generated pages. Also, even if search engines did penalize you, I do not believe we can make a unique meta tag for auto-generated pages. Regardless, our client is very concerned about seeing this high number of errors in the reports, even though we have explained the situation to him.
Would anyone have any suggestions on how to either 1) tell Moz to ignore these types of errors or 2) adjust the website so that these errors no longer appear in the reports?
Thanks so much!
- Rebecca
-
Hi Rebecca
What are the crawl errors exactly? From that report screenshot it looks like you have a variety of them, so the fixes will all be different.
Let me know, and in the meantime you might want to check out my article on Moz about setting up WordPress.
-Dan
-
It is true that you will most likely not be penalized for these pages. In my opinion, Google is pretty good at figuring out common canonicalization problems and would most likely not penalize you for having duplicate content. I would encourage you to dig a little deeper, though, and see what additional problems these pages could create.
Consider that Google will waste valuable crawl bandwidth crawling these meaningless pages rather than focusing on the important content you want it to. If Google is crawling them, you can most likely bet that PageRank is flowing through these pages as well, diluting the link equity of your site.
Are you using WordPress? There are a lot of great plugins that can help you manage these pages. You could control how Google crawls these pages with your robots.txt, by placing meta robots tags on the pages using a plugin, or by placing rel=canonical tags on the pages pointing back to the page that is the original source.
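If you go the robots.txt route, it can help to test a rule before deploying it. Here is a minimal sketch using Python's standard-library robots.txt parser. The rule, the `is_crawlable` helper, the `/about/` URL, and the choice of `rogerbot` as the user agent are assumptions for illustration, not something taken from your site. Note that a prefix rule like this would also block individual posts if your permalinks start with the date, in which case a meta robots tag on the archive pages is the safer option.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: block the date-archive prefix seen in the crawl report.
# Adjust the paths to match the archives your blog actually generates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /2004/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="rogerbot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.mysite.com/2004/10/",
        "http://www.mysite.com/2004/10/17/",
        "http://www.mysite.com/about/",
    ):
        print(url, "crawlable" if is_crawlable(url) else "blocked")
```

Keep in mind that this parser matches rules by simple path prefix, so it will not tell you how a crawler that supports wildcard patterns would treat them.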