High Number of Crawl Errors for Blog
-
Hello All,
We have been seeing a very high number of crawl errors on websites that contain blogs. Here is a screenshot of one of the sites we are dealing with: http://cl.ly/image/0i2Q2O100p2v .
Looking through the links that turn up in the crawl errors, the majority of them (roughly 90%) are auto-generated by the blog's system. These include category/tag links, archive links, etc. A few examples:
http://www.mysite.com/2004/10/
http://www.mysite.com/2004/10/17/
As far as I know (please correct me if I'm wrong!), search engines will not penalize you for things like this that appear on auto-generated pages. Also, even if search engines did penalize you, I do not believe we can create a unique meta tag for auto-generated pages. Regardless, our client is very concerned about seeing such a high number of errors in the reports, even though we have explained the situation to him.
Would anyone have any suggestions on how to either 1) tell Moz to ignore these types of errors or 2) adjust the website so that these errors no longer appear in the reports?
Thanks so much!
- Rebecca
-
Hi Rebecca
What are the crawl errors exactly? From that report screenshot it looks like you have a variety of them, so the fixes will all be different.
Let me know, and in the meantime, you might want to check out my article on Moz about setting up WordPress.
-Dan
-
It is true that you will most likely not be penalized for these pages. In my opinion, Google is pretty good at figuring out common canonicalization problems and would most likely not penalize you for having duplicate content. I would encourage you to dig a little deeper, though, and see what additional problems these pages could create.
Consider that Google will waste valuable crawl bandwidth on these meaningless pages rather than focusing on the important content you want it to crawl. If Google is crawling them, you can most likely bet that PageRank is flowing through these pages as well, diluting the link equity of your site.
Are you using WordPress? There are a lot of great plugins that can help you manage these pages. You could control how Google crawls these pages with your robots.txt, by placing meta robots tags on the pages using a plugin, or by placing rel=canonical tags on the pages pointing back to the page that is the original source.
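If you go the robots.txt route, it may help to test the rules against a few of the reported URLs before deploying them. Here is a rough sketch in Python using the standard library's `urllib.robotparser`. The `Disallow` paths, the `/about/` URL, and the `blocked_urls` helper are assumptions for illustration only; your blog's archive paths may differ, and if your post permalinks are date-based, a rule like `/2004/` would block the posts themselves as well.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (assumed paths, not taken from the real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /2004/
Disallow: /category/
Disallow: /tag/
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# The two archive URLs come from the question; /about/ is a made-up control.
urls = [
    "http://www.mysite.com/2004/10/",
    "http://www.mysite.com/2004/10/17/",
    "http://www.mysite.com/about/",
]

for url in blocked_urls(ROBOTS_TXT, urls):
    print("blocked:", url)
```

With these assumed rules, the two date-archive URLs are reported as blocked and `/about/` is not. Keep in mind that a crawler cannot read a meta robots tag on a page it is blocked from fetching, so you would normally pick either robots.txt or meta robots for a given set of pages rather than both.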