High Number of Crawl Errors for Blog
-
Hello All,
We have been seeing a very high number of crawl errors on websites that contain blogs. Here is a screenshot of one of the sites we are dealing with: http://cl.ly/image/0i2Q2O100p2v
Looking through the links that turn up in the crawl errors, the majority of them (roughly 90%) are auto-generated by the blog's system. This includes category/tag links, archive links, etc. A few examples:
http://www.mysite.com/2004/10/
http://www.mysite.com/2004/10/17/
As far as I know (please correct me if I'm wrong!), search engines will not penalize you for things like this that appear on auto-generated pages. Also, even if search engines did penalize you, I do not believe we can create a unique meta tag for auto-generated pages. Regardless, our client is very concerned about seeing this high number of errors in the reports, even though we have explained the situation to him.
Would anyone have any suggestions on how to either 1) tell Moz to ignore these types of errors or 2) adjust the website so that these errors no longer appear in the reports?
Thanks so much!
- Rebecca
-
Hi Rebecca
What are the crawl errors exactly? From that report screenshot it looks like you have a variety of them, so the fixes will all be different.
Let me know, and in the meantime you might want to check out my article on Moz about setting up WordPress.
-Dan
-
It is true that you will most likely not be penalized for these pages. In my opinion, Google is pretty good at figuring out common canonicalization problems and would most likely not penalize you for having duplicate content. I would encourage you to dig a little deeper, though, and see what additional problems these pages could create.
Consider that Google will waste valuable crawl bandwidth crawling these meaningless pages rather than focusing on the important content you want it to crawl. If Google is crawling them, you can bet that PageRank is most likely flowing through these pages as well, diluting the link equity of your site.
Are you using WordPress? There are a lot of great plugins that can help you manage these pages. You could control how Google crawls these pages with your robots.txt, by placing meta robots tags on the pages using a plugin, or by placing rel=canonical tags on the pages pointing back to the page that is the original source.
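To make the meta robots option a bit more concrete, here is a rough Python sketch of the logic a plugin might apply. It is only an illustration: the URL pattern (based on the /2004/10/ and /2004/10/17/ examples above), the function names, and the choice of "noindex, follow" for archive pages are all my assumptions, not how any particular plugin works.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: date-archive paths such as /2004/10/ or /2004/10/17/.
# Adjust it to match how your blog actually builds archive URLs.
DATE_ARCHIVE = re.compile(r"^/\d{4}/\d{2}/(\d{2}/)?$")


def is_date_archive(url: str) -> bool:
    """Return True if the URL path looks like a year/month[/day] archive."""
    return bool(DATE_ARCHIVE.match(urlparse(url).path))


def robots_meta_for(url: str) -> str:
    """Suggest a meta robots tag for the page at this URL (hypothetical helper).

    Archive pages get "noindex, follow" so they stay out of the index
    while their links can still be followed; everything else is left indexable.
    """
    if is_date_archive(url):
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


if __name__ == "__main__":
    for url in (
        "http://www.mysite.com/2004/10/",
        "http://www.mysite.com/2004/10/17/",
        "http://www.mysite.com/2004/10/17/my-post/",
    ):
        print(url, "->", robots_meta_for(url))
```

You could run the same check over the URLs exported from your crawl report to see how many of the errors come from archive pages before deciding which of the three approaches to use.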