Crawl Diagnostic Errors
-
Hi there,
I'm seeing a large number of errors in the SEOmoz Pro crawl results. The 404 errors are for pages that look like this:
http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F
I know %2F is the URL-encoded slash, but I'm not sure why these addresses are being crawled. The site runs on WordPress. Has anyone seen anything like this?
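To show what I mean, here is a quick Python sketch (my own illustration, not code from the crawler or any plugin): decoded, the appended portion is just the page's own URL, but with the slashes percent-encoded a crawler can mistake it for a relative path and tack it onto the page address.

    from urllib.parse import unquote

    page = "http://www.example.com/2010/07/blogpost/"
    bad_href = "http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F"

    # Decoded, the href is just the page's own absolute URL.
    print(unquote(bad_href))  # http://www.example.com/2010/07/blogpost/

    # With the slashes encoded there is no literal "//", so a naive crawler
    # can treat the whole string as a relative path and resolve it against
    # the current page, producing exactly the 404 URL above.
    print(page + bad_href)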
-
Yep, I think you nailed it. I crawled two other sites I manage; one has Sexy Bookmarks, one doesn't. The one with it had the 404 errors. A quick search for "sexy bookmarks causes 404" turned up some results as well.
You're right about the issue with the other plugin, CommentLuv. I'll definitely take that suggestion to the developer.
And a hat trick: you're right about the "Latest from blog" block in the footer. I've been meaning to take that out for ages.
Very grateful for your attention and wisdom! Thank you!
-
Ross, it seems you have a comments plugin which adds a link to the last post of the person who made the comment. It's an interesting plugin which I have not seen before, but I see two problems with it. First, it identifies links to your own site as external when they should be tagged as internal. Second, it probably shouldn't link to the current page: Debbi's comment carries a link asking readers to view her latest article, which is the current page.
There is also a link to the current article under Recent Posts. It would be a real improvement if the plugin could identify the current URL and leave it out of the list.
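The check itself is simple. Here is a rough sketch of the logic in Python (the names are mine, and the real plugin would be PHP, so take this as the idea only, not a drop-in fix):

    def recent_posts_excluding_current(recent_posts, current_url):
        def normalize(url):
            # Treat "/blogpost" and "/blogpost/" as the same page.
            return url.rstrip("/")
        return [url for url in recent_posts
                if normalize(url) != normalize(current_url)]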
There is also a footer section, "Latest from blog", which offers a link to the post. In my opinion, offering the same links in both the Recent Posts sidebar and the "Latest from blog" footer is excessive, and since footer links see little use I would recommend removing the footer block.
The fourth link to the article that I located on the page comes from a plugin referred to as "Shareaholic TopSharingBar SexyBookmarks". That link is contained within JavaScript.
All four of the links above are valid and should not be the source of the 404 errors.
And finally, I believe I have just discovered the root cause of this issue: it appears to be your "Shareaholic" plugin. Try disabling it and then crawling your site again; the 404 errors should disappear.
The URL you shared, in the exact format you shared it, is present in your site's HTML in a line which begins with the following code:
-
Will do, and thank you for your insight!
-
I just started an SEOmoz crawl of your site. It will take some time to complete. Once the report is available, I'll take a look.
Since you removed a plugin, the results may not be the same; you may have already resolved the issue. Please refrain from making further changes until the crawl is complete.
-
Okay, sure. Embarrassingly enough, it's my own site at bayareaseo.net.
http://www.bayareaseo.net/2011/11/things-that-can-mess-up-your-google-places-rankings/
is the referring page in the SEOmoz crawler, and in GWT the original URL refers to
http://www.bayareaseo.net/2011/11/things-that-can-mess-up-your-google-places-rankings/<a
I just removed a "related posts" style plugin; not sure if that's the culprit.
-
It doesn't make sense to me that the referrer is the page itself. If you are willing to share your site's URL and the specific URL that is having the issue, I can perform a crawl and offer more details.
-
The referrer is the page itself. I examined the code and I'm not seeing any links that match, with or without the funky markup, i.e. searching for
http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F
as well as
http://www.example.com/2010/07/blogpost/http://www.example.com/2010/07/blogpost/
I'm thinking it's down to one of two WP plugins causing the error. I found similar results in GWT, with many 404s where the page refers to itself, reported as
http://www.example.com/page<a
I'll disable the plugins and report back after the next crawl.
-
The crawler normally starts on your site's home page, reads through all of the HTML on that page, and then follows every link it finds, continuing throughout your site. If you are seeing these errors in your crawl report, then the links are on your site.
Examine your crawl report and look for the REFERRER field. This field indicates the page which contains the link. If you can't see the link on the page itself, right-click on the page, choose View Page Source, and search the HTML (CTRL+F) for the link.
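If scanning by eye doesn't turn it up, a short script can fetch the referrer page and search for both the encoded and the decoded form of the broken URL. A minimal Python sketch, using the example URLs from this thread (substitute your own):

    from urllib.parse import unquote
    from urllib.request import urlopen

    referrer = "http://www.example.com/2010/07/blogpost/"  # page from the REFERRER field
    bad_url = "http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F"

    html = urlopen(referrer).read().decode("utf-8", errors="replace")
    for needle in (bad_url, unquote(bad_url)):
        print(repr(needle), "found" if needle in html else "not found")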