Issues with Moz producing 404 Errors from sitemap.xml files recently.
-
My last campaign crawl produced over 4k 404 errors resulting from Moz not being able to read some of the URLs in our sitemap.xml file. This is the first time we've seen this error, and we've been running campaigns for almost 2 months now -- no changes were made to the sitemap.xml file. The file isn't UTF-8 encoded; it is served as Content-Type: text/xml; charset=iso-8859-1 (which is what Movable Type uses). Just wondering if anyone has had a similar issue?
-
Hi Barb,
I am sure Joel will chime in as well, but just to clarify: it is probably not the UTF-8 encoding, or lack of it, that is causing the issue. At least for the sitemap URLs, it is simply the formatting of the XML that is being produced. As to whether the other errors you are seeing are caused by the same kind of thing: if you are seeing references to the same encoded characters (%0A%09%), then the answer is most likely yes.
So the issue is not UTF-8 encoding related (there are plenty of non-UTF-8 encoded sites on the web still!) but how the Moz crawler is reading your links, and whether other tools/systems will have the same trouble. Have you looked in Google Webmaster Tools to see if it reports similar 404 errors from the sitemap or elsewhere? If you see similar errors in GWT, then the issue is likely not restricted to the Moz crawler.
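As a quick sanity check on the encoding point, here is a minimal Python sketch (not anything Moz runs, as far as I know). It assumes the URLs contain only ASCII characters, as the examples in this thread do, and shows that such a URL, including a stray line feed and tabs, is byte-for-byte identical in iso-8859-1 and UTF-8.

```python
# Sample URL quoted in this thread; any ASCII-only URL behaves the same way.
url = "http://www.cmswire.com/news/topic/impresspages"

# Line feed plus three tabs, matching the %0A%09%09%09 seen in the crawl report.
whitespace = "\n\t\t\t"

latin1_bytes = (whitespace + url).encode("iso-8859-1")
utf8_bytes = (whitespace + url).encode("utf-8")

# For ASCII-only text the two encodings produce exactly the same bytes.
same_bytes = latin1_bytes == utf8_bytes
print(same_bytes)
```

If the two byte strings match, the charset declared on the sitemap cannot by itself explain the extra %0A and %09 characters; URLs containing accented or other non-ASCII characters would be a different story.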
Beyond that, the fix for the sitemap at least should be relatively simple, and quite possibly the other Moz errors you see can be fixed just as easily by making small adjustments to the templates to remove the extra line breaks/tabs that are creating the issue. It is worth doing so that these errors are removed and you can concentrate on the 'real' errors without all the noise.
-
Joel,
The latest 404 errors have the same type of issue and are all over the place in terms of referrer (none are the sitemap.xml that I can see).
My question is, can the fact that we don't use the UTF-8 encoding in our site potentially cause issues with other reporting? This is not something we can change easily and I don't want to waste a great deal of effort sorting through "red herring" issues due to the encoding we use on the site.
thoughts?
barb
-
Thanks Joel,
We're looking into this.
barb
-
Thanks Lynn,
We are looking at that. The 4k 404 errors are gone now, but it's possible they will return.
It's a major change for us to switch to UTF-8, so it's not something that will happen anytime soon. I'll just have to be aware that it might be causing issues.
barb
-
Hey Brice,
I just want to add to Lynn's great answer by explaining why you're seeing the URLs the way they are, and to reinforce it.
You have it formatted as such:
<loc>http://www.cmswire.com/cms/web-cms/david-hillis-10-predictions-for-web-content-management-in-2011-009588.php</loc>
The crawler converts everything to URL encoding, so those line feeds and tabs are converted to percent-encoded sequences (%0A and %09). The reason your root domain is there is that %0A is not a proper start of a URL, so RogerBot assumes it's a relative link on the domain your sitemap is on.
The encoding thing is probably not affecting this.
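To illustrate the mechanism, here is a rough Python sketch. It is not RogerBot's actual code; the function name and the raw `<loc>` text (a line feed and three tabs in front of the URL) are assumptions made for the example. It percent-encodes the raw text and then resolves it against the sitemap URL the way a generic crawler might.

```python
from urllib.parse import quote, urljoin

SITEMAP_URL = "http://www.cmswire.com/sitemap.xml"

# Assumed raw text of a <loc> element: line feed + three tabs before the URL.
raw_loc = "\n\t\t\thttp://www.cmswire.com/news/topic/impresspages"


def resolve_like_a_naive_crawler(raw: str, base: str) -> str:
    """Hypothetical helper: percent-encode the raw text, then resolve it against base."""
    # "\n" becomes %0A and "\t" becomes %09; ":" and "/" are left untouched.
    encoded = quote(raw, safe=":/")
    # A string starting with "%0A" has no valid scheme, so it is treated as relative.
    return urljoin(base, encoded)


broken = resolve_like_a_naive_crawler(raw_loc, SITEMAP_URL)
fixed = resolve_like_a_naive_crawler(raw_loc.strip(), SITEMAP_URL)

print(broken)  # root domain, then %0A%09%09%09, then the intended URL
print(fixed)   # the intended URL, unchanged
```

Stripping the whitespace before resolving gives back the intended URL, which is why cleaning up the template output should make these 404s disappear. The exact shape of the broken URL (for example how repeated slashes are handled) depends on the resolver, so treat the printed value as indicative only.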
Cheers,
Joel.
-
Hi,
It can be frustrating, I know, but if you are methodical you will get to the bottom of all the errors and then feel much better.
Not sure why the number of 404s would have gone down, but with regard to the sitemap itself, the Moz team might be right that UTF-8 encoding could be part of the problem. I think it might have more to do with some non-visible formatting/characters being added to your sitemap during creation. %09 is a URL-encoded tab and %0A is a URL-encoded line feed; it looks to me like these are getting into your sitemap even though they are not actually visible.
If you download your sitemap you will see that many (but not all) of the URLs look like this:
<loc>http://www.cmswire.com/cms/web-cms/david-hillis-10-predictions-for-web-content-management-in-2011-009588.php</loc>
Note the new lines and the indent. Some other URLs do not have this format, for example:
<loc>http://www.cmswire.com/news/topic/impresspages</loc>
It would be wise to ensure both the file creating the sitemap and the sitemap itself are in utf-8, but also it could be as simple as going into the file creating the sitemap and removing those line breaks. Once that is done wait for the next crawl and see if it brings the error numbers down (it should). As for the rest of the warnings, just be methodical, identify where they are occurring and why and work through them. You will get to few or zero warnings, and you will feel good about it!
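If it helps to find the affected entries, here is a small Python sketch that lists every `<loc>` whose text has leading or trailing whitespace. The inline sample stands in for the downloaded sitemap.xml, and its layout (line feed plus three tabs around the first URL) is an assumption based on the %0A%09%09%09 in the crawl report; the function name is made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed sample: the first <loc> is padded with a line feed and tabs, the second is clean.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>
\t\t\thttp://www.cmswire.com/cms/web-cms/david-hillis-10-predictions-for-web-content-management-in-2011-009588.php
\t\t\t</loc></url>
  <url><loc>http://www.cmswire.com/news/topic/impresspages</loc></url>
</urlset>"""


def find_padded_locs(xml_bytes: bytes) -> list:
    """Return the stripped URL of every <loc> whose text has surrounding whitespace."""
    root = ET.fromstring(xml_bytes)
    padded = []
    for loc in root.iterfind(".//sm:loc", NS):
        text = loc.text or ""
        if text != text.strip():
            padded.append(text.strip())
    return padded


padded = find_padded_locs(SAMPLE)
for url in padded:
    print("whitespace around:", url)
```

To check the real file, read the downloaded sitemap in binary mode and pass its bytes to `find_padded_locs`; an empty result after the template fix would suggest the line breaks are gone.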
-
Interesting that a new crawl just completed and now I only have 307 404 errors and a lot of other, different errors and warnings. It's frustrating to see such different things each week.
barb
-
Hi Lynn,
I did download the CSV and found all the 404 errors were generated from our sitemap.xml file. Here's what the URLs look like:
referring URL is http://www.cmswire.com/sitemap.xml
You'll notice that there is odd formatting wrapping the URL (%0A%09%09%09), plus an extra http://www.cmswire added to the front of the URL, which does not exist in the actual sitemap.xml file if I view it separately.
Also: Moz support looked at our campaign and they thought the problem was that our sitemap wasn't UTF-8 encoded.
Any ideas?
-
Hi Brice,
What makes you think the issue is that Moz cannot read the URLs? In the first instance I would want to make sure that something else is not going wrong by checking the URLs Moz is flagging as 404s, ensuring they actually do or do not exist and, if the latter, finding out where the link is coming from (be it the sitemap or another page on the site). You may have already done this, but if not, you can get all this information by downloading the error report as a CSV and then filtering in Excel to get data for 404 pages only.
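If Excel is not handy, the same filtering can be sketched in a few lines of Python. The column names (`URL`, `HTTP Status Code`, `Referrer`) and the sample rows below are hypothetical; the real Moz export may label things differently, so adjust them to match the downloaded file.

```python
import csv
import io

# Made-up export for illustration; real column names and rows will differ.
SAMPLE_CSV = """URL,HTTP Status Code,Referrer
http://www.example.com/good-page,200,http://www.example.com/sitemap.xml
http://www.example.com/missing-page,404,http://www.example.com/sitemap.xml
"""


def rows_with_status(csv_text, status="404", status_col="HTTP Status Code"):
    """Return the rows whose status column equals the given status code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[status_col].strip() == status]


not_found = rows_with_status(SAMPLE_CSV)
for row in not_found:
    # Print referrer next to the failing URL to see where each bad link comes from.
    print(row["Referrer"], "->", row["URL"])
```

Grouping the output by referrer should show quickly whether the 404s all come from the sitemap or from other pages on the site.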
If you have done this already then if you give us a sample or two of the urls moz is flagging along with the referring url and your sitemap url we might be able to diagnose the issue better. It would be unusual for the moz crawler to start throwing errors all of a sudden if nothing else has changed. Not saying it is impossible for it to be an error with moz, just saying that the chances are on the side of something else going on.
Hope that helps!