Issues with Moz producing 404 Errors from sitemap.xml files recently.
-
My last campaign crawl produced over 4k 404 errors resulting from Moz not being able to read some of the URLs in our sitemap.xml file. This is the first time we've seen this error, and we've been running campaigns for almost two months now -- no changes were made to the sitemap.xml file. The file isn't UTF-8 encoded; it's served as Content-Type: text/xml; charset=iso-8859-1 (which is what Movable Type uses). Just wondering if anyone has had a similar issue?
-
Hi Barb,
I am sure Joel will chime in as well, but just to clarify: it is probably not the UTF-8 encoding (or lack of it) that is causing the issue. At least for the sitemap URLs, it is simply the formatting of the XML being produced. As for whether the other errors you are seeing are caused by the same kind of thing: if you are seeing references to the same encoded characters (%0A, %09), then the answer is most likely yes.
So the issue is not UTF-8 related (there are plenty of non-UTF-8 sites on the web still!) but rather how the Moz crawler is reading your links and whether other tools/systems will have the same trouble. Have you looked in Google Webmaster Tools to see if it reports similar 404 errors from the sitemap or elsewhere? If you see similar errors in GWT, then the issue is likely not restricted to the Moz crawler.
Beyond that, the fix for the sitemap at least should be relatively simple, and quite possibly the other Moz errors can be fixed just as easily by making small adjustments to the templates and removing the extra line breaks/tabs that are creating the issue. It is worth doing so that these errors are removed and you can concentrate on the 'real' errors without all the noise.
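The kind of template cleanup described above can be sketched in a few lines of Python. This is a hypothetical post-processing step (the function name and the sample input are illustrative, not part of Movable Type itself): it collapses any whitespace that has crept inside the `<loc>` tags.

```python
import re

def clean_sitemap(xml_text):
    """Collapse stray whitespace inside <loc>...</loc> so each URL sits on one line."""
    return re.sub(
        r"<loc>\s*(.*?)\s*</loc>",
        lambda m: "<loc>" + m.group(1) + "</loc>",
        xml_text,
        flags=re.S,  # let .*? span line breaks inside the tag
    )

dirty = "<url><loc>\n\t\t\thttp://www.cmswire.com/news/topic/impresspages\n\t\t</loc></url>"
print(clean_sitemap(dirty))
# → <url><loc>http://www.cmswire.com/news/topic/impresspages</loc></url>
```

Running the generated file through something like this (or making the equivalent edit in the template itself) removes the invisible characters at the source.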
-
Joel,
The latest 404 errors have the same type of issue, and the referrers are all over the place (none are the sitemap.xml) that I can see.
My question is: can the fact that we don't use UTF-8 encoding on our site potentially cause issues with other reporting? This is not something we can change easily, and I don't want to waste a great deal of effort sorting through "red herring" issues due to the encoding we use on the site.
thoughts?
barb
-
Thanks Joel,
We're looking into this.
barb
-
Thanks Lynn,
We are looking at that. The 4k 404 errors are gone now, but it's possible they will return.
It's a major change for us to switch to UTF-8, so it's not something that will happen anytime soon. I'll just have to be aware that it might be causing issues.
barb
-
Hey Brice,
I just want to add to Lynn's great answer with the reason you're seeing the URLs the way they are, and to reinforce her point.
You have it formatted as such:
<loc>
			http://www.cmswire.com/cms/web-cms/david-hillis-10-predictions-for-web-content-management-in-2011-009588.php
		</loc>
The crawler converts everything to URL encoding, so those line feeds and tabs get converted to percent-encoded sequences. The reason your root domain is prepended is that %0A is not a valid start of a URL, so RogerBot assumes it's a relative link on the domain your sitemap is on.
The encoding thing is probably not affecting this.
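This behaviour can be reproduced with Python's standard library (a sketch; the sample URL is from the sitemap, while the exact crawler internals are an assumption):

```python
from urllib.parse import quote, urljoin

# A <loc> value with an invisible leading line feed and tabs, as in the sitemap.
raw = "\n\t\t\thttp://www.cmswire.com/news/topic/impresspages"

# A crawler that percent-encodes the value turns the whitespace into %0A/%09.
encoded = quote(raw, safe=":/")
print(encoded)
# → %0A%09%09%09http://www.cmswire.com/news/topic/impresspages

# %0A... is not a valid scheme, so the string resolves as a relative link,
# which is why the root domain gets prepended to the flagged URLs.
print(urljoin("http://www.cmswire.com/sitemap.xml", encoded))
# → http://www.cmswire.com/%0A%09%09%09http://www.cmswire.com/news/topic/impresspages
```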
Cheers,
Joel.
-
Hi,
It can be frustrating, I know, but if you are methodical you will get to the bottom of all the errors and then feel much better.
Not sure why the number of 404s would have gone down, but as regards the sitemap itself, the Moz team might be right that UTF-8 encoding could be part of the problem. I think it is more likely down to some non-visible formatting/characters being added to your sitemap during creation. %09 is a URL-encoded tab and %0A is a URL-encoded line feed; it looks to me like these are getting into your sitemap even though they are not actually visible.
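You can confirm what those two codes decode to with a quick standard-library check:

```python
from urllib.parse import unquote

# %0A is a line feed and %09 is a tab -- invisible on the page,
# but very visible once decoded.
print(repr(unquote("%0A%09%09%09")))
# → '\n\t\t\t'
```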
If you download your sitemap you will see that many (but not all) of the URLs look like this:
<loc>
			http://www.cmswire.com/cms/web-cms/david-hillis-10-predictions-for-web-content-management-in-2011-009588.php
		</loc>
Note the new lines and the indent. Other URLs do not have this format, for example:
<loc>http://www.cmswire.com/news/topic/impresspages</loc>
It would be wise to ensure both the file creating the sitemap and the sitemap itself are in UTF-8, but the fix could be as simple as going into the file that creates the sitemap and removing those line breaks. Once that is done, wait for the next crawl and see if it brings the error numbers down (it should). As for the rest of the warnings, just be methodical: identify where they are occurring and why, and work through them. You will get to few or zero warnings, and you will feel good about it!
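After fixing the template, it is worth re-checking the generated file. A small checker along these lines (a sketch -- the embedded sitemap snippet is illustrative) flags any `<loc>` values that still carry stray whitespace:

```python
import xml.etree.ElementTree as ET

def whitespace_locs(sitemap_bytes):
    """Return <loc> values that contain leading/trailing whitespace."""
    root = ET.fromstring(sitemap_bytes)
    return [
        el.text
        for el in root.iter()
        if el.tag.endswith("loc") and el.text and el.text != el.text.strip()
    ]

sitemap = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>\n\t\t\thttp://www.cmswire.com/news/topic/impresspages\n\t\t</loc></url>"
    b"<url><loc>http://www.cmswire.com/news</loc></url>"
    b"</urlset>"
)
# Flags the first <loc> (wrapped in a line feed and tabs) but not the second.
print(whitespace_locs(sitemap))
```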
-
Interesting that a new crawl just completed and now I only have 307 404 errors, plus a lot of other different errors and warnings. It's frustrating to see such different things each week.
barb
-
Hi Lynn,
I did download the CSV and found all the 404 errors were generated from our sitemap.xml file. Here's what the URLs look like:
referring URL is http://www.cmswire.com/sitemap.xml
You'll notice that there is odd formatting wrapping the URL (%0A%09%09%09), plus an extra http://www.cmswire prepended to the front of the URL -- which does not exist in the actual sitemap.xml file if I view it separately.
Also: Moz support looked at our campaign and they thought the problem was that our sitemap wasn't UTF-8 encoded.
Any ideas?
-
Hi Brice,
What makes you think the issue is that Moz cannot read the URLs? In the first instance I would want to make sure that something else is not going wrong by checking the URLs Moz is flagging as 404s, ensuring they actually do or do not exist, and if the latter, finding out where the link is coming from (be it the sitemap or another page on the site). You may have already done this, but if not, you can get all this information by downloading the error report as CSV and then filtering in Excel to get data for 404 pages only.
If you have done this already, then give us a sample or two of the URLs Moz is flagging, along with the referring URL and your sitemap URL, and we might be able to diagnose the issue better. It would be unusual for the Moz crawler to start throwing errors all of a sudden if nothing else has changed. I'm not saying it is impossible for it to be an error on Moz's side, just that the chances are on the side of something else going on.
Hope that helps!