XML Sitemap Issue or not?
-
Hi Everyone,
I submitted a sitemap in Google Webmaster Tools and got a warning message listing 38 issues.
Issue: Url blocked by robots.txt.
Description: Sitemap contains urls which are blocked by robots.txt.
Example: the examples given were URLs that we don't want indexed: Sitemap: www.example.org/author.xml
Value: http://www.example.org/author/admin/
My issue here is that the number of URLs indexed is pretty low, and I know for a fact that robots.txt rules are bad news, especially if they block URLs that need to be indexed. The blocked URLs it lists seem to be ones we don't want indexed anyway, but the warning doesn't display all of the URLs that are blocked.
Do you think I have a major problem, or is everything fine? What should I do? How can I fix it?
FYI: we use WordPress for our website.
Thanks
-
Hi Dan
Thanks for your answer. Would you really recommend using the plugin instead of just uploading the XML sitemap directly to the website's root directory? If so, why?
Thanks
-
Lisa
I would honestly switch to the Yoast SEO plugin. It handles the SEO (and robots.txt) a lot better, as well as the XML sitemaps all within that one plugin.
I'd check out my guide for setting up WordPress for SEO on the Moz blog.
Most WP robots.txt files will look like this:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
And that's it.
You could always just try changing yours to the above setting first, before switching to Yoast SEO. I bet that would clear up the sitemap issues.
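To see which sitemap URLs a given robots.txt actually blocks, Python's standard `urllib.robotparser` module can be pointed at the rules directly. This is a minimal sketch, assuming the typical WordPress rules quoted above; the second URL is hypothetical. Note that `robotparser` does plain prefix matching and does not implement Google's `*` and `$` wildcard extensions, so treat its answer as an approximation of what Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# The typical WordPress rules quoted above (an assumption: your live file may differ).
WP_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""

def blocked_sitemap_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# The first URL comes from the warning in the question; the second is hypothetical.
urls = ["http://www.example.org/author/admin/", "http://www.example.org/wp-admin/options.php"]
print(blocked_sitemap_urls(WP_ROBOTS_TXT, urls))
# ['http://www.example.org/wp-admin/options.php']
```

With only the default WordPress rules, the /author/admin/ URL from the warning is not blocked, so if Webmaster Tools reports it as blocked, the live robots.txt presumably contains extra rules.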
Hope that helps!
-Dan
-
Lisa, try checking manually which URLs are not getting indexed in Google. Make sure you do not have any nofollow directives on those pages. If all the pages are connected (linked together), then Google will crawl your whole site eventually; it's just a matter of time.
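One place a nofollow (or noindex) can hide is the robots meta tag in a page's `<head>`. A minimal sketch using Python's standard `html.parser` to list those directives is below; it assumes you already have the page's HTML as a string (fetching it is left out), and it does not check `X-Robots-Tag` HTTP headers or `rel="nofollow"` on individual links. The class and function names are illustrative.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Valueless attributes come through as None, so normalize them to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr_map.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())

def robots_meta_directives(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_meta_directives(page)))  # ['nofollow', 'noindex']
```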
-
Hi
When generating the sitemap, xml-sitemaps.com detects 46 URLs, but when I add the sitemap to WMT only 12 get submitted and 5 are indexed, which is really worrying me. This might be because of the XML sitemap plugin I installed; maybe something is wrong with my settings (docs attached, 1 & 2).
I am kind of lost, especially since SEOmoz hasn't detected any URLs blocked by robots.txt.
It would be great if you could tell me what I should do next.
Thanks
-
The first question I would ask is how big the difference is. If there is a large difference between the number of pages on your site and the number indexed by Google, then you have an issue. The blocked pages might be the ones linking to the pages that have not been indexed, which could be causing the problem. Try removing the nofollow on those pages, then resubmit your sitemap and see if that fixes the issue. Also double-check your sitemap to make sure you have correctly added all the pages to it.
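Double-checking the sitemap can be scripted: parse its `<loc>` entries and compare them with the list of pages you expect to be indexed (for example, the 46 URLs a crawler found). This is a minimal sketch with Python's standard `xml.etree.ElementTree`, assuming a plain `<urlset>` sitemap in the sitemaps.org namespace; sitemap index files, which some WordPress plugins generate, would need an extra step. The sample sitemap and expected list are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return every <loc> listed under <url> entries in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]

def missing_from_sitemap(xml_text, expected_urls):
    """URLs you expect to be indexed that the sitemap does not list."""
    listed = set(sitemap_urls(xml_text))
    return sorted(set(expected_urls) - listed)

sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.org/</loc></url>
  <url><loc>http://www.example.org/about/</loc></url>
</urlset>"""

expected = [
    "http://www.example.org/",
    "http://www.example.org/about/",
    "http://www.example.org/contact/",
]
print(len(sitemap_urls(sample_sitemap)))               # 2
print(missing_from_sitemap(sample_sitemap, expected))  # ['http://www.example.org/contact/']
```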
Related Questions
-
Issue with GA tracking and Native AMP
Hi everyone, We recently pushed a new version of our site (winefolly.com), which is completely AMP-native on WordPress (using the official AMP for WordPress plugin). As part of the update, we also switched over to HTTPS. In hindsight, we probably should have pushed the AMP and HTTPS changes in separate updates. Since the update, traffic in GA has dropped significantly, despite the tracking code being added properly. I'm also having a hard time getting the previous views in GA working properly. The three views are:
Sitewide (shop.winefolly.com and winefolly.com)
Content only (winefolly.com)
Shop only (shop.winefolly.com)
The sitewide view seems to be working, though it's hard to know for sure, as the traffic seems pretty low (around 10 users at any given time), and I think it's mostly just picking up the shop traffic. The content-only view shows maybe one or two users and often none at all. I tried a bunch of different filters to track only the main site's content views, but in one instance the filter would work, then half an hour later it would revert to no traffic. The filter is set to Custom > Exclude > Request URI with the following regex pattern: ^shop.winefolly.com$|^checkout.shopify.com$|/products/.|/account/.|/checkout/.|/collections/.|./orders/.|/cart|/account|/pages/.|/poll/.|/?mc_cid=.|/profile?.|/?u=.|/webstore/. When I test the filter, it strips out anything not related to the main site's content, but when I save the filter and view the updated results, the changes aren't reflected. I did read that there is a delay before filters are applied and that only a subset of the available data is used, but I just want to be sure I'm adding the filters correctly. I also tried setting the filter to Predefined > Exclude > hostname equal to shop.winefolly.com, but that didn't work either. The shop view seems to be working, but its tracking code is added via Shopify, so it makes sense that it would continue working as before.
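An exclude filter like the one described above can be sanity-checked offline against sample request URIs before saving it. This is a rough sketch in Python's `re`, assuming a hypothetical, simplified subset of the pattern (not the exact pattern from the post); GA's regex dialect is not identical to Python's, so treat the result as a first check only. It also illustrates that the Request URI field holds only the path and query string, so a host-anchored alternative cannot match it.

```python
import re

# Hypothetical, simplified exclude pattern for illustration only.
EXCLUDE_PATTERN = re.compile(r"/products/|/account|/checkout/|/collections/|/cart|/pages/")

def passes_exclude_filter(request_uri, pattern=EXCLUDE_PATTERN):
    """An exclude filter drops a hit when the pattern matches anywhere in the field."""
    return pattern.search(request_uri) is None

sample_uris = ["/", "/wine-101/", "/products/wine-folly-book", "/cart", "/account/login"]
kept = [uri for uri in sample_uris if passes_exclude_filter(uri)]
print(kept)  # ['/', '/wine-101/']

# Request URI is path + query string only, so this host-anchored pattern never matches it.
print(re.search(r"^shop\.winefolly\.com$", "/products/wine-folly-book"))  # None
```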
The first thing I noticed when I checked the views is that they were still set to HTTP, so I updated the URLs to HTTPS. I then checked the GA tracking code (which is added as a JSON object in the Analytics settings of the WordPress plugin). Unfortunately, while GA seems to be recording traffic, none of the GA validators seem to pick up the AMP tracking code (added using the amp-analytics tag), despite the plugin confirming the JSON is valid. This morning I decided to try a different approach and add the tracking code via Google Tag Manager, as well as adding the new HTTPS domain to Google Search Console, but alas, no change. I spent the whole day yesterday reading every post I could find on the topic but was not able to find a solution, so I'm really hoping someone on Moz will be able to shed some light on what I'm doing wrong. Any suggestions or input would be very much appreciated. Cheers,
Chris (on behalf of WineFolly.com)
Technical SEO | winefolly
-
Fetch as Google issues
Hi all, Recently (well, a couple of months back), I finally got around to switching our sites over to HTTPS. In terms of rankings etc., all looks fine and we have not moved about much, only the usual fluctuations of a place or two on a daily basis in a competitive niche. All links have been updated and redirects are in place: the usual HTTPS domain migration stuff. I am, however, troubled by one thing! I cannot for love nor money get Google to fetch my site in GSC. No matter what I have tried, it continues to display "Temporarily unreachable". I have checked the robots.txt, and it is on a new https:// profile in GSC. Has anyone got a clue? I am stumped! Have I simply become blinded by looking too much??? Site in Q: caravanguard co uk. Cheers, and looking forward to your comments.... Tim
Technical SEO | TimHolmes
-
Google Webmaster Image Index Issue
I submitted the image sitemap in GWT and only a few of the images got indexed in Google, and now even the indexed images are being de-indexed. Any solution for it? See the attached E4hPDQE.
Technical SEO | tigersohelll
-
I've had a sudden increase in crawl issues as of yesterday (about 300, up from a steady 10). Does anyone else have this issue?
The main issue is that it's now indexing both www and http:// URLs. Has anyone else got this issue, or had any sudden changes in their crawl results?
Technical SEO | beckyhy
-
Issue: Duplicate Page Content
Hi all, I am getting warnings about duplicate page content. The pages are normally 'tag' pages; I have some blog posts tagged with multiple tags. Does it really affect my site? I am using WordPress and the Yoast SEO plugin. Thanks
Technical SEO | KLLC
-
Sitemaps and "noindex" pages
We're experimenting a little to recover from Panda and have added a "noindex" tag to quite a few pages. Obviously we now need Google to re-crawl them ASAP and de-index them. Should we leave these pages in the sitemaps (with an updated "lastmod") for that, or just wait patiently? 🙂 What's the common/best way?
Technical SEO | LocalLocal
-
NEED HELP ASAP: SERVER ISSUE
Hey guys, Some of you may be aware of our story. We have a website about our son, who was born with Down syndrome. Two days ago a post I wrote went sort of viral, and I woke up this morning to an email from my host saying they had to take my site down as an emergency because of the amount of resources it was using. So now my site (noahsdad.com) is down. Any ideas how to proceed? I really need to get my site back online ASAP. Thank you.
Technical SEO | NoahsDad
-
Canonicalization - duplicate homepage issues
I'm trying to work out the best way to resolve an issue where Google is seeing duplicate versions of a homepage (http://www.home.co.uk/Home.aspx and http://www.home.co.uk/). The site runs on Windows servers. I've tried implementing homepage redirects before (for a different site on a Linux server) and ended up with a redirect loop, so although I know I can read lots of info (as I have been doing) and try again, I am really concerned about getting it wrong. Can anyone give me some advice on the best way to make Google index just one version of the page? Obviously link juice is also being diluted, so I need to get this sorted ASAP. Thanks.
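The key property of a loop-free homepage redirect is that the redirect target can never match the rule again. The sketch below expresses that decision in Python rather than in server configuration; on a Windows server the same condition would have to be written in whatever rewrite mechanism is in use (for example, the IIS URL Rewrite module). The set of duplicate paths and the function name are hypothetical. One common way loops arise is a rule that inspects the internally served path (for example, when Home.aspx is the default document for /) instead of the URL the browser actually requested.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical duplicate homepage paths, compared case-insensitively because
# Windows servers typically serve /home.aspx and /Home.aspx as the same page.
DUPLICATE_HOME_PATHS = {"/home.aspx"}

def homepage_redirect(url):
    """Return the 301 target for a duplicate homepage URL, or None to serve as-is."""
    parts = urlsplit(url)
    if parts.path.lower() in DUPLICATE_HOME_PATHS:
        # Keep scheme, host and query string; fragments never reach the server,
        # so none is carried over. Only the path is canonicalized to "/".
        return urlunsplit((parts.scheme, parts.netloc, "/", parts.query, ""))
    return None

first_hop = homepage_redirect("http://www.home.co.uk/Home.aspx")
print(first_hop)                     # http://www.home.co.uk/
print(homepage_redirect(first_hop))  # None: the target is never redirected again
```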
Technical SEO | travelinnovations