Is it OK for a sitemap to appear as a "Top URL" in Google Webmaster Tools?
-
I'm using Google Webmaster Tools (alongside other tools) to understand how Google is indexing my site.
One of its reports is "Content Keywords", which lists the keywords Google sees as significant for your site. The keywords shown are generally fine, but when I click on an individual word, I often see our sitemap listed as one of the "Top URLs" the keyword is found on (our sitemap is at system/sitemap1.xml.gz). Is this OK?
Obviously I don't want to block the sitemap URL in robots.txt, but I also want 'real' user-focused pages (e.g. our homepage) to appear higher in the "Top URLs" list for the keywords, as I'm assuming this list is an indicator of how the site is performing in search.
Any help appreciated!
-
Thanks for the answer. However, I'm still unclear on a few things, so I thought I'd give some further info:
- We actually have two XML sitemaps: one for our main site including our forums (generated and submitted by a Ruby on Rails plugin) and one for blog posts and static pages (generated by a WordPress plugin). The sitemap appearing as a "Top URL" is the first one.
- There are no links to our sitemap anywhere on our site. The only way Google knows about it is that we automatically generate it and submit it to Webmaster Tools.
I think the reason it is appearing as a Top URL is that all of the forum post titles are listed in the sitemap, and it is the only place where they all appear on one page. So I think you are right about the 'simple algorithm' thing, but I think it's down to the frequency of the keyword in the sitemap rather than the sitemap being linked to from the site (because it isn't linked from anywhere).
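To put a rough number on that keyword-frequency guess, something like the following could be run against the sitemap file. It is only a sketch: it assumes the file follows the standard sitemaps.org schema, the function name `keyword_count_in_sitemap` is made up for illustration, and it says nothing about how the "Top URLs" list is actually computed.

```python
import gzip
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def keyword_count_in_sitemap(sitemap_gz: bytes, keyword: str) -> int:
    """Count the <loc> URLs in a gzipped sitemap that contain the keyword.

    Hypothetical helper: the keyword must appear as a whole word,
    i.e. not glued to other letters or digits in the URL.
    """
    root = ET.fromstring(gzip.decompress(sitemap_gz))
    pattern = re.compile(
        r"(?<![a-z0-9])" + re.escape(keyword.lower()) + r"(?![a-z0-9])"
    )
    return sum(
        1
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text and pattern.search(loc.text.lower())
    )
```

Reading the bytes of system/sitemap1.xml.gz and passing them in with one of the reported keywords would show how many of the listed URLs mention it. A high count would fit the frequency theory, but it would not prove it.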
This brings me to a related question: is it bad to have two separate XML sitemaps, and should I be linking to them somehow from the site?
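For context on the two-sitemap setup: the sitemaps.org protocol allows several sitemap files to be grouped under a sitemap index file, and also lets sitemaps be announced with `Sitemap:` lines in robots.txt, so no on-page links are needed. A minimal sketch of both follows. The example.com domain and the WordPress sitemap path are placeholders rather than our real URLs, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


def robots_txt_sitemap_lines(sitemap_urls):
    """Return the robots.txt lines that announce each sitemap URL."""
    return "\n".join(f"Sitemap: {url}" for url in sitemap_urls)


# Placeholder URLs: the domain and the second path are assumptions.
sitemaps = [
    "http://www.example.com/system/sitemap1.xml.gz",  # Rails-generated sitemap
    "http://www.example.com/sitemap.xml",             # WordPress-generated sitemap
]

print(build_sitemap_index(sitemaps))
print(robots_txt_sitemap_lines(sitemaps))
```

Either output is only a way of telling crawlers where the two files live. Submitting both sitemaps directly in Webmaster Tools, as we do now for the first one, should serve the same purpose.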
-
I wouldn't be overly concerned.
For some terms, especially product codes and terms that only occur on the detail pages of your site, there are probably only going to be three pages where the term appears: the product page itself, the page within the navigation that links to that page (normally a list), and the sitemap.
Your sitemap is probably heavily linked to across the site so it does kind of make sense that it would appear as one of the top URLs for a term.
The reason I wouldn't be overly concerned is that I would IMAGINE (and I could be totally wrong) that the "Top URLs" list is generated by a very simple algorithm that doesn't reflect how the organic search algorithm sees your site.