Using Site Maps Correctly
-
Hello
I'm looking to submit a sitemap for a post driven site with over 5000 pages.
The site doesn't have a sitemap yet, but it is already indexed by Google. Will submitting a sitemap make a difference at this stage?
Also, most free sitemap tools only go up to 5000 pages, and I'd like to try the free version of a tool before I buy one. If my site has 5500 pages but I only submit a sitemap for 5000 (I have no control over which pages get included), would this have a negative effect on the pages that didn't get included?
Thanks
-
Submitting a sitemap in Webmaster Console is a good idea at any stage. If your website's URLs are already crawled and indexed by search engines, then there will be no negative impact from it, and in the longer run, as you add more pages, a sitemap will definitely help.
If you are using a CMS like WordPress, Joomla, Zencart or any other, they all have extensions and plugins in their directories that will generate a sitemap of your current site and add new links as soon as you add more pages.
Peter explains almost everything else in detail, such as what happens if you have URL issues or problems with crawling and indexing.
If you have a custom CMS, I think you should seriously consider Peter's idea, as this is something you will need on a regular basis anyway!
Hope this helps!
-
It's hard to tell without seeing your URL architecture.
First, there are two specific terms that you should never forget: crawling and indexing. Once you prepare a sitemap and submit it in Search Console (or reference it in robots.txt), all bots get a map of your site and start crawling pages based on their crawl budget for your site. During crawling they MAY find new pages that aren't included in this map and will crawl them too. Again, this depends on your crawl budget.
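For the robots.txt route, here is a minimal sketch in Python of what the `Sitemap:` line looks like and how a crawler-side parser reads it. The robots.txt content and the example.com domain are placeholders, not taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com is a placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() (Python 3.8+) returns the declared sitemap URLs,
# or None when the file declares no sitemap at all.
sitemaps = parser.site_maps()
print(sitemaps)
```

The `Sitemap:` line takes a full absolute URL and is independent of the `User-agent` groups, so it can sit anywhere in the file.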
So when you submit the sitemap, the bot gets a list of the 5000 "non-crawled" pages within seconds and starts crawling them. It can then find the 500 missed pages and crawl them too. The tricky part is that when you update the sitemap, the bot can quickly detect changes to the listed pages and recrawl them. For the 500 missed pages, it has to revisit them to check for changes, and this also counts against your crawl budget. But if those pages don't change often, it isn't a big deal.
So you shouldn't worry about a negative impact. The only negative impact can happen if you have serious URL architecture issues and messy URLs. In that case, submitting a partial sitemap can hide these issues and leave some of your URLs non-crawled.
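If you want to know exactly which pages a partial sitemap left out, a rough sketch like this one compares the sitemap against a list of URLs you already know about (for example, exported from your CMS). The function names, the sample sitemap, and the URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_in_sitemap(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def missing_from_sitemap(all_site_urls, xml_text):
    """Return the known site URLs that the sitemap does not list."""
    return sorted(set(all_site_urls) - urls_in_sitemap(xml_text))


# Hypothetical data: a two-entry sitemap and three known pages.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/post-1</loc></url>
  <url><loc>https://www.example.com/post-2</loc></url>
</urlset>"""

cms_urls = [
    "https://www.example.com/post-1",
    "https://www.example.com/post-2",
    "https://www.example.com/post-3",
]

missing = missing_from_sitemap(cms_urls, SAMPLE_SITEMAP)
print(missing)
```

With a real site you would feed in the 5000-URL file from the free tool and your full list of 5500 pages, then check whether the missing 500 are reachable through normal internal links.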
Technically, in Search Console you can see sitemap statistics such as submitted and indexed pages. In a perfect world the two numbers should be almost equal, with little difference. But if you see a huge difference between them, you're in trouble. For example, on one site I have a sitemap with 44,950 submitted pages, of which 29,643 were indexed. This is a clear example of site crawling trouble or sitemap trouble, because about 1/3 of all pages aren't indexed at all.
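The arithmetic behind that "about 1/3" figure, as a small Python sketch using the two numbers from the example above (the function name is just illustrative):

```python
def index_coverage(submitted, indexed):
    """Return (pages not indexed, share of submitted pages not indexed)."""
    not_indexed = submitted - indexed
    return not_indexed, not_indexed / submitted


missing, share = index_coverage(44_950, 29_643)
print(f"{missing} of 44950 submitted pages are not indexed ({share:.1%})")
# prints "15307 of 44950 submitted pages are not indexed (34.1%)"
```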
PS: I forgot. You should use your CMS's own plugin to generate the sitemap. Even if your CMS is custom made, you should write (or hire someone to write) a plugin for it. It's about 20-30 lines of write-here-your-favorite-language (PHP/Python/Perl/Ruby) and isn't a big deal. Such a plugin saves the crawling time a 3rd party sitemap generator tool needs, because the CMS already has all the information inside and it just needs to be exported to XML.
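To give an idea of the size of such a plugin, here is a rough Python sketch. It assumes your CMS can hand you a list of (URL, last modified date) pairs; the `build_sitemap` function, the sample pages, and the example.com domain are hypothetical, and how you pull the pages out of your own database will differ.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace required by the sitemaps.org protocol.
SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as bytes) from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses the ISO date format, e.g. 2024-01-15.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    # xml_declaration needs Python 3.8 or newer.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages; in a real plugin these come from the CMS database.
pages = [
    ("https://www.example.com/post-1", date(2024, 1, 15)),
    ("https://www.example.com/post-2", date(2024, 2, 3)),
]

sitemap_xml = build_sitemap(pages)
with open("sitemap.xml", "wb") as handle:
    handle.write(sitemap_xml)
```

The sitemaps.org protocol allows up to 50,000 URLs in a single file, so a 5500-page site fits in one sitemap without needing a sitemap index.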
-
It would definitely be better to submit a complete sitemap. If your site is built in WordPress, Joomla, Magento, or many other standard CMSs, it should be able to generate a full sitemap. Plugins like Yoast or Google Sitemaps help. It just depends on the site.
Otherwise you can probably get any pro SEO or agency to create a full 5500+ page sitemap for you for $100 or so. PM me if you need more help.