Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can't generate a sitemap with all my pages
-
I am trying to generate a site map for my site nationalcurrencyvalues.com but all the tools I have tried don't get all my 70000 html pages... I have found that the one at check-domains.com crawls all my pages but when it writes the xml file most of them are gone... seemingly randomly.
I have used this same site before and it worked without a problem. Can anyone help me understand why this is or point me to a utility that will map all of the pages?
Kindly,
Greg
-
Thank you all for the responses... I found them all helpful. I will look into creating my own sitemap with the IIS tool.
I can't help the 70k pages but the URLS are totally static. I guess I can make a site map for all the aspx pages and then other one for all the lowest level .html pages.
Thanks everyone!
-
I definitely agree with Logan. The max for an XML sitemap for Search Console is 50,000 URLs, so you won't be able to fit all of yours into one.
That being the case, divide them into different sitemaps by category or type, then list all of those in one directory sitemap and submit that. Now you can see indexation by page type on your website.
Finally, I have to ask why you are doing this with a third party tool and creating a static sitemap as opposed to creating a dynamic one that can update automatically when you publish new content? If your site is static and you're not creating new pages, then your approach might be ok, but otherwise I'd recommend investigating how you build a dynamic XML sitemap that updates with new content.
Cheers!
-
Looking at your site how sure are you that you need 70,000 pages?
For the sitemap I would stop trying to use a website and do it yourself. It looks like you are running IIS. They have a sitemap generator that you can install on a server easily and run it there. It looks like you have GoDaddy, they catch a lot of crap but I have always found their technical support to be top notch. If you can't figure out how to do it on the server I would give them a call.
-
Greg,
Have you tried creating multiple XML sitemaps by section of the site, like by folder or by product detail pages? 70,000 is a huge amount of URLs and even if you could get them all on one sitemap, I wouldn't recommend it. Nesting sitemaps into an index sitemap can help Google understand your site structure and make it easier for you to troubleshoot indexing problems should they arise.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
After hack and remediation, thousands of URL's still appearing as 'Valid' in google search console. How to remedy?
I'm working on a site that was hacked in March 2019 and in the process, nearly 900,000 spam links were generated and indexed. After remediation of the hack in April 2019, the spammy URLs began dropping out of the index until last week, when Search Console showed around 8,000 as "Indexed, not submitted in sitemap" but listed as "Valid" in the coverage report and many of them are still hack-related URLs that are listed as being indexed in March 2019, despite the fact that clicking on them leads to a 404. As of this Saturday, the number jumped up to 18,000, but I have no way of finding out using the search console reports why the jump happened or what are the new URLs that were added, the only sort mechanism is last crawled and they don't show up there. How long can I expect it to take for these remaining urls to also be removed from the index? Is there any way to expedite the process? I've submitted a 'new' sitemap several times, which (so far) has not helped. Is there any way to see inside the new GSC view why/how the number of valid URLs in the indexed doubled over one weekend?
Intermediate & Advanced SEO | | rickyporco0 -
How can I avoid duplicate content for a new landing page which is the same as an old one?
Hello mozers! I have a question about duplicate content for you... One on my clients pages have been dropping in search volume for a while now, and I've discovered it's because the search term isn't as popular as it used to be. So... we need to create a new landing page using a more popular search term. The page which is losing traffic is based on the search query "Can I put a solid roof on my conservatory" this only gets 0-10 searches per month according to the keyword explorer tool. However, if we changed this to "replacing conservatory roof with solid roof" this gets up to 500 searches per month. Muuuuch better! The issue is, I don't want to close down and re-direct the old page because it's got a featured snippet and sits in position 1. So I'd like to create another page instead... however, as the two are effectively the same content, I would then land myself in a duplicate content issue. If I were to put a rel="canonical" tag in the original "can I put a solid roof...." page but say the master page is now the new one, would that get around the issue?
Intermediate & Advanced SEO | | Virginia-Girtz0 -
My product category pages are not being indexed on google can someone help?
My website has been indexed on google and all of its pages can be found on google except for the product category pages - which are where we want our traffic heading to, so this is a big problem for us. Our website is www.skirtinguk.com And an example of a page that isn't being indexed is https://www.skirtinguk.com/product-category/mdf-skirting-board/
Intermediate & Advanced SEO | | chelseaskirtinguk0 -
What's the best way to noindex pages but still keep backlinks equity?
Hello everyone, Maybe it is a stupid question, but I ask to the experts... What's the best way to noindex pages but still keep backlinks equity from those noindexed pages? For example, let's say I have many pages that look similar to a "main" page which I solely want to appear on Google, so I want to noindex all pages with the exception of that "main" page... but, what if I also want to transfer any possible link equity present on the noindexed pages to the main page? The only solution I have thought is to add a canonical tag pointing to the main page on those noindexed pages... but will that work or cause wreak havoc in some way?
Intermediate & Advanced SEO | | fablau3 -
ScreamingFrog won't crawl my site.
Hey guys, My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages. Examples
Intermediate & Advanced SEO | | FrederikTrovatten22
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspx Is it because the products are being loaded in Javascript?
What's your recommendation? All best,
Fred.0 -
Should pages with rel="canonical" be put in a sitemap?
I am working on an ecommerce site and I am going to add different views to the category pages. The views will all have different urls so I would like to add the rel="canonical" tag to them. Should I still add these pages to the sitemap?
Intermediate & Advanced SEO | | EcommerceSite0 -
Can't crawl website with Screaming frog... what is wrong?
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage - have done the usual stuff (turn off JS and so on) and no problems there with nav and so on- the site's other pages have indexed in Google btw. Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there isn't!] If the Joomla site is installed within a folder such as at e.g. www.example.com/joomla/ the robots.txt file MUST be moved to the site root at e.g. www.example.com/robots.txt AND the joomla folder name MUST be prefixed to the disallowed path, e.g. the Disallow rule for the /administrator/ folder MUST be changed to read Disallow: /joomla/administrator/ For more information about the robots.txt standard, see: http://www.robotstxt.org/orig.html For syntax checking, see: http://tool.motoricerca.info/robots-checker.phtml User-agent: *
Intermediate & Advanced SEO | | McTaggart
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/0 -
My landing pages don't show up in the SERPs, only my frontpage does.
I am having some trouble with getting the landing pages for a clients website to show up in the SERPs.
Intermediate & Advanced SEO | | InmediaDK
As far as I can see, the pages are optimized well, and they also get indexed by Google. The website is a danish webshop that sells wine, www.vindanmark.com Take for an instance this landing page, http://www.vindanmark.com/vinhandel/
It is optimzied for the keywords "Vinhandel Århus". Vinhandel means "Winestore" and "Århus" is a danish city. As you can see, I manage to get them at page 1 (#10), but it's the frontpage that ranks for the keyword. And this goes for alle the other landing pages as well. But I can't figure out, why the frontpage keep outranking the landingpages on every keyword.
What am I doing wrong here?1