The Moz Q&A Forum

Moz Q&A is closed.

After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

Good alternatives to Xenu's Link Sleuth and AuditMyPc.com Sitemap Generator

Technical SEO
  • blrs12
    blrs12 last edited by Apr 2, 2015, 5:10 AM

    I am working on scraping title tags from websites with 1-5 million pages. Xenu's Link Sleuth seems to be the best option for this at this point. Sitemap Generator from AuditMyPc.com works too, but it starts hanging once the sitemap file it is building becomes too large, so it does not look suitable for websites of this size. I know that Scrapebox can scrape title tags from a list of URLs, but that is not needed here, since both of the tools mentioned above already do this.

    I know about DeepCrawl.com as well, but it is paid, and it would be very expensive for this number of pages and websites (5 million URLs is $1750 per month; I could get a better deal on multiple websites, but this obviously does not make sense for me, as it needs to be more or less free). SEO Spider from Screaming Frog is not good for large websites.

    So, in general, what is the best and most time-efficient way to work on something like this? Are there any other options?

    Thanks.

    • TheeDigital
      TheeDigital last edited by Apr 3, 2015, 3:14 PM Apr 3, 2015, 3:14 PM

      import.io and it's free

      • blrs12
        blrs12 last edited by Apr 2, 2015, 9:47 AM Apr 2, 2015, 9:44 AM

        Another idea I have is to look for the sitemaps of these websites. There may be a way to get a list of all the URLs right away, without crawling: look at /robots.txt and /sitemap.xml, search for the sitemap in Google, things like that. If there are URLs, the title tags can be scraped with Scrapebox, and according to their website it can be done relatively fast.

        Edit:

        Somebody suggested http://inspyder.com, around $40 with a free trial. It may be a good option too.
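
The sitemap idea in this post can be sketched roughly as follows. This is a minimal illustration, not something any of the tools named in the thread provide: the function names `sitemaps_from_robots` and `parse_sitemap` are made up for the example, and it assumes the site publishes sitemaps in the standard sitemaps.org XML format. The functions take already-downloaded text, so fetching is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap and sitemap index files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "https:" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index" or "urlset", list of <loc> values) for one sitemap file."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs
```

When `parse_sitemap` reports `"index"`, the returned entries are further sitemap files to download and parse in turn; when it reports `"urlset"`, they are page URLs. A sitemap only lists what the site owner chose to include, so the result may still not cover the whole site.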

        • blrs12
          blrs12 last edited by Apr 2, 2015, 9:40 AM Apr 2, 2015, 9:39 AM

          So there is probably no way to tell whether I have all the URLs of the site, or what percentage I have. I may have 80 percent of the site, or even less, and not know it, I would assume. This is one part of working on sites that I have never needed before (but I am working on something like this now), and there is no good tool that would do the work.

          I have a website with 33,500,000 pages. I've been running the tool for close to 5 hours, and I have around 125,000 URLs so far. At that rate, it would take 1340 hours to do the entire site. That is close to two months of running the program 24 hours a day, which does not make sense. Besides that, I was planning to do this on up to 100 sites. It is definitely not something that can be done this way, and I would say that it should be possible, software-wise.
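
For reference, the estimate above can be reproduced with a few lines. This is only a sketch of the arithmetic, with a hypothetical helper name, and it assumes the crawl rate observed so far stays constant for the rest of the site.

```python
def estimated_crawl_hours(total_pages: int, pages_done: int, hours_elapsed: float) -> float:
    """Extrapolate total crawl time, assuming the observed rate stays constant."""
    rate_per_hour = pages_done / hours_elapsed
    return total_pages / rate_per_hour


# Figures from the post: 33,500,000 pages, 125,000 URLs after about 5 hours.
hours = estimated_crawl_hours(33_500_000, 125_000, 5)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
# prints: 1340 hours, about 56 days
```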

          I will try your method and see what I get. I don't have much time for experimenting with it either. I need to work and generate results...

          Edit:

          I will know how the number of URLs compares to the 33,500,000 figure, obviously, but what's indexed in Google is not necessarily the complete website either. The method you are suggesting is not perfect, but I don't have two months to wait, obviously...

          • MattAntonino
            MattAntonino last edited by Apr 2, 2015, 9:27 AM Apr 2, 2015, 9:27 AM

            You will crawl some of the same URLs - that's why you remove duplicates at the end.  There's no way to keep it from re-crawling some of the URLs, as far as I know.

            But yes, get it to recognize 600-800k URLs and then split the file.  (Export, put the links in as an html file and start over.) Let me break it down the best I can:

            1. Crawl your main (seed) URL until you've recognized 800k.

            2. Pause/stop and then export the results.

            3. Create an html file with the URLs from the export - separated 50k to 100k at a time.

            4. Recrawl those files in Xenu with the "file" option.

            5. Build them back up to 800k or so recognized URLs again and repeat.

            After a few (4-6) iterations of this, you'll have most URLs crawled on most sites no matter how large.  Doing it this way, I think you could expect to crawl about 2-3 million URLs a day.  If you really paid attention to it and created smaller files but ran them more frequently, you could get 4-5 million, I think.  I've crawled close to that in a day for a scrape once.
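
Step 3 of the list above (turning an export into HTML files of 50k to 100k links each) is tedious by hand, so here is one way it might be scripted. It is a sketch under assumptions: the export has already been read into a Python list of URL strings, and the function name `write_link_files` and the `urls_001.html` naming are invented for the example. Nothing here relies on any Xenu feature beyond loading a local HTML file, as described in step 4.

```python
import html
from pathlib import Path


def write_link_files(urls, out_dir, chunk_size=100_000):
    """Write URLs into numbered HTML files of plain links, chunk_size per file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(urls), chunk_size):
        chunk = urls[start:start + chunk_size]
        # Escape each URL so characters like & and " produce valid HTML.
        links = "\n".join(
            f'<a href="{html.escape(u, quote=True)}">{html.escape(u)}</a><br>'
            for u in chunk
        )
        path = out / f"urls_{start // chunk_size + 1:03d}.html"
        path.write_text(f"<html><body>\n{links}\n</body></html>\n", encoding="utf-8")
        paths.append(path)
    return paths
```

With 800k exported URLs and the default chunk size, this would produce eight files, matching the "8 x 100k" split mentioned elsewhere in the thread.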

            • blrs12
              blrs12 last edited by Apr 2, 2015, 9:22 AM Apr 2, 2015, 9:16 AM

              Thanks. It is good to hear that there is a way to do what I am trying to do, especially on 50 or more large sites.

              I've been running Xenu on a site with 33,500,000 pages for a little over 4 hours and 15 minutes, and this is what I have so far:

              http://prntscr.com/6ojt92

              Close to 500,000 URLs recognized and only 115,000 processed, it looks like. I am manually saving to a file every now and then, as there is no way to auto-save as far as I could tell (there could be, I am not sure; there are not many options there).

              I am not sure, based on your advice, how I could speed up this process. Should I wait from this point, then stop the program, divide the file into 8 separate files, and load them into the program separately? Will the program then recognize these separate files as one and continue crawling for new URLs? If possible, please give more detail on how this would need to be done, as I don't fully understand. I also don't see how this could do a website this large in one day, or even, let's say, five days...

              Edit:

              I now understand what you mean: get 8 separate files (it can be 6 or, let's say, 10) and run them all at the same time. But still, how will the program know not to crawl and download the same URLs across all the files? In general, I would like to ask for a better explanation of how this needs to be done.

              Thanks.

              • MattAntonino
                MattAntonino last edited by Apr 2, 2015, 8:58 AM Apr 2, 2015, 8:58 AM

                Let Xenu crawl until you have about 800k links.  Then export the file and add it back as 8 x 100k lists of URLs.  You can then run it again and repeat the process.  By the time you have split it 4-5 times, you can then export everything, put it into one file and remove duplicates.

                Xenu, done this way, with 100 threads, is probably the fastest way to do the whole thing. I think you could get the 5M results in under 1 day of work this way.
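
The final "put it into one file and remove duplicates" step could look something like this. It is a rough sketch, assuming each export has been loaded as a list of URL strings; `merge_unique` is a hypothetical name, and treating URLs that differ only by a `#fragment` as the same page is a choice made for this example, not something stated in the thread.

```python
from urllib.parse import urldefrag


def merge_unique(url_lists):
    """Merge several URL lists, dropping duplicates but keeping first-seen order."""
    seen = set()
    merged = []
    for urls in url_lists:
        for raw in urls:
            # Assumption: strip "#fragment" so in-page anchors don't count as new pages.
            url = urldefrag(raw.strip()).url
            if url and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Using a set for the membership check keeps this fast even with several million URLs, since each lookup takes roughly constant time.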

                • blrs12
                  blrs12 last edited by Apr 2, 2015, 7:06 AM Apr 2, 2015, 7:06 AM

                  Ok. So it looks like Screaming Frog may be a good way to go too, if not better. Xenu is free, which is a big plus. On top of that, Screaming Frog's SEO Spider is based on a yearly subscription, not a one-time fee. For those who don't know, there is a version of Xenu for large sites, which can be found on their website. They also have a support group at groups.yahoo.com (find it through there); I am not sure if it is still active.

                  Xenu upgraded to the version for larger sites may be the best way to go, since it is free. I've been testing AuditMyPc.com Sitemap Creator and the large-site version of Xenu, and the first one has already hung (I stopped using it). They were both collecting the info at about the same speed, but Xenu is working better (it does not hang and looks like it should be good). Either way, this will take quite a lot of time, as previously mentioned.

                  • Matt-Williamson
                    Matt-Williamson last edited by Apr 2, 2015, 6:36 AM Apr 2, 2015, 6:23 AM

                    I agree with Moosa and Danny, in that I use Screaming Frog (full paid version) on a stripped-down Windows machine with an SSD and 16GB of performance RAM. I have also downloaded the 64-bit version of Java and increased the memory allocation for Screaming Frog to 12GB (the default limit is 512MB). Here's how: http://www.screamingfrog.co.uk/seo-spider/user-guide/general/ (look at the section Increasing Memory on Windows 32 & 64-bit)

                    I did this as I was having issues crawling a large site. Since I put this system in place, it has eaten any site I have thrown at it, so it works well for me personally. In terms of crawl speed, large sites such as the ones you mention will still take a while. You can set the crawl speed in Screaming Frog, but you need to be careful, as you can overload the server of the site you are crawling and cause issues...

                    Another option would be to buy a server and configure it for Screaming Frog and other tools you may use; this gives you room to grow the system as your needs grow. It all depends on budget and how often you crawl large sites. Obviously, buying a server such as a Windows instance on Amazon EC2 will cost more in the long run, but it takes the strain away from your own systems and networks, and you should effectively never hit capacity on the server, as you can just upgrade. It will also allow you to remote desktop in from whatever system you use, yes, even a Mac 😉

                    Hope this helps 😄

                    • MoosaHemani
                      MoosaHemani Banned last edited by Apr 2, 2015, 5:45 AM Apr 2, 2015, 5:45 AM

                      I believe that when you are talking about 1 to 5 million URLs, it is going to take time no matter what tool you use. But if you ask me, Screaming Frog is a better tool, and if you have the paid version you can still crawl websites with a few million URLs.

                      Xenu is not a bad choice either, but it's kind of confusing and there is a possibility that it can break.

                      Hope this helps!

                      • SEO-Expert-Danny
                        SEO-Expert-Danny Subscriber last edited by Apr 2, 2015, 5:37 AM Apr 2, 2015, 5:37 AM

                        I was facing a similar issue with huge sites that have hundreds of thousands of pages. But ever since I upgraded my computer with more RAM and an SSD, it runs way better on huge sites as well. I tried several scrapers and I still believe Xenu is the best one and the most recommended by SEO experts. Also, you might want to check this post on the Moz Blog about Xenu:
                        http://moz.com/blog/xenu-link-sleuth-more-than-just-a-broken-links-finder

                        Good luck!
