How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need a list of URLs from the existing site so we can start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools, and crawling takes the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of URLs without making a bunch of HTTP requests that will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
Why not find the links to the site? You will only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection IMO. It also exports to a CSV.
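If you go the export route, a small script can boil the CSV down to the unique URLs on your site that actually have inbound links. This is only a sketch: the function name `linked_urls` is made up for illustration, and the column header `Target URL` is an assumption, so check the header row of your real export and pass the right name.

```python
import csv
import io


def linked_urls(csv_text, url_column="Target URL"):
    """Return the unique, non-empty values of one CSV column, in first-seen order.

    `url_column` is an assumed header name; adjust it to match the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    unique = {}  # dicts keep insertion order, so this de-duplicates without reordering
    for row in reader:
        url = (row.get(url_column) or "").strip()
        if url:
            unique[url] = True
    return list(unique)
```

The resulting list is the set of old URLs worth mapping to 301 targets. This makes no requests to the live site at all.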
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low?
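A rough sketch of what such a scraper could look like, in Python rather than PHP and using only the standard library. It fetches one page at a time, stays on the starting host, and sleeps between requests. The names `crawl`, `fetch_html` and `LinkParser`, the 5-second default delay and the page cap are all illustrative assumptions, not tested values for this site; raise the delay if the server still struggles.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: a single plain GET, decoded leniently."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=10000,
          fetch=fetch_html, sleep=time.sleep):
    """Breadth-first crawl of one host, one request at a time."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            html = ""  # page errored; its URL stays in the list anyway
        fetched += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        if queue:
            sleep(delay_seconds)  # the throttle: pause before the next request
    return sorted(seen)
```

Calling `crawl("http://www.example.com/")` returns a sorted list of every same-host URL it discovered. It will be slow by design, so expect a large site to take hours at this rate.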
-
Maybe the site has an automated sitemap somewhere?
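If there is one (the usual places to look are `/sitemap.xml` and any `Sitemap:` line in `robots.txt`), pulling the URLs out of it takes a single request. A minimal sketch, assuming the file follows the standard sitemap XML format; `sitemap_urls` is just an illustrative name.

```python
import xml.etree.ElementTree as ET


def sitemap_urls(xml_text):
    """Return every <loc> value from a sitemap or sitemap index, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Tags are namespace-qualified, e.g. "{http://www.sitemaps.org/schemas/sitemap/0.9}loc",
        # so compare only the part after the closing brace.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

If the file turns out to be a sitemap index, the values returned are the child sitemap URLs, and each of those needs the same treatment.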
-
Google Webmaster Tools -> download the "Internal links" table