How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of URLs from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools, and the site goes down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of URLs without having to make a bunch of HTTP requests that will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
Why not find the links to the site? You will only need to 301 the URLs with external links; let the rest 404. I use Bing WMT, as it has the most complete collection IMO. They also export to a CSV.
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low?
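A minimal sketch of what such a scraper could look like, in Python rather than the PHP asked about elsewhere in the thread. It fetches one page at a time, stays on the starting host, and pauses between requests. The names `crawl`, `fetch_html`, `delay_seconds` and `max_pages`, the user-agent string, and the 5 second default are all assumptions made for illustration; pick a delay the server can actually cope with.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url, timeout=30):
    """Fetch one page; return "" on network errors or non-HTML responses."""
    # The user-agent string is a made-up placeholder.
    request = Request(url, headers={"User-Agent": "slow-url-lister/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            if "html" not in response.headers.get("Content-Type", ""):
                return ""
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        return ""


def crawl(start_url, delay_seconds=5.0, max_pages=10000,
          fetch=fetch_html, sleep=time.sleep):
    """Breadth-first crawl of a single host, strictly one request at a time."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if fetched:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        html = fetch(url)
        fetched += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            if (parts.scheme in ("http", "https")
                    and parts.netloc == host
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Calling `crawl("http://www.example.com/", delay_seconds=10)` would return a sorted list of every same-host URL it discovered, making at most one request every ten seconds. The sketch does not read robots.txt or retry failed pages, so treat it as a starting point only.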
-
Maybe the site has an automated sitemap somewhere?
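If a sitemap does exist, the whole URL list can come from a single request. Here is a rough sketch in Python, assuming the file follows the standard sitemap XML layout with `<loc>` elements; the function names and the example address are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_sitemap(url, timeout=30):
    """Download the sitemap once and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def locs_from_sitemap(xml_text):
    """Return the text of every <loc> element, whatever its XML namespace."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Tags look like "{namespace}loc"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

For example, `locs_from_sitemap(fetch_sitemap("http://www.example.com/sitemap.xml"))` would return the listed URLs in file order. If the file turns out to be a sitemap index, the `<loc>` values point at further sitemap files, each of which would need one more request.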
-
Google Webmaster Tools -> download the "internal links" table