How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of urls from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of urls without having to make a bunch of http requests which will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
why not find the links to the site, becauase you will only need to 301 the urls with extenal links. let teh rest 404. i use Bing WMT as it has a most complete collection IMO. they also export to a csv
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low ?
-
Maybe the site has an automated sitemap somewhere ?
-
Google webmaster tools -> download "internal links" table
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Our clients Magento 2 site has lots of obsolete categories. Advice on SEO best practice for setting server level redirects so I can delete them?
Our client's Magento website has been running for at least a decade, so has a lot of old legacy categories for Brands they no longer carry. We're looking to trim down the amount of unnecessary URL Redirects in Magento, so my question is: Is there a way that is SEO efficient to setup permanent redirects at a server level (nginx) that Google will crawl to allow us at some point to delete the categories and Magento URL Redirects? If this is a good practice can you at some point then delete the server redirects as google has marked them as permanent?
Technical SEO | | WillyGx0 -
SEO URLs: 1\. URLs in my language (Greek, Greeklish or English)? 2\. Αt the end it is good to put -> .html? What is the best way to get great ranking?
Hello all, I must put URLs in my language Greek, Greeklish or in English? And at the end of url it is good to put -> .html? For exampe www.test.com/test/test-test.html ? What is the best way to get great ranking? I am a new digital marketing manager and its my first time who works with a programmer who doesn't know. I need to know as soon as possible, because they want to be "on air" tomorrow! Thank you very much for your help! Regards, Marios
Technical SEO | | marioskal0 -
How to fix these unwanted URLs?
Right now i have wordpress, one page website, but google also show wp-content. KIndly check below in google. site:http://baltimoreelite.com/ How I can fix this issue?
Technical SEO | | marknorman0 -
I noticed all my SEOed sites are getting attacked constantly by viruses. I do wordpress sites. Does anyone have a good recommendation to protect my clients sites? thanks
We have tried all different kinds of security plugins but none seem to work long term.
Technical SEO | | Carla_Dawson0 -
Site Blacklisted
Good morning. Just done my WMT ritual morning check and one of my sites has been blacklisted for malware. It's a wordpress site - I've run various scans, e.g. http://sitecheck.sucuri.net/scanner/ and also installed wordfence and scanned with that and wordfence produced some offending files which I have now deleted. I've also installed website defender in the hope that it wont happen again. I'm pretty good with staying on top of updates and rarely let a few days pass without upgrading new version of wordpress or plugins etc. I've also checked my users to make sure no new admins or anything and also changes passwords. I've asked for a review from Google and just wondered how long these reviews take? Also, has anybody got any advice, is there anything else I should be doing? Thanks
Technical SEO | | littlesthobo0 -
Negative url name?
I have a new client who has the letters "BB" at the start of his url name, bbzautorepair.com. He was told by someone at Google Adwords that the letters "BB" in his url name could hurt him with Google rankings. Reason being that Google red flags anything or website to do with firearms, guns and ammunition. He was told that the letters "BB" could be mistaken or red flagged for "BB Gun". Seems a bit far fetched. Has anyone every heard of such a thing? Thanks
Technical SEO | | fun52dig
Gary Downey0 -
How can i redirect a url that has % in it?
Google webmaster tools shows a 400 eroor for an old link that contains a 30% off in it. The problem is the % I would like to 301 redirect this link : http://www.geographics.com/Graduation-Stationery,-35%-OFF-Printable-Certificates-Blank-Gift-Certificates/c1353_1354_1359/index.html to http://www.geographics.com/Graduation-Stationery-Printable-Certificates-Blank-Gift-Certificates/c1353_1354_1359/index.html We do not know how to do this in httaccess. Can you please advise? Thanks a lot! Madlena
Technical SEO | | Madlena0 -
How to turn WP site into Ecom site?
I have a couple of old wordpress sites that are old affiliate blogs. I currently sell products that are on amazon on these sites and sell quite a bit of volume. I have found a source and can afford the inventory to replace Amazon with my own product. So the dilema is how to turn these wordpress sites into ecommerce sites. The thing I am worried most about is that each site gets about 100-200 visitors a day for great buying keywords. I obviously don't want to lose my rankings. What are the options of turning a wordpress site into a store. I am not interested in plugins or some of the other solutions that make the store look very cheap and I would assume horribly convert. If you have inner pages ranking for keywords how does that work? Do the post pages become product pages? So to sum up I guess I am asking, what are the options that are of the higher quality that will also help me keep my rankings? Thanks
Technical SEO | | PEnterprises0