How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of urls from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of urls without having to make a bunch of http requests which will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
why not find the links to the site, becauase you will only need to 301 the urls with extenal links. let teh rest 404. i use Bing WMT as it has a most complete collection IMO. they also export to a csv
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low ?
-
Maybe the site has an automated sitemap somewhere ?
-
Google webmaster tools -> download "internal links" table
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can you help by advising how to stop a URL from referring to another URL on my website please?
Stopping a redirect from one URL to another due to a 404 error? Referred URL which is (https://webwritinglab.com/know-exactly-what-your-ideal-clients-want-in-8-easy-steps/%5Bnull%20id=43484%5D) Referring URL (https://webwritinglab.com/know-exactly-what-your-ideal-clients-want-in-8-easy-steps/)
Technical SEO | | Nichole.wynter20200 -
SEO URLs: 1\. URLs in my language (Greek, Greeklish or English)? 2\. Αt the end it is good to put -> .html? What is the best way to get great ranking?
Hello all, I must put URLs in my language Greek, Greeklish or in English? And at the end of url it is good to put -> .html? For exampe www.test.com/test/test-test.html ? What is the best way to get great ranking? I am a new digital marketing manager and its my first time who works with a programmer who doesn't know. I need to know as soon as possible, because they want to be "on air" tomorrow! Thank you very much for your help! Regards, Marios
Technical SEO | | marioskal0 -
Some of my website urls are not getting indexed while checking (site: domain) in google
Some of my website urls are not getting indexed while checking (site: domain) in google
Technical SEO | | nlogix0 -
On-site adjustment opinions
Hi folks, I've got a fairly interesting scenario. I'm trying to rank this page (http://www.staysa.co.za/sa/1-2-0-0-1/East-London/accommodation) better for the term, "accommodation east london". The client isn't keen on making many changes and it was built horribly with ASP, half CMS, half not. I have made the following changes today: I introduced two paragraphs of text below the H1 tag. I changed "East London Bed and Breakfast", "East London Conference Venues", "East London Cottage / Chalet" to just "Bed and Breakfast", "Conference Venues", "Cottage / Chalet" as the continual key phrase duplication in my experience is a bad move. I've made a change to the title tag (this is a huge mission as it's not CMS controlled, so I had to teach myself some basic ASP to do so). Meta data.. nightmare to change unfortunately, at least not without rewriting part of the CMS. I'm wondering, are there any other on-site factors that I'm missing? I'm not a fan of site-wide links, so I don't want to put an exact match anchor text link from the sidebar/footer to the page, not unless someone can motivate why I should. Keen to hear everyone's opinions 🙂
Technical SEO | | ChristopherM0 -
How can you get the right site links for your site?
Hello all, I have been trying to get Google to list relevant site links for my site when you type in our brand name, Loco2 or for when Loco2 comes up in a search result. Different things come up when you search Loco2 and Loco 2. We would like site links to look like how they do when you search Loco 2. However Loco2 is our brand name, NOT Loco 2. Does anyone know why Google is doing this and whether we can influence results? We have done as much as possible via Google webmaster, in terms of specifying the links we DO NOT want Google to list for Loco2. However, when you search "Loco2", results only show simple site links. Ideally what we want is: Loco2 to be recognised as the brand NOT Loco 2 The same results (substantial, identical) for Loco2 as for Loco 2 (think o2 and o 2) For the site links to reflect the main pages of our site (Times & Tickets, Engine Room forum etc.) Many thanks in advance! Anila
Technical SEO | | anilababla0 -
Keyword and URL
I have a client who has a popular name (like 'Joe Smith'). His blog URL has only his first name and the name of his company in it, like joe.company.com. His blog doesn't rank well at all in the first 3-4 Google SERPs. I was thinking of advising him to change the URL of his blog to joesmith.company.com, and having his webmaster do 301 redirects from the old URL to the new one. Do you think this is a good strategy, or would you recommend something else? I realize ranking isn't just about the URL, it's about links, etc. But I think making his URL more specific to his name could help. Any advice greatly appreciated! Jim
Technical SEO | | JamesAMartin0 -
How to do a no follow on site search
We have a site search that is causing a huge amount of errors as the SEOmoz crawler is showing these as duplicate content. Our first thought was to do a no-follow on the site-search directory, but we realized that the site search is /site-search.aspx and URl strings appear at the end for hundreds of pages. How dow we/how can we no-follow an undetermined amount of URL strings?
Technical SEO | | Apptixweb0 -
Partial Site Move -- Tell Google Entire Site Moved?
OK this one's a little confusing, please try to follow along. We recently went through a rebranding where we brought a new domain online for one of our brands (we'll call this domain 'B' -- it's also not the site linked to in my profile, not to confuse things). This brand accounted for 90% of the pages and 90% of the e-comm on the existing domain (we'll call the existing domain 'A') . 'A' was also redesigned and it's URL structure has changed. We have 301s in place on A that redirect to B for those 90% of pages and we also have internal 301s on A for the remaining 10% of pages whose URL has changed as a result of the A redesign What I'm wondering is if I should tell Google through webmaster tools that 'A' is now 'B' through the 'Change of Address' form. If I do this, will the existing products that remain on A suffer? I suppose I could just 301 the 10% of URLs on B back to A but I'm wondering if Google would see that as a loop since I just got done telling it that A is now B. I realize there probably isn't a perfect answer here but I'm looking for the "least worst" solution. I also realize that it's not optimal that we moved 90% of the pages from A to B, but it's the situation we're in.
Technical SEO | | badgerdigital0