How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of urls from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of urls without having to make a bunch of http requests which will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
why not find the links to the site, becauase you will only need to 301 the urls with extenal links. let teh rest 404. i use Bing WMT as it has a most complete collection IMO. they also export to a csv
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low ?
-
Maybe the site has an automated sitemap somewhere ?
-
Google webmaster tools -> download "internal links" table
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My site Metrics are not as per they should
Hi, I am regularly making links on my site to improve its metrics but i am confused how other people fastly improve their DA/PA and my DA/PA is not improving with that site. The same happened with spam score. It has been a month i disavow my links having spam score but instead of decrease in it, my spam score increased. Please advice. Is there any special way to use that help moz crawler to check site and update accordingly? Please help
Technical SEO | | AzadSeo37310 -
Changed URL of all web pages to a new updated one - Keywords still pick the old URL
A month ago we updated our website and with that we created new URLs for each page. Under "On-Page", the keywords we put to check ranking on are still giving information on the old urls of our websites. Slowly, some new URLs are popping up. I'm wondering if there's a way I can manually make the keywords feedback information from the new urls.
Technical SEO | | Champions0 -
Cross links between sites
hi, We have several ecommerce sites and we cross linked 3 of them by mistake. We realize that the sites were linked through WMT, We have shut down 2 of the sites about 2 months ago, but WMT still shows the links coming from those 2 sites. how do we make sure that google will see the sites are shut down. Is there a better of way resolving this issue. We are no longer using those sites, so do not need them to be active. whats the best solution to show google that the links are no longer there. Crawler shows that it was able to crawl the site 45 days after it is shut down. thanks nick
Technical SEO | | orion680 -
Two sites
Hi there just joined had nightmere of a time trying to get a website up and running..... now i have 2 .... one marketing person did and one i did the one i did performing better on google but other onre looks more profetional is there a way i can conbine the 2 under one site..... the one that looks better and getting the benifit of the one thats performing better...... Thanks steve......
Technical SEO | | stevetemple0 -
I am Posting an article on my site and another site has asked to use the same article - Is this a duplicate content issue with google if i am the creator of the content and will it penalize our sites - or one more than the other??
I operate an ecommerce site for outdoor gear and was invited to guest post on a popular blog (not my site) for a trip i had been on. I wrote the aritcle for them and i also will post this same article on my website. Is this a dup content problem with google? and or the other site? Any Help. Also if i wanted to post this same article to 1 or 2 other blogs as long as they link back to me as the author of the article
Technical SEO | | isle_surf0 -
Google has not been visiting my site
Hi I am working on a site at the moment http://www.cheapflightsgatwick.com and i had the site using a different template and in the search engines for the search term cheap flights gatwick we were fourth and for the term holiday magazine we were 12th in google but now we are not even in google on the first page for the search terms. But now after changing the template in joomla our rankings have gone out of the window. It took me about a day to sort out the site with the new template so i was not expecting any problems with the search engines but for some reason there is. If you put into the search engine www.cheapflightsgatwick.com then you will see that google has not visited the site for four days and also it is not showing the description and instead it is showing details about joomla. Can anyone let me know if there is anything i need to do to sort this out and why google is taking so long to visit my site
Technical SEO | | ClaireH-1848860 -
Dynamic Parameters in URL
I have received lots of warnings because of long urls. Most of them are because my website has many Attributes to FILTER out products. And each time the user clicks on one, its added to the URL. pls see my site here: www.theprinterdepo.com The warning is here: Although search engines can crawl dynamic URLs, search engine representatives have warned against using over 2 parameters in any given URL. The question to the community is: -What should I do? These attributes really help the user to find easier the products. I could remove some of the attributes, I am not sure if my ecommerce solution (MAGENTO), allows to change the behavior of this so that this does not use querystring parameters.
Technical SEO | | levalencia10 -
What are the pros and cons of moving one site onto a subdomain of another site?
Two sites. One has weaker sales. What would the benefits and problems for SEO of moving the weak site from its own domain to a subdomain of the stronger site?
Technical SEO | | GriffinHansen0