How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of URLs from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.
However, the database and server setup is so poor that it can't handle the requests from these tools, and the site goes down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of URLs without making a flood of HTTP requests that will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
Why not find the links to the site? You will only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection, IMO. They also export to a CSV.
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low?
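For anyone who wants to try this, here is a minimal sketch of a slow scraper in Python, using only the standard library. It is an illustration under assumptions, not a tested production crawler: the names `crawl`, `fetch_html` and `LinkParser`, the `slow-url-lister` user agent, and the defaults (one request every 5 seconds, a 10,000 page cap) are all made up for the example. It fetches one page at a time, never in parallel, and sleeps between requests so the server gets a rest.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download one page; return an empty string for non-HTML responses."""
    request = urllib.request.Request(url, headers={"User-Agent": "slow-url-lister"})
    with urllib.request.urlopen(request, timeout=30) as response:
        if "html" not in response.headers.get("Content-Type", ""):
            return ""
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=10000, fetch=fetch_html):
    """Breadth-first crawl of a single host, one request at a time."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            html = ""  # skip pages that time out or return an error
        fetched += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            if parts.scheme in ("http", "https") and parts.netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        if queue:
            time.sleep(delay_seconds)  # the throttle: pause before the next request
    return sorted(seen)
```

Calling `crawl("http://www.example.com/")` (a placeholder address) returns a sorted list of every same-host URL it discovered, including ones that errored when fetched. Raising `delay_seconds` is the knob to turn if the server still struggles; running it overnight at a very low rate is probably safer than a fast crawl.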
-
Maybe the site has an automated sitemap somewhere?
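If a sitemap does exist, one request gets you the whole list. Below is a rough sketch, assuming the file follows the standard sitemaps.org XML format; the function name `urls_from_sitemap` is just for illustration. It pulls out every `<loc>` value, which covers both a plain sitemap and a sitemap index (where each `<loc>` points at a child sitemap you would then fetch in turn).

```python
import xml.etree.ElementTree as ET


def urls_from_sitemap(xml_text):
    """Return every <loc> value in a sitemap or sitemap index, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Tags look like "{namespace}loc", so compare only the local name.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

Save the sitemap from your browser and pass its contents to the function. Note that matching on the local name means `loc` elements from sitemap extensions (image sitemaps, for example) would be picked up as well.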
-
Google Webmaster Tools -> download the "Internal Links" table
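Once you have an export like this, turning it into a clean URL list takes no requests to the site at all. Here is a small sketch, assuming the download is a CSV with a header row. The column name is passed in as a parameter because I don't know what the export actually calls it; `"Target page"` in the usage line below is a made-up header, so check your own file.

```python
import csv
import io


def urls_from_csv(csv_text, column):
    """Return the unique, non-empty values of one column, in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {}
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            seen.setdefault(value, None)  # dict keys keep insertion order
    return list(seen)
```

Usage would be something like `urls_from_csv(open("export.csv", newline="").read(), "Target page")`, where `export.csv` is a placeholder filename. The same function should work for any other CSV export of links, as long as you give it the right column name.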
-