Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of urls from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of urls without having to make a bunch of http requests which will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
why not find the links to the site, becauase you will only need to 301 the urls with extenal links. let teh rest 404. i use Bing WMT as it has a most complete collection IMO. they also export to a csv
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low ?
-
Maybe the site has an automated sitemap somewhere ?
-
Google webmaster tools -> download "internal links" table
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Our clients Magento 2 site has lots of obsolete categories. Advice on SEO best practice for setting server level redirects so I can delete them?
Our client's Magento website has been running for at least a decade, so has a lot of old legacy categories for Brands they no longer carry. We're looking to trim down the amount of unnecessary URL Redirects in Magento, so my question is: Is there a way that is SEO efficient to setup permanent redirects at a server level (nginx) that Google will crawl to allow us at some point to delete the categories and Magento URL Redirects? If this is a good practice can you at some point then delete the server redirects as google has marked them as permanent?
Technical SEO | Sep 23, 2023, 10:06 PM | Breemcc0 -
Can anyone tell me why some of the top referrers to my site are porn site?
We noticed today that 4 of the top referring sites are actually porn sites. Does anyone know what that is all about? Thanks!
Technical SEO | May 7, 2015, 6:57 AM | thinkcreativegroup1 -
Does my "spam" site affect my other sites on the same IP?
I have a link directory called Liberty Resource Directory. It's the main site on my dedicated IP, all my other sites are Addon domains on top of it. While exploring the new MOZ spam ranking I saw that LRD (Liberty Resource Directory) has a spam score of 9/17 and that Google penalizes 71% of sites with a similar score. Fair enough, thin content, bunch of follow links (there's over 2,000 links by now), no problem. That site isn't for Google, it's for me. Question, does that site (and linking to my own sites on it) negatively affect my other sites on the same IP? If so, by how much? Does a simple noindex fix that potential issues? Bonus: How does one go about going through hundreds of pages with thousands of links, built with raw, plain text HTML to change things to nofollow? =/
Technical SEO | Mar 31, 2015, 11:06 AM | eglove0 -
Url folder structure
I work for a travel site and we have pages for properties in destinations and am trying to decide how best to organize the URLs basically we have our main domain, resort pages and we'll also have articles about each resort so the URL structure will actually get longer:
Technical SEO | Mar 29, 2015, 1:05 PM | Vacatia_SEO
A. domain.com/main-keyword/state/city-region/resort-name
_ domain.com/family-condo-for-rent/orlando-florida/liki-tiki-village_ _ domain.com/main-keyword-in-state-city/resort-name-feature _
_ domain.com/family-condo-for-rent/orlando-florida/liki-tiki-village/kid-friend-pool_ B. Another way to structure would be to remove the location and keyword folders and combine. Note that some of the resort names are long and spaces are being replaced dynamically with dashes.
ex. domain.com/main-keyword-in-state-city/resort-name
_ domain.com/family-condo-for-rent-in-orlando-florida/liki-tiki-village_ _ domain.com/main-keyword-in-state-city/resort-name-feature_
_ domain.com/family-condo-for-rent-in-orlando-florida/liki-tiki-village-kid-friend-pool_ Question: is that too many folders or should i combine or break up? What would you do with this? Trying to avoid too many dashes.0 -
Mobile site ranking instead of/as well as desktop site in desktop SERPS
I have just noticed that the mobile version of my site is sometimes ranking in the desktop serps either instead of as well as the desktop site. It is not something that I have noticed in the past as it doesn't happen with the keywords that I track, which are highly competitive. It is happening for results that include our brand name, e.g '[brand name][search term]'. The mobile site is served with mobile optimised content from another URL. e.g wwww.domain.com/productpage redirects to m.domain.com/productpage for mobile. Sometimes I am only seen the mobile URL in the desktop SERPS, other times I am seeing both the desktop and mobile URL for the same product. My understanding is that the mobile URL should not be ranking at all in desktop SERPS, could we be being penalised for either bad redirects or duplicate content? Any ideas as to how I could further diagnose and solve the problem if you do believe that it could be harming rankings?
Technical SEO | Aug 27, 2013, 6:26 AM | pugh0 -
What is the best way to find missing alt tags on my site (site wide - not page by page)?
I am looking to find all the missing alt tags on my site at once. I have a FF extension that use to do it page by page, but my site is huge and that will take forever. Thanks!!
Technical SEO | Nov 12, 2015, 4:17 PM | franchisesolutions1 -
Trailing Slashes In Url use Canonical Url or 301 Redirect?
I was thinking of using 301 redirects for trailing slahes to no trailing slashes for my urls. EG: www.url.com/page1/ 301 redirect to www.url.com/page1 Already got a redirect for non-www to www already. Just wondering in my case would it be best to continue using htacces for the trailing slash redirect or just go with Canonical URLs?
Technical SEO | Nov 27, 2011, 12:19 PM | upick-1623910 -
What are the pros and cons of moving one site onto a subdomain of another site?
Two sites. One has weaker sales. What would the benefits and problems for SEO of moving the weak site from its own domain to a subdomain of the stronger site?
Technical SEO | Apr 14, 2011, 6:01 PM | GriffinHansen0