Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
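For anyone who wants to script this, here is a minimal sketch of the goal described above, using only the Python standard library. It assumes the blogroll sits inside a container whose `id` or `class` contains the word "blogroll" (many sites name it differently, so `marker` is a guess you would adjust) and that the markup is reasonably well formed. The names `BlogrollParser`, `extract_blogroll` and `to_opml` are made up for illustration. The OPML output records site URLs as `type="link"` outlines; finding each blog's feed URL, which feed readers usually need in order to subscribe, is not covered here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect [title, url] pairs for links inside a blogroll container."""

    def __init__(self, base_url, marker="blogroll"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker   # assumed substring of the container's id/class
        self.depth = 0         # > 0 while inside the blogroll container
        self.links = []        # list of [title, url]
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            href = attrs.get("href")
            if tag == "a" and href:
                # Resolve relative links against the page URL.
                self.links.append(["", urljoin(self.base_url, href)])
                self._in_link = True
        else:
            ident = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
            if self.marker in ident:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "a":
            self._in_link = False
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self._in_link:
            self.links[-1][0] += data


def extract_blogroll(html, base_url, marker="blogroll"):
    """Return (title, url) tuples for links found inside the blogroll container."""
    parser = BlogrollParser(base_url, marker)
    parser.feed(html)
    # Fall back to the URL when a link has no visible text.
    return [(title.strip() or url, url) for title, url in parser.links]


def to_opml(links, title="Blogroll"):
    """Serialize (title, url) tuples as an OPML 2.0 document string."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, url in links:
        # Site URL only; no feed URL is known at this point.
        ET.SubElement(body, "outline", type="link", text=text, url=url)
    return ET.tostring(opml, encoding="unicode")
```

You would fetch the page HTML yourself (for example with `urllib.request`), pass it to `extract_blogroll(html, page_url)`, and then either print the URLs one per line or write the result of `to_opml(links)` to a file.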
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of it.
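The "list all external links, then pick the blogroll out by hand" fallback can be sketched in a few lines of standard-library Python. This is not how any of the tools mentioned in this thread work; it is just an illustration, and `LinkCollector` and `external_links` are hypothetical names. It treats a link as external when its host differs from the page's host, so `www.example.com` and `example.com` count as different hosts unless you normalize them first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the raw href value of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url):
    """Return de-duplicated http(s) links that point off the page's host."""
    own_host = urlparse(page_url).netloc.lower()
    collector = LinkCollector()
    collector.feed(html)
    seen, result = set(), []
    for href in collector.hrefs:
        url = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(url)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.netloc.lower() != own_host and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

The result keeps the order in which links appear on the page, which makes it a little easier to spot the run of links that forms the blogroll.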
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is OutWit Hub for Firefox (http://www.outwit.com/products/hub/). It might be able to help with that. It can scrape data from a page and do a lot with it. I don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and (sigh) I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?