Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will:
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) do the same, but export the list as OPML so you can subscribe?
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
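For illustration, the "grab only the links inside the blogroll DIV" step can be sketched in Python with the standard library alone. This is a rough sketch under assumptions, not an existing tool: the names `BlogrollParser`, `extract_blogroll` and `BLOGROLL_HINTS` are made up for this example, it assumes the container's `id` or `class` contains the word "blogroll", and it assumes reasonably well-formed markup (unclosed tags such as a bare `<li>` will throw off the nesting count).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: substrings expected in the blogroll container's id or class.
BLOGROLL_HINTS = ("blogroll",)

# Void elements never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect (title, url) pairs for links inside a blogroll-like container."""

    def __init__(self, base_url, hints=BLOGROLL_HINTS):
        super().__init__()
        self.base_url = base_url
        self.hints = hints
        self.depth = 0      # nesting depth inside the matched container
        self.links = []     # collected (title, url) pairs
        self._href = None   # href of the <a> currently being read
        self._text = []     # text fragments of that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and attrs.get("href"):
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, attrs["href"])
                self._text = []
        elif tag not in VOID_TAGS:
            label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
            if any(hint in label for hint in self.hints):
                self.depth = 1  # entered the blogroll container

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "a" and self._href:
            title = "".join(self._text).strip() or self._href
            self.links.append((title, self._href))
            self._href = None
        if tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self._href:
            self._text.append(data)


def extract_blogroll(html, base_url):
    """Return de-duplicated external (title, url) pairs from the blogroll."""
    parser = BlogrollParser(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for title, url in parser.links:
        # Skip internal links, non-web links (e.g. mailto:) and duplicates.
        if urlparse(url).netloc in ("", host) or url in seen:
            continue
        seen.add(url)
        result.append((title, url))
    return result
```

The page source could be fetched with `urllib.request.urlopen(url).read()` and decoded before being passed in. For sites that label the section "sites I like" or similar, a different `hints` tuple would have to be passed to `BlogrollParser`, which is exactly the imperfection mentioned in the question.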
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of that list.
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping this kind of list.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and <sigh> I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?
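As a starting point for the OPML half of the question, here is a minimal Python sketch that turns a list of blogs into an OPML 2.0 document using only the standard library. The function name `links_to_opml` and the `(title, html_url, feed_url)` tuple layout are assumptions made for this example. Note that a blogroll link is only the blog's home page; a feed reader needs the feed address (`xmlUrl`) to subscribe, so entries without a known feed are written as plain `link` outlines, and finding the feed URL is a separate step not covered here.

```python
import xml.etree.ElementTree as ET


def links_to_opml(entries, title="Blogroll"):
    """Build an OPML 2.0 document from (title, html_url, feed_url) tuples.

    feed_url may be None when the feed address is not known.
    Returns the document as a string.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, html_url, feed_url in entries:
        if feed_url:
            # Subscribable entry: feed readers read xmlUrl.
            attrs = {"type": "rss", "text": name,
                     "xmlUrl": feed_url, "htmlUrl": html_url}
        else:
            # No feed known: keep the site as a plain link outline.
            attrs = {"type": "link", "text": name, "url": html_url}
        ET.SubElement(body, "outline", attrs)
    return ET.tostring(opml, encoding="unicode")
```

The returned string can be written to a `.opml` file (UTF-8 encoded) and imported into a feed reader. ElementTree takes care of escaping characters such as `&` in blog titles.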