Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound link onto the page. Pop in your URLS of the pages you want to scrape and it will spit out our a list of those domaind and urls. You can take those urls and put them into the contact finder and it will return the contact details for those sites. Combine the two spreadsheets for an epiuc list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing - sorry if I have misunderstood what you were trying to do
-
Hi Keri,
That is a very cool tool, but is overkill for this. It takes far too many steps to accomplish only part of the desired goal of grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OMPL file or URL list.
thanks!
-
nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of it.
-
Hi there,
Well, Keris response reminded me of this question and the fact that I found a tool for scraping these kind of lists:
Here it is (with some other cool tools) , have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and <sigh>I don't have the answer. I am going to back to find out what people come up with here. Surely there is someone that lurks these parts that can throw something together?</sigh>
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is there a way for me to find out how a keyword would rank if it were on a specific site?
Is there a way for me to find out how a keyword would rank if it were on a specific site? For example, lets say that XYZ.com does not have the keyword "ABC". Is there a way for me to find out how the keyword "ABC" would rank if it were on XYZ.com?
Moz Pro | | TurboH0 -
Tool recommendation for Page Depth?
I'd like to crawl our ecommerce site to see how deep (clicks from home page) pages are. I want to verify that every category, sub-category, and product detail page is within three clicks of the home page for googlebot. Suggestions? Thanks!
Moz Pro | | Garmentory0 -
Duplicate URLs
A campaign that I ran said that my client's site had some 47,000+ duplicate pages and titles. I was wondering how I can possibly set that many 301 redirects, but a Moz help engineer said it has a lot to do with session IDs. See this set of duplicate URLs: http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring (clearly the main URL for the page)
Moz Pro | | AlanJacob
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac00a2e0ad53eb90cb0b0304d178fc1
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac3039d0ad4af2720b3ccd2238547ab
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac071ed0ad4af292684b0746931158f To a crawler, that looks like 4 different pages, when it's clear that they're actually all different URLs for the same page. I was wondering if some of you, maybe with experience in site architecture, would have insight into how to address this issue? Thanks Alan0 -
Reliable Social Metrics Tool
Hi, Does anyone know of a reliable social metrics tool? So far I've tried Open Site Explorer, Tom Anthony's tools and SEO Quake With each one of them I get very, very different numbers. Cheers, Carlos
Moz Pro | | Carlos-R1 -
Is there a tool that tracks and records your links to your site
What I mean by this is we have a linkbuilder working for us and I'm looking for to record there progress with link building I've seen somthing in Majestic but is there one in SEOMOZ All teh best Steve
Moz Pro | | ibexinternet0 -
Why does site explorer keep crashing?
For around the last 24 hours, every time I run site explorer it gets to just over a quarter of the way through and then just stops. I've tried this with different sites and it makes no difference. Although this is the first time I've had this problem (I don't use the tool regularly myself) one of my colleagues says it often happens to her. Is this a problem SEOmoz can explain and/or fix?
Moz Pro | | Chuck-Boom0 -
Keyword ranking tool
Hello, What is best practice for knowing if a keyword is too difficult to try to rank for. i understand the keyword difficulty tool in seoMOZ, but am unsure of how it actually relates to if I should attempt to rank for the keyword. Is there a differential people use such as if a site has a Page Authority of 60 we will not try to rank for a keyword that has a keyword difficulty ranking of 40?
Moz Pro | | digitalops0 -
Where is the best place to add links on my site?
If I'd like to put links to other sites in my site, is it better to have a page named "Our Helpful Links" etc. instead of just adding them to the bottom of an existing page like I've seen on some sites? I'm asking because I'm wanting to make Google as happy as possible and still add them. Just in case it helps to look at the site yourself to give advise its; http://www.allstatetransmission.net If you see anything else there that I should work on feel free to be hard on it, I value any criticism. Thanks, Jeff
Moz Pro | | allstatetransmission0