Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will:
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) do the same, but export as OPML so you can subscribe?
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal of grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
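For what it's worth, the goal described here (links inside the blogroll DIV, exported as a URL list or OPML) could be sketched in Python with only the standard library. This is a rough sketch, not an existing tool: `BlogrollParser`, `to_opml`, and the `BLOGROLL_HINTS` marker words are hypothetical names, the sample HTML is made up, and the parser assumes reasonably well-formed markup where the container's `id` or `class` mentions the blogroll.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical marker words; real sites label their blogroll in many ways.
BLOGROLL_HINTS = ("blogroll", "sites-i-like")


class BlogrollParser(HTMLParser):
    """Collect (title, url) pairs for links inside a blogroll-like container."""

    # Tags that never have a closing tag, so they must not affect nesting depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0       # nesting depth inside the blogroll container
        self.links = []
        self._href = None    # href of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
            if tag == "a" and attrs.get("href"):
                self._href = urljoin(self.base_url, attrs["href"])
                self._text = []
        else:
            label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
            if any(hint in label for hint in BLOGROLL_HINTS):
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        if tag == "a" and self._href:
            title = "".join(self._text).strip() or self._href
            self.links.append((title, self._href))
            self._href = None
        self.depth -= 1

    def handle_data(self, data):
        if self._href:
            self._text.append(data)


def to_opml(links, title="Blogroll"):
    """Serialize (title, url) pairs as OPML 2.0 outlines of type "link"."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, url in links:
        ET.SubElement(body, "outline", text=text, type="link", url=url)
    return ET.tostring(opml, encoding="unicode")


# Made-up sample page: one navigation link plus a two-entry blogroll.
SAMPLE = """
<div id="sidebar">
  <a href="/about">About</a>
  <div class="widget blogroll"><ul>
    <li><a href="http://example.com/">Example Blog</a></li>
    <li><a href="http://example.org/news">Example News</a></li>
  </ul></div>
</div>
"""

parser = BlogrollParser("http://blog.example.net/")
parser.feed(SAMPLE)
blogroll_links = parser.links
opml_text = to_opml(blogroll_links)
print("\n".join(url for _, url in blogroll_links))
```

One caveat: the outlines above carry the site URLs only. Most feed readers want an `xmlUrl` pointing at each site's feed, so actually subscribing would need an extra feed-discovery step per site, which this sketch leaves out.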
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of that list.
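As a rough illustration of that fallback (list every external link, then pick the blogroll out by hand), here is a minimal Python sketch using only the standard library. `LinkCollector` and `external_links` are hypothetical names and the sample page is invented. It treats any link whose host differs from the page's host as external and keeps one URL per host, so `www.` variants of the same site would count as separate hosts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(urljoin(self.base_url, href))


def external_links(html, page_url):
    """Return the first http(s) link found for each host other than the page's own."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    seen, result = set(), []
    for url in collector.urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        host = parts.hostname
        if host and host != page_host and host not in seen:
            seen.add(host)
            result.append(url)
    return result


# Invented sample page with internal, mailto, external, and duplicate-host links.
SAMPLE = """
<a href="/about">About</a>
<a href="mailto:someone@example.net">Email</a>
<a href="http://example.com/">Example Blog</a>
<a href="http://example.org/news">Example News</a>
<a href="http://example.com/archive">Example Blog archive</a>
"""

candidates = external_links(SAMPLE, "http://blog.example.net/post")
print("\n".join(candidates))
```

The result is still a mixed bag of every outbound host on the page, so the manual step of picking out the blogroll entries remains.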
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and (sigh) I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?