Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will:
a) let you grab the blogroll (only) of any site as a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) do the same, but export as OPML so you can subscribe?
Thanks!
Scott
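For anyone who would rather script part (a), here is a minimal sketch using only the Python standard library. It assumes the blogroll sits inside an element whose id or class contains the word "blogroll"; that marker, and the names BlogrollParser and extract_blogroll, are made up for illustration. Sites that label the section differently would need a different marker, and pages with unclosed tags such as a bare <li> can throw off the nesting count.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect hrefs from links nested inside any element whose id or class
    contains the marker text (assumed here to be "blogroll")."""

    def __init__(self, base_url, marker="blogroll"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.depth = 0   # greater than 0 while inside a blogroll container
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            href = attrs.get("href")
            if tag == "a" and href:
                url = urljoin(self.base_url, href)  # resolve relative links
                if url not in self.urls:
                    self.urls.append(url)
        else:
            label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}"
            if self.marker in label.lower():
                self.depth = 1  # entered the blogroll container

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_blogroll(html, base_url, marker="blogroll"):
    """Return the de-duplicated list of link URLs found in the blogroll."""
    parser = BlogrollParser(base_url, marker)
    parser.feed(html)
    return parser.urls
```

Fetching the page is left out on purpose; the function takes HTML you have already downloaded plus the page URL, so relative links can be resolved.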
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal of grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
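On the OPML half of that goal, a rough sketch of the export step in Python follows. One caveat: a feed reader subscribes via the xmlUrl attribute, which is the feed address, not the blog's home page, so the feed URL for each blog has to be found separately (that discovery step is not shown). The function name build_opml and the tuple layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_opml(entries, title="Blogroll"):
    """Build an OPML 2.0 document as a string.

    entries: iterable of (text, html_url, feed_url) tuples.
    feed_url may be None when no feed has been found for that blog.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, html_url, feed_url in entries:
        attrs = {"text": text, "htmlUrl": html_url}
        if feed_url:
            # Only entries with a known feed are marked as subscribable.
            attrs["type"] = "rss"
            attrs["xmlUrl"] = feed_url
        ET.SubElement(body, "outline", attrs)
    xml_body = ET.tostring(opml, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + xml_body
```

Writing the returned string to a .opml file in UTF-8 should give something most feed readers can import, though entries without a feed URL will likely be skipped or shown as plain links.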
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of it.
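The "list all external links, then pick by hand" fallback is easy to approximate in a few lines of Python. This is only a sketch, not how any of the tools mentioned work: it treats a link as external when its hostname differs from the page's hostname, so www.example.com and example.com count as different sites. The names LinkCollector and external_links are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the raw href value of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url):
    """Return unique http(s) links that point to a different hostname."""
    own_host = urlparse(page_url).hostname
    collector = LinkCollector()
    collector.feed(html)
    seen, result = set(), []
    for href in collector.hrefs:
        url = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(url)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.hostname and parts.hostname != own_host:
            if url not in seen:
                seen.add(url)
                result.append(url)
    return result
```

The output still mixes blogroll entries with every other outbound link on the page, which is exactly the manual sorting problem described above.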
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping this kind of list.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and, sigh, I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts that can throw something together?