Anyways to pull anchor text?
-
Hi guys,
So basically i have a list of URLs/Domains and there backlinks (example: http://s29.postimg.org/ujxm0c4lj/screenshot_677.jpg) but i'm missing anchor text. Can anyone recommend any tools which can scan a backlink, locate the URL/Domain on the page and then pull the anchor text?
Cheers, Chris
<colgroup><col width="548"><col width="884"></colgroup>
| | | -
Hi Matt!
No i have not yet found a tool which can do this.
The _ScrapeBox Anchor Text plugin _CleverPhD mentioned can only do this for one domain at a time. I need it for multiple domains.
Any other suggestions?
-
Hi Jay! Did you get this worked out?
-
Thanks Jay. If I look on the backlinks side, they all seem to have the same subdomain in some form or another. You would just need to setup the regex in Screaming Frog to look for just that keyword in the subdomain so it should match all the variants of it.
That said, ignore everything I just posted. I was thinking earlier, "Surely there is scraper software out there that does this already." I did not take the time to look. Your mention of Scrapebox reminded me of that.
Scrapebox has a separate addon that does this
http://www.scrapebox.com/anchor-text-checker
The ScrapeBox Anchor Text Checker allows you to enter your domain and then load a list of URL’s that contain your backlink. It will scan all the URL’s containing your link and extract the anchor text used by the websites that link to you.
-
Basically want the anchor text, so I can easily identify the location of the link on the page without needing to view source and search for the URL.
This export is directly from: http://s29.postimg.org/ujxm0c4lj/screenshot_677.jpg
Scrapebox backlink checker which doesn't give you anchor text.
-
Ok. Can you be more specific on what you are trying to accomplish with this data? I think that may help my understanding of what you are trying to do.
-
Thanks CleverPhD, sorry should had mentioned i'm looking to do this for multiple domain names not just one. So the method you describe works great for a single domain.
-
Screaming Frog can do this with custom extraction and list mode. If I am reading your question correctly, you have a list of URLs and what pages on your site that they link to.
You would upload the list of URLs into Screaming Frog so it knows what pages to scan and run it in list mode
http://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/#15
You would then use the custom extraction tool to grep for the ahref code that has a link to your domain
http://www.screamingfrog.co.uk/web-scraper/
You would need to plug in a regular expression to look for your domain (or versions of it) and then include the rest of the HTML tag that include the anchor text all the way through the ending .
You should then be able to import that data into a spreadsheet and use text to columns to split the anchor text into it's own column.
It is a little tricky as the regular expression may have to be tweaked depending on how other sites link to your site. Run the Frog on a test group of 10 or so to make sure it works. If you have a bunch of errors, take the error examples and tweak the regular expression based on those.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to Avoid Improper Special Characters from Being Pulled into SERPs
If you Google "Progressive careers," for the Progressive.com result you'll see a result where Google pulls in content from the page outside of the meta description (not uncommon) and also pulls in an in-page carrot that indicates a link. This character displays as a square on desktop / Android devices. The site includes this character as an accessibility best practice to indicate a link from a heading for screen readers. On iPhones, that square shows up as a soccer ball emoji even though the entity code is different than our character's entity code (our entity code= , soccer ball entity code is ⚽). Clearly, not the best experience. I know we cannot control what piece of the page is pulled in as the meta description, but does anyone have any tips to hide or help avoid pulling in that special character?
Intermediate & Advanced SEO | | P-C-A0 -
Alternative text vs Image title attribute
Hi there. Anyone know how important Image title attribute is ? Moz is bringing it up saying make sure my images are all title attributed as well as alternative texted but no wedding photographers mention it in SEO posts or blogs. Click on wedding photographers blogs and most don't seem to have it (Image title attribute) either but I don't want to miss anything out that I need to be doing as I am building a new site and if it needs to be done now I have the time to do it. Any advice is appreciated thank you people x
Intermediate & Advanced SEO | | Howelljones0 -
Google pulling brand snippets from only some of my pages. Different settings or are they just being selective?
Hi, Moz community For some of the category-pages, Google is showing some of the brands in the SERP, like this: http://www.screencast.com/t/62wldbwc
Intermediate & Advanced SEO | | Inevo
This is the page-url: https://www.gsport.no/sport/loep/lopeklaer/loepebukse For other category-pages that seemingly is built with similar code and settings, Google doesn't show brands in the snippet: http://www.screencast.com/t/zU9cg7odf
The page-url: https://www.gsport.no/sport/loep/lopeklaer/loepejakke This all begs the questions:
If the two pages contain the same code/html in terms of schema.org / rich snippets, why is Google choosing to display the brands in the SERP for only one of them? And is there something I can do in order to make them display the brands for all my pages? Thank you
Sigurd Bjurbeck, INEVO (digital agency)0 -
Low text-HTML ratios
Are low text-HTML ratios still a negative SEO ranking factor? Today I ran SEMRUSH site audit that showed 344 out of 345 pages on our website (www.nyc-officespace-leader.com) show an text-HTML ratio that ranges from 8% to 22%. This is characterized as a warning on SEMRUSH. This error did not exist in April when the last SEMRUSH audit was conducted. Is it worthwhile to try to externalize code in order to improve this ratio? Or to add text (major project on a site of this size)? These pages generally have 200-400 words of text. Certain URLs, for example www.nyc-officespace-leader.com/blog/nycofficespaceforlease more text, yet it still shows an text-HTML ratio of only 16%. We recently upgraded to the WordPress 4.2.1. Could this have bloated the code (CSS etcetera) to the detriment of the text-HTML ratio? If Google has become accustomed to more complex code, is this a ratio that I can ignore. Thanks, Alan
Intermediate & Advanced SEO | | Kingalan10 -
ALT Tag Labels that Use Near Duplicate Text-SEO No, No???
Greetings Moz Community: About 280 pages of my 650 page commercial real estate website are listing pages. Each listing page contains between two and five photos, each with a corresponding ALT tag. My developer has set up the labeling of the ALT tags in the following manner. I can create a label for the first photo, but each subsequent photo automatically gets the same label plus a number tagged to the ALT. Like this: alt="Flatiron Loft for Rent"
Intermediate & Advanced SEO | | Kingalan1
alt="Flatiron Loft for Rent - Photo 0"
alt="Flatiron Loft for Rent - Photo 1"
alt="Flatiron Loft for Rent - Photo 2"
alt="Flatiron Loft for Rent - Photo 3" Is this method neutral, positive or negative for SEO? I am concerned that this manner of labeling ALT tags might risk triggering a duplicate content penalty. In early July I migrated the site from Drupal to Wordpress. We changed the URL structure (adding a sub-directory) for the listings at that time. Google is refusing to index about 100 listing pages. Any chance the ALT tags are contributing to Google's reluctance to index the URLs? I might also add that images are hosted on Amazon's CDN. A sample listing URL is http://www.nyc-officespace-leader.com/listings/278-21st-street-flatiron-loft-for-rent
Note: (/listings/278) were added to the URL in July, representing the listing sub directory plus the listing number. I Look forward to hearing the opinion of the MOZ community!!! THANKS!!!
Alan1 -
Dynamically change anchor text and URLs remotely
Hey i'm looking to create a widget in javascript which i dynamically change the urls and anchor text which link the widget back to my site remotely (via php) once it spreads. I have heard peopled doing this before, but i can't seem to find a example. Does anyone know of any examples/widgets or anything which can do this?
Intermediate & Advanced SEO | | monster990 -
Internal Anchor Text - Partial or Exact Match Does It Matter?
When linking internally on an ecommerce site between pages and from a sitemap, is partial or exact match on the anchor text a significant factor? If it matters to Google, which is a better practice to use? I found plenty of info on external links, but precious little on internal links (which suggests it doesn't matter enough to worry about).
Intermediate & Advanced SEO | | AWCthreads0 -
Convert keyword rich PDFs to web pages (text & images)
SteriPEN is a portable water purifier that kills viruses, protozoa, e-coli, etc. Because of the technical and safety requirements nature of the product, our website has much documentation of testing, organisms affected, and more. These are in pdf form and can often be found through google search (and through links on specific pages). Because of the keyword-richness of these documents pertaining to microbes SteriPEN kills, etc. does it make sense to convert these pdf's into html text and images? Then I was thinking perhaps writing a blog post AND generating key links on important landing pages to these documents (as html). Removing pdfs may be harmful? Not a clue as to the cost/benefit.
Intermediate & Advanced SEO | | Timmmmy0