Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of _href="http://" _?
I have gone back and forth between using an x-path extractor as well as a regex and have had no luck with either.
Ex. X-path: //*[starts-with(@href, “http://”)][1]
Ex. Regex: href=\”//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database driven site and so would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have the " or ' or nothing between the = and the http If you have any other TLDs you can just keep expanding on the |
I modified this from a posting in github https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text
I assumed you would want the full URL and that was the issue you were running into.
As another solution why not just fix the https in the main navigation etc, then once you get the staging/testing site setup, run ScreamingFrog on that site and find all the 301 redirects or 404s and then use that report to find all the URLs to fix.
I would also ping ScreamingFrog - this is not the first time they have been asked this question. They may have a better regexp and/or solution vs what I have suggested.
-
Depending on how you've coded everything you could try to setup a Custom Search under Configuration. This will scan the HTML of the page so if the coding was consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for and in the Custom tab on the resulting pages it'll show you all the ones that match the string.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Using # in parameters?
I am trying to understand why a website would use # instead of a ? for its parameters? I have put an example of the URL below: http://www.warehousestationery.co.nz/office-supplies/adhesives-tapes-and-fastenings#prefn1=brand&prefn2=colour&prefv1=Command&prefv2=Clear Any help would be much appreciated.
Technical SEO | | CaitlinDW1 -
How can you best use additional domains with important keywords
Currently I have a corporate website that is ranking all right. However, I have some additional domains containing import search terms that I would like to use to get higher rankings for the corporate website, or allow these domains to generate more traffic for the corporate website. What are best practice in using these domains with keyword terms, to make most use of them, for ideally both ranking as well as generating additional traffic. All input is highly appreciated.
Technical SEO | | moojoo0 -
Do I have a problem with missing pages in Screaming Frog?
We have category pages and some of those pages have pagination due to us having additional items. Screaming Frog could not find the items that were after page 1. Is this a problem for Google? These item pages are still in the sitemap. I am sure they can find them to index them but does it hurt rankings at all.
Technical SEO | | EcommerceSite0 -
Find all links in the site and anchor text
Hi, Find all links in the site and anchor text and i need this done on my own website so i know if we dont have links that are anchored to numbers and punctuations that are not seen at all. Thanks
Technical SEO | | mtthompsons0 -
What damage can internal duplicated hidden links do to rankings?
Hi, I have a rental website, www.akilar.com, for Spain. My question is, on the home page we have links to the seperate regions of the country. Somehow in the redesign of the site, these links have been placed on every page of the site and hidden in the code at the top. The links are there as well on each page in the header, these are additional. The page quantity is over 2000 pages. Also this is taking the internal links well over the limit. In anyones opinion what damage has this caused as our rankings of late have fallen. Thanks very much for your help!
Technical SEO | | AkilarOffice0 -
Effective use of hReview
Hi fellow Mozzers! I am just in the process of adding various reviews to our site (a design agency), but I wanted to use the ratings in different ways depending on the page. So for the home page and the services (branding, POS, direct mail etc) I wanted to aggregate relevant reviews (giving us an average of all reviews for the home page, an average of ratings from all brand projects and so on). Then, I wanted to put specific reviews on our portfolio pages, so the review relates specifically to that project. This is the easiest to do as the hReview generator is geared up for reviews that come from one source, but I can't find a way of aggregating the star ratings to make an average rating rich snippet. Anyone know where I can get the coding for this? Thanks in advance! Nick.
Technical SEO | | themegroup0 -
How much effect does number of outbound links have on link juice?
I am interested in your thoughts on the effect of number of outbound links (obls) on link juice passed? ie If a page linking to you has a high number of obls, how do you compute the effect of these obls and relative negative effect on linkjuice. In the event that there are three sites on which you have been offered the opportunity of a link Site A PA 30 DA50 Obls on page 10 Site B PA 40 DA50 Obls on page 15 Site C PA 50 DA50 Obls on page 20 How would you appraise each of these prospective page links (ignoring anchor text, relevancy, etc which will be constant) Is there a rule of thumb on how to compare the linkjuice passed from a site relative to its PA and the number of obls? Is it as simple as page with 10 obls passes 10x juice of page with 100 obls?
Technical SEO | | seanmccauley0