Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of _href="http://" _?
I have gone back and forth between using an x-path extractor as well as a regex and have had no luck with either.
Ex. X-path: //*[starts-with(@href, “http://”)][1]
Ex. Regex: href=\”//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database driven site and so would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have the " or ' or nothing between the = and the http If you have any other TLDs you can just keep expanding on the |
I modified this from a posting in github https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text
I assumed you would want the full URL and that was the issue you were running into.
As another solution why not just fix the https in the main navigation etc, then once you get the staging/testing site setup, run ScreamingFrog on that site and find all the 301 redirects or 404s and then use that report to find all the URLs to fix.
I would also ping ScreamingFrog - this is not the first time they have been asked this question. They may have a better regexp and/or solution vs what I have suggested.
-
Depending on how you've coded everything you could try to setup a Custom Search under Configuration. This will scan the HTML of the page so if the coding was consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for and in the Custom tab on the resulting pages it'll show you all the ones that match the string.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should we use Cloudflare
Hi all, we want to speed up our website (hosted in Wordpress, traffic around 450,000 page views monthly), we use lots of images. And we're wondering about setting up on Cloudflare, however after searching a bit in Google I have seen some people say the change in IP, or possible sharing of Its with bad neighbourhoods, can really hit search rankings. So, I was wondering what the latest thinking is on this subject, would the increased speed and local server locations be a boost for SEO, moreso than a potential loss of rankings for changing IP? Thanks!
Technical SEO | Jun 3, 2020, 2:20 AM | tiromedia1 -
Can you use multiple videos without sacrificing load times?
We're using a lot of videos on our new website (www.4com.co.uk), but our immediate discovery has been that this has a negative impact on load times. We use a third party (Vidyard) to host our videos but we also tried YouTube and didn't see any difference. I was wondering if there's a way of using multiple videos without seeing this load speed issue or whether we just need to go with a different approach. Thanks all, appreciate any guidance! Matt
Technical SEO | Aug 23, 2016, 6:39 AM | MattWatts1 -
Canonical issues using Screaming Frog and other tools?
In the Directives tab within Screaming Frog, can anyone tell me what the difference between "canonicalised", "canonical", and "no canonical" means? They're found in the filter box. I see the data but am not sure how to interpret them. Which one of these would I check to find canonical issues within a website? Are there any other easy ways to identify canonical issues?
Technical SEO | Oct 6, 2015, 5:33 PM | Flock.Media0 -
What punctuation can you use in meta tags? Are there any Google does not like?
So I know you can use dashes and | in meta tags, but can anyone tell me what other punctuation you can use? Also, it'd be great to know what punctuation you can't use. Thanks!
Technical SEO | Aug 24, 2015, 9:04 PM | Trevorneo1 -
Screaming Frog showing 503 status code. Why?
Screaming Frog is showing a 503 code for images. If I go and use a header checker like SEOBook it shows 200. Why would that be? Here is an example link- http://germanhausbarn.com/wp-content/uploads/2014/07/36-UPC-5145536-John-Deere-Stoneware-Logo-Mug-pair-25.00-Heavy-4-mugs-470x483.jpg
Technical SEO | Mar 6, 2015, 8:21 PM | EcommerceSite0 -
How to fix broken links?
Hi, I use WordPress CMS with Yoast SEO plugin. I have just found out that my 403 errors increased dramatically. It seems that all my tags below of each post are being broken for some reason. When i click on the tags i get the following massage: **403 Forbidden Request forbidden by administrative rules. ** I assume it has something to do with the configuration within Yoast SEO plugin. Dose anyone know how should i fix that? Thanks, Raviv evsGujA
Technical SEO | Apr 22, 2013, 10:08 AM | Indiatravelz0 -
How to find and fix 404 and broken links?
Hi, My campaign is showing me many 404 problems and other tools are also showing me broken links, but the links they show me dose work and I cant seem to find the broken links or the cause of the 404. Can you help?
Technical SEO | Oct 26, 2012, 4:32 PM | Joseph-Green-SEO0 -
How Can I Block Archive Pages in Blogger when I am not using classic/default template
Hi, I am trying to block all the archive pages of my blog as Google is indexing them. This could lead to duplicate content issue. I am not using default blogger theme or classic theme and therefore, I cannot use this code therein: Please suggest me how I can instruct Google not to index archive pages of my blog? Looking for quick response.
Technical SEO | Sep 30, 2011, 11:44 PM | SoftzSolutions0