Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of _href="http://" _?
I have gone back and forth between using an x-path extractor as well as a regex and have had no luck with either.
Ex. X-path: //*[starts-with(@href, “http://”)][1]
Ex. Regex: href=\”//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database driven site and so would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have the " or ' or nothing between the = and the http If you have any other TLDs you can just keep expanding on the |
I modified this from a posting in github https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text
I assumed you would want the full URL and that was the issue you were running into.
As another solution why not just fix the https in the main navigation etc, then once you get the staging/testing site setup, run ScreamingFrog on that site and find all the 301 redirects or 404s and then use that report to find all the URLs to fix.
I would also ping ScreamingFrog - this is not the first time they have been asked this question. They may have a better regexp and/or solution vs what I have suggested.
-
Depending on how you've coded everything you could try to setup a Custom Search under Configuration. This will scan the HTML of the page so if the coding was consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for and in the Custom tab on the resulting pages it'll show you all the ones that match the string.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Canonical homepage link uses trailing slash while default homepage uses no trailing slash, will this be an issue?
Hello, 1st off, let me explain my client in this case uses BigCommerce, and I don't have access to the backend like most other situations. So I have to rely on BG to handle certain issues. I'm curious if there is much of a difference using domain.com/ as the canonical url while BG currently is redirecting our domain to domain.com. I've been using domain.com/ consistently for the last 6 months, and since we switches stores on Friday, this issue has popped up and has me a bit worried that we'll loose somehow via link juice or overall indexing since this could confuse crawlers. Now some say that the domain url is fine using / or not, as per - https://moz.com/community/q/trailing-slash-and-rel-canonical But I also wanted to see what you all felt about this. What says you?
Technical SEO | | Deacyde0 -
Updating inbound links vs. 301 redirecting the page they link to
Hi everyone, I'm preparing myself for a website redesign and finding conflicting information about inbound links and 301 redirects. If I have a URL (we'll say website.com/website) that is linked to by outside sources, should I get those outside sources to update their links when I change the URL to website.com/webpage? Or is it just as effective from a link juice perspective to simply 301 redirect the old page to the new page? Are there any other implications to this choice that I may want to consider? Thanks!
Technical SEO | | Liggins0 -
Exclude status codes in Screaming Frog
I have a very large ecommerce site I'm trying to spider using screaming frog. Problem is I keep hanging even though I have turned off the high memory safeguard under configuration. The site has approximately 190,000 pages according to the results of a Google site: command. The site architecture is almost completely flat. Limiting the search by depth is a possiblity, but it will take quite a bit of manual labor as there are literally hundreds of directories one level below the root. There are many, many duplicate pages. I've been able to exclude some of them from being crawled using the exclude configuration parameters. There are thousands of redirects. I haven't been able to exclude those from the spider b/c they don't have a distinguishing character string in their URLs. Does anyone know how to exclude files using status codes? I know that would help. If it helps, the site is kodylighting.com. Thanks in advance for any guidance you can provide.
Technical SEO | | DonnaDuncan0 -
Self-referencing links
I personally think that self-referencing links are silly. It's blatantly easy for Google to tell and my instinct says that the link juice for this would simply evaporate rather than passing back to itself. Does anyone have information backing me up from an authoritative source? I can't find any info about this linked to Matt Cutts, Rand or any of those I look up to.
Technical SEO | | IPROdigital0 -
Links from the same server has value or not
Hi Guys, Sometime ago one of the SEO experts said to me if I get links from the same IP address, Google doesn't count them as with much value. For an example, I am a web devleoper and I host all my clients websites on one server and link them back to me. Im wondering whether those links have any value when it comes to seo or should I consider getting different hosting providers? Regards Uds
Technical SEO | | Uds0 -
MBG Tracker...how to use it?
So I am a new blogger that has been submitting guest blog posts to a number of different blogs. It was recommended that I use the MBG Tracker so I can track the back links. The problem is that I am totally lost on how to use this tool. As I said before I am new to this whole thing and I am not really sure what constitutes a "base link" and a "back link." In the author bylines we are linking to different pages within a larger website. If anyone can help me I would really appreciate it!
Technical SEO | | Stroll0 -
Does Yelp pass link juice?
This is probably a profoundly obvious question, but I can't seem to find an explicit answer on the internet, so I'll ask it here: Yelp's links out to local business websites are not nofollow'd, but they go through a javascript-based redirect. My understanding is that javascript redirected links do not pass link juice, so a link from a yelp profile will not directly impact my page authority; however, it looks like yelp does use nofollow judiciously for internal links, so I don't understand why they would allow follow for these "useless" outbound links. Do yelp's javascript-redirected links pass link juice?
Technical SEO | | tvkiley0