Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between using an XPath extractor and a regex, and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, so it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have " or ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a gist on GitHub: https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regex against example text.
I assumed you would want the full URL and that was the issue you were running into.
As another solution, why not just update the links in the main navigation, etc. to https, then, once you get the staging/testing site set up, run Screaming Frog on that site and find all the 301 redirects or 404s? You can then use that report to find all the URLs to fix.
I would also ping Screaming Frog; this is not the first time they have been asked this question. They may have a better regex and/or solution than what I have suggested.
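For testing the regex idea outside of Screaming Frog, here is a minimal Python sketch. The pattern is a simplified, hypothetical one (not the regex quoted above, and not anything supplied by Screaming Frog): it assumes you only care about href values that begin with http:// and want the full URL back, with double quotes, single quotes, or no quotes around the value. The sample HTML and the function name are made up for illustration.

```python
import re

# Hypothetical pattern (an assumption, not the one from the post): captures the
# full URL of any href that starts with http://, whether the value is wrapped
# in double quotes, single quotes, or left unquoted.
ABSOLUTE_HTTP_HREF = re.compile(
    r"""href\s*=\s*["']?(http://[^"'\s>]+)""",
    re.IGNORECASE,
)


def find_absolute_http_links(html: str) -> list[str]:
    """Return every http:// URL found in an href attribute, in document order."""
    return ABSOLUTE_HTTP_HREF.findall(html)


# Made-up sample markup covering the quoting styles and the non-matching cases.
sample = """
<a href="http://www.example.com/about">About</a>
<a href='http://www.example.com/contact'>Contact</a>
<a href=http://www.example.com/raw>Raw</a>
<a href="/relative/page">Relative</a>
<a href="https://www.example.com/secure">Secure</a>
<a href="//cdn.example.com/lib.js">Protocol-relative</a>
"""

for url in find_absolute_http_links(sample):
    print(url)
```

Relative, https:// and protocol-relative links are skipped on purpose, since only the http:// ones need updating for the migration. A regex like this does not understand HTML, so it will also match href text inside comments or scripts.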
-
Depending on how you've coded everything, you could try to set up a Custom Search under Configuration. This will scan the HTML of each page, so if the coding is consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for, and the Custom tab will show you all the pages that match the string.
That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
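To see why the "if the coding is consistent" caveat matters, here is a rough Python sketch of a literal string search over page source. It is only an illustration of the idea, not how Screaming Frog implements Custom Search; the `pages_containing` helper and the `crawled` data are hypothetical.

```python
def pages_containing(pages: dict[str, str], needle: str) -> dict[str, int]:
    """Map each page URL to the number of times `needle` occurs in its HTML."""
    counts = {}
    for url, html in pages.items():
        hits = html.count(needle)
        if hits:
            counts[url] = hits
    return counts


# Hypothetical crawl output: page URL -> raw HTML source.
crawled = {
    "/": '<a href="http://www.yourdomain.com/a">A</a> <a href="/b">B</a>',
    "/b": '<a href="/c">C</a>',
    "/c": "<a href='http://www.yourdomain.com/d'>D</a>",
}

matches = pages_containing(crawled, 'href="http://www.yourdomain.com')
print(matches)
```

Here only "/" is reported. The link on "/c" uses single quotes, so an exact string match misses it, which is the kind of inconsistency to check for before relying on a single search string.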
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. You could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
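If you would rather script the same search than run it in an editor, a minimal Python sketch might look like the following. It assumes the site's files are on your local disk; the file extensions, the pattern, and the `scan_directory` name are all assumptions chosen for illustration. The demo writes two throwaway files to a temporary directory so it can run anywhere.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: an href whose value starts with http://, quoted or not.
HTTP_HREF = re.compile(r"""href\s*=\s*["']?http://""", re.IGNORECASE)


def scan_directory(root, suffixes=(".html", ".htm", ".php")):
    """Return (file path, line number, line text) for each line with an http:// href."""
    results = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types we did not ask for.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if HTTP_HREF.search(line):
                results.append((str(path), number, line.strip()))
    return results


# Demo on made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.html").write_text(
        '<a href="/about">About</a>\n<a href="http://www.example.com/old">Old</a>\n',
        encoding="utf-8",
    )
    (Path(tmp) / "notes.txt").write_text(
        'href="http://ignored.example.com"\n', encoding="utf-8"
    )
    hits = scan_directory(tmp)
    for file_path, number, line in hits:
        print(f"{Path(file_path).name}:{number}: {line}")
```

Pointing `scan_directory` at the real site folder gives a file-and-line report to work through. This sketch only reports matches; it deliberately does not rewrite anything.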