Strange 404s in Screaming Frog
-
I just ran a website (Drupal) through Screaming Frog, and the only 404s I found were for URLs matching pages already on the site, but with the company phone number appended: www.company.com/[their phone number], www.company.com/services[their phone number], etc. Any ideas what might be causing this problem?
-
Hi Luke,
As the guys above replied, it sounds like an a href containing the phone number, most likely a tel: link that's missing its scheme, so crawlers resolve the number as a relative URL.
If you check the 'Inlinks' tab (via the lower window), you'll be able to see the source of these errors (the pages they are located on). Obviously you can then view the source code, find the exact link and see what the issue might be.
Hope that helps!
Feel free to pop through any further questions directly to our support btw (http://www.screamingfrog.co.uk/seo-spider/support/), I only spotted this via a Google alert.
(We try and reply super quick & will always look into any problems!)
Cheers.
Dan
-
This is typically caused by a link on the page that is not formed correctly.
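A quick way to see how a malformed phone-number link turns into one of these 404 URLs is to resolve the href the way a browser or crawler would. A minimal sketch (the page URLs and phone number below are made up for illustration):

```python
from urllib.parse import urljoin

def resolve_href(page_url, href):
    """Resolve an href against the page it appears on, the way a browser or crawler does."""
    return urljoin(page_url, href)

# A phone number in an href without the tel: scheme is just a relative URL,
# so it is appended to the current page's path:
broken_home = resolve_href("https://www.company.com/", "01234567890")
# -> "https://www.company.com/01234567890"
broken_services = resolve_href("https://www.company.com/services/", "01234567890")
# -> "https://www.company.com/services/01234567890"

# With the tel: scheme present, it resolves to a telephone link that
# crawlers don't request as a page:
fixed = resolve_href("https://www.company.com/services/", "tel:01234567890")
# -> "tel:01234567890"
```

So the fix is usually changing `<a href="01234567890">` to `<a href="tel:01234567890">` in the template.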
-
Related Questions
-
Old URLs that have 301s to 404s not being de-indexed.
We have a scenario on a domain that recently moved to enforcing SSL. If a page is requested over non-SSL (http), the server automatically redirects to the SSL (https) URL using a good old-fashioned 301. This is great, except for any page that no longer exists, in which case you get a 301 going to a 404. Here's what I mean:
Case 1 - Good page: http://domain.com/goodpage -> 301 -> https://domain.com/goodpage -> 200
Case 2 - Bad page that no longer exists: http://domain.com/badpage -> 301 -> https://domain.com/badpage -> 404
Google is correctly re-indexing all the "good" pages and just displaying search results going directly to the https version. Google is stubbornly hanging on to all the "bad" pages and serving up the original URL (http://domain.com/badpage) unless we submit a removal request. But there are hundreds of these pages and this is starting to suck. Note: the load balancer does the SSL enforcement, not the CMS, so we can't detect a 404 and serve it up first. The CMS does the 404'ing. Any ideas on the best way to approach this problem? Or any idea why Google is holding on to all the old "bad" pages that no longer exist, given that we've clearly indicated with 301s that no one is home at the old address?
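One option, if the load balancer (or any web tier in front of the CMS) can take a static rule set: answer known-dead paths directly with a 410 (Gone) before the HTTPS redirect fires, so Google never sees the 301 -> 404 chain at all. A hedged sketch in Apache mod_rewrite terms; the map file path and contents are assumptions, and RewriteMap must live in server/vhost config rather than .htaccess:

```apache
# Assumes Apache with mod_rewrite in front of the CMS.
# deadpaths.map is a plain-text file of known-dead paths, one per line,
# each mapped to the value "gone", e.g.:
#   /badpage gone
RewriteEngine On
RewriteMap deadpaths "txt:/etc/apache2/deadpaths.map"

# Known-dead paths get an immediate 410 Gone ([G]), on http and https alike...
RewriteCond ${deadpaths:%{REQUEST_URI}|live} =gone
RewriteRule ^ - [G]

# ...and everything else gets the usual 301 to HTTPS.
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

The 410 also tells Google the pages are permanently gone, which tends to get them dropped from the index faster than a 404 at the end of a redirect chain.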
Intermediate & Advanced SEO | boxclever0
-
Screaming Frog vs. Yoast - meta description clash
Hi all, I'm working on a site where, when I crawl it with SF, SF doesn't pick up the meta description (as in the source code it IS blank). However, the meta description has been set via the Yoast WordPress plugin, it does exist in the source code and it is shown in the SERPs. The code looks like this: <title>Dining Table and Chairs set</title> So my question is: will this affect SEO and how the website ranks if all the actual meta descriptions are blank? Thank you
Intermediate & Advanced SEO | Bee1591
-
Site's pages have GA code via Tag Manager, but Screaming Frog doesn't recognize it
Using Tag Assistant (a Google Chrome add-on), we have found that the site's pages have GA code (also see screenshot 1). However, when we used Screaming Frog's filter feature -- Configuration > Custom > Search > Contains/Does Not Contain (see screenshot 2) -- SF displays several URLs (maybe all) of the site under 'Does Not Contain', which means that in SF's crawl the site's pages have no GA code (see screenshot 3). What could be the problem? Why does SF state that there is no GA code on the site's pages when, according to Tag Assistant/Manager, the code is there? Please give us steps/ways to fix this issue. Thanks!
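One likely explanation (hedged, based on how SF's custom search behaves by default): the custom search scans the raw, unrendered HTML, while Tag Manager injects the GA snippet with JavaScript at load time, so the tracking code never appears in that raw source. A toy illustration with made-up markup; GTM-XXXXXX is a placeholder container ID:

```python
# Raw (unrendered) source of a page tagged via Google Tag Manager.
# The GTM container snippet is server-side HTML, but the GA tracker it
# injects only exists after JavaScript has run.
RAW_SOURCE = """<html><head>
<script>(function(w,d,s,l,i){/* ... */})(window,document,'script','dataLayer','GTM-XXXXXX');</script>
</head><body></body></html>"""

def found_in_raw_source(html: str, needle: str) -> bool:
    """Mimic a raw-HTML 'Contains' check such as a crawler's default custom search."""
    return needle in html

# Searching for the GA script itself fails in the raw source...
ga_found = found_in_raw_source(RAW_SOURCE, "google-analytics.com/analytics.js")
# ...while searching for the GTM container snippet succeeds:
gtm_found = found_in_raw_source(RAW_SOURCE, "GTM-")
```

So the usual fixes are either to enable JavaScript rendering in the crawler before running the custom search, or to search for the GTM container snippet instead of the GA code.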
Intermediate & Advanced SEO | jayoliverwright0
-
Very strange HTML docs - what should I do with them through site migration?
I've just been looking at a website and it includes numerous web pages with addresses like this. I click on the URL and it takes me to a fully functional web page (not an image) and when I run it through Screaming Frog this comes up as an HTML page. The site has around 150 unique pages and over 450 pages like this one - how should I deal with these pages during an SEO migration (only a few are backlinked to)? I look forward to reading your thoughts. http://www.[companyname].co.uk/property/caravan-sleeps-4/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/co
ttageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/images/cottageTypes/blank.png
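For what it's worth, URLs like this usually come from a relative href (no leading slash) that points back into the page's own directory, so each crawl hop resolves to a longer URL. A small sketch of the mechanism; the page and path names are made up to mirror the question:

```python
from urllib.parse import urljoin

# A template that links via a relative path like "images/cottageTypes/page.html"
# (no leading slash) makes every resolved URL the base for the next hop,
# so the path compounds on each crawl step:
url = "http://www.example.co.uk/property/caravan-sleeps-4/"
for hop in range(3):
    url = urljoin(url, "images/cottageTypes/page.html")
    print(hop + 1, url)
# hop 3 yields .../caravan-sleeps-4/images/cottageTypes/images/cottageTypes/images/cottageTypes/page.html
```

If that's the cause here, the migration fix is to make the template link root-relative (e.g. /images/cottageTypes/page.html) on the new site; the 450-odd compounded URLs can then be left to 404 (redirecting only the few that have backlinks).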
Intermediate & Advanced SEO | McTaggart0
-
Strange internal links and trying to improve PR ? - Please advise
Hi All, I've been looking at the internal links on my eCommerce site to try and improve PR and get it as efficient as possible, so link juice isn't getting wasted etc., and I've come across some odd ones I would like some advice on.
1. My website currently has between 125-146 links on every page (sitemap approx 3500 pages). From what I read, the ideal number of links is under 100, but can someone confirm if this is still the case? Is it a case of less is more, in terms of improving a page's PR and link juice strength, so it's not getting diluted to unnecessary pages?
2. One of my links is a bad URL (my domain + phone number, for some reason) which currently goes to a 404 page. Is this okay, or do we need to track down the link and remove it? I don't want link juice getting wasted, as it's on every page.
3. Another one of my links is my domain name/# and another one with some characters after the #, which both go to the home page. Example: www.domain.co.uk/# and www.domain.co.uk#abcde both go to the homepage. Is this okay, or am I potentially getting duplicate content, as if I put these URLs in, they go to my home page?
4. I have a link on every page which opens up Outlook (email) on the contact us. Should this really be changed to a button with a contact us form opening up instead?
5. I currently have 9 links at the bottom of every page, i.e. about us, delivery, hire terms, contact us, trade accounts, privacy, sitemap. When I check, these pages seem to be my strongest pages in terms of PR. Is that because they are on every page? Should I look to reduce these links, as they are accessible from the navigation menu apart from privacy and sitemap?
Any advice on this would be greatly appreciated. Thanks, Pete
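On the /# links mentioned above: everything after a # is a fragment, which browsers handle client-side and crawlers strip before requesting, so those URLs shouldn't create duplicate content. A quick sketch of how such a URL normalizes (using the question's placeholder domain):

```python
from urllib.parse import urldefrag

def crawlable_url(url):
    """Strip the #fragment, which is never sent to the server."""
    return urldefrag(url)[0]

# Both fragment variants normalize to the same homepage URL:
a = crawlable_url("http://www.domain.co.uk/#")
b = crawlable_url("http://www.domain.co.uk/#abcde")
# a == b == "http://www.domain.co.uk/"
```

That's why both variants show the home page: the server only ever sees a request for /.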
Intermediate & Advanced SEO | PeteC120
-
2.3 million 404s in GWT - learn to live with 'em?
So I’m working on optimizing a directory site. Total size: 12.5 million pages in the XML sitemap. This is orders of magnitude larger than any site I’ve ever worked on – heck, every other site I’ve ever worked on combined would be a rounding error compared to this. Before I was hired, the company brought in an outside consultant to iron out some of the technical issues on the site. To his credit, he was worth the money: indexation and organic Google traffic have steadily increased over the last six months. However, some issues remain. The company has access to a quality (i.e. paid) source of data for directory listing pages, but the last time the data was refreshed some months back, it threw 1.8 million 404s in GWT. That has since started to grow progressively higher; now we have 2.3 million 404s in GWT. Based on what I’ve been able to determine, links on this particular site relative to the data feed are broken generally due to one of two reasons: the page just doesn’t exist anymore (i.e. wasn’t found in the data refresh, so the page was simply deleted), or the URL had to change due to some technical issue (page still exists, just now under a different link). With other sites I’ve worked on, 404s aren’t that big a deal: set up a 301 redirect in htaccess and problem solved. In this instance, setting up that many 301 redirects, even if it could somehow be automated, just isn’t an option due to the potential bloat in the htaccess file. Based on what I’ve read here and here, 404s in and of themselves don’t really hurt the site indexation or ranking. And the more I consider it, the really big sites – the Amazons and eBays of the world – have to contend with broken links all the time due to product pages coming and going. 
Bottom line, it looks like if we really want to refresh the data on the site on a regular basis – and I believe that is priority one if we want the bot to come back more frequently – we'll just have to put up with broken links on the site on a more regular basis. So here's where my thought process is leading:
1. Go ahead and refresh the data. Make sure the XML sitemaps are refreshed as well – hopefully this will help the site stay current in the index.
2. Keep an eye on broken links in GWT. Implement 301s for really important pages (i.e. content-rich stuff that is really mission-critical). Otherwise, just learn to live with a certain number of 404s being reported in GWT on a more or less ongoing basis.
3. Watch the overall trend of 404s in GWT. At least make sure they don't increase. Hopefully, if we can make sure that the sitemap is updated when we refresh the data, the 404s reported will decrease over time.
We do have an issue with the site creating some weird pages with content that lives within tabs on specific pages. Once we can clamp down on those and a few other technical issues, I think keeping the data refreshed should help with our indexation and crawl rates. Thoughts? If you think I'm off base, please set me straight. 🙂
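On the htaccess-bloat concern: if some of the renamed URLs do prove worth redirecting at scale, one common approach is Apache's RewriteMap, which consults a single key/value file (or a dbm hash for very large maps) instead of hundreds of thousands of one-line rules. A hedged sketch of generating that map file from old -> new path pairs; the listing paths are invented for illustration:

```python
def build_rewrite_map(pairs):
    """Emit Apache RewriteMap txt format: one 'old-path new-path' line per move."""
    return "".join(f"{old} {new}\n" for old, new in pairs)

# Invented example moves from a data refresh:
moves = [
    ("/listing/old-widget-co", "/listing/widget-company"),
    ("/listing/acme-supplies-1999", "/listing/acme-supplies"),
]
map_text = build_rewrite_map(moves)

# The map is then wired up once in server config (RewriteMap isn't
# allowed in .htaccess), along these lines:
#   RewriteMap redirects "txt:/etc/apache2/redirects.map"
#   RewriteCond ${redirects:%{REQUEST_URI}|none} !=none
#   RewriteRule ^ ${redirects:%{REQUEST_URI}} [R=301,L]
```

For a map with millions of entries, converting the txt file to dbm (httxt2dbm) keeps lookups fast without growing the Apache config at all.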
Intermediate & Advanced SEO | ufmedia0
-
Recent ranking drop followed by strange behavior in SERPS
Recently on the evening of August 5th almost all of the keywords our pages ranked highly for dropped by anywhere from 5 to 8 pages. The only activity during this time was an article that had been picked up by a major news outlet and then was apparently copied onto other sources with links to our domain and article. More puzzling though is rather than simply having the same page show up lower for a keyword, in a number of instances a different page is now shown for the result, often with less or no relevance to the keyword. In some cases, for a single keyword phrase we've seen as many as 10 different pages rotated throughout the day when performing a search. Prior to our rankings falling, we've never seen this behavior.
Intermediate & Advanced SEO | BrianQuinn0
-
Strange URLs, how do I fix this?
I've just checked Majestic and have seen around 50 links coming from one of my other sites. The links all look like this:
http://www.dwww.mysite.com
http://www.eee.mysite.com
http://www.w.mysite.com
The site these links are coming from is an HTML site. Any ideas what's going on, or a way to get rid of these URLs? When I visit the strange URLs, such as http://www.dwww.mysite.com, it shows the home page of http://www.mysite.com. Is there a way to redirect anything like this back to the home page?
Intermediate & Advanced SEO | JohnPeters0
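The pattern described, where any made-up subdomain serves the homepage, usually means wildcard DNS (*.mysite.com) plus a catch-all virtual host. If that's the setup, a canonical-host redirect catches every variant in one rule. A hedged Apache mod_rewrite sketch, assuming www.mysite.com is the canonical host and mod_rewrite is available:

```apache
# Assumes Apache with mod_rewrite, and wildcard DNS pointing all
# *.mysite.com hosts at this server. Any request on a non-canonical
# host is 301-redirected to the same path on www.mysite.com.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
```

Alternatively, removing the wildcard DNS record (if nothing depends on it) stops the strange hostnames from resolving at all.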