Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
After hack and remediation, thousands of URL's still appearing as 'Valid' in google search console. How to remedy?
I'm working on a site that was hacked in March 2019 and in the process, nearly 900,000 spam links were generated and indexed. After remediation of the hack in April 2019, the spammy URLs began dropping out of the index until last week, when Search Console showed around 8,000 as "Indexed, not submitted in sitemap" but listed as "Valid" in the coverage report and many of them are still hack-related URLs that are listed as being indexed in March 2019, despite the fact that clicking on them leads to a 404. As of this Saturday, the number jumped up to 18,000, but I have no way of finding out using the search console reports why the jump happened or what are the new URLs that were added, the only sort mechanism is last crawled and they don't show up there. How long can I expect it to take for these remaining urls to also be removed from the index? Is there any way to expedite the process? I've submitted a 'new' sitemap several times, which (so far) has not helped. Is there any way to see inside the new GSC view why/how the number of valid URLs in the indexed doubled over one weekend?
Intermediate & Advanced SEO | | rickyporco0 -
Crawl Stats Decline After Site Launch (Pages Crawled Per Day, KB Downloaded Per Day)
Hi all, I have been looking into this for about a month and haven't been able to figure out what is going on with this situation. We recently did a website re-design and moved from a separate mobile site to responsive. After the launch, I immediately noticed a decline in pages crawled per day and KB downloaded per day in the crawl stats. I expected the opposite to happen as I figured Google would be crawling more pages for a while to figure out the new site. There was also an increase in time spent downloading a page. This has went back down but the pages crawled has never went back up. Some notes about the re-design: URLs did not change Mobile URLs were redirected Images were moved from a subdomain (images.sitename.com) to Amazon S3 Had an immediate decline in both organic and paid traffic (roughly 20-30% for each channel) I have not been able to find any glaring issues in search console as indexation looks good, no spike in 404s, or mobile usability issues. Just wondering if anyone has an idea or insight into what caused the drop in pages crawled? Here is the robots.txt and attaching a photo of the crawl stats. User-agent: ShopWiki Disallow: / User-agent: deepcrawl Disallow: / User-agent: Speedy Disallow: / User-agent: SLI_Systems_Indexer Disallow: / User-agent: Yandex Disallow: / User-agent: MJ12bot Disallow: / User-agent: BrightEdge Crawler/1.0 (crawler@brightedge.com) Disallow: / User-agent: * Crawl-delay: 5 Disallow: /cart/ Disallow: /compare/ ```[fSAOL0](https://ibb.co/fSAOL0)
Intermediate & Advanced SEO | | BandG0 -
Site structure: Any issues with 404'd parent folders?
Is there any issue with a 404'd parent folder in a URL? There's no links to the parent folder and a parent folder page never existed. For example say I have the following pages w/ content: /famous-dogs/lassie/
Intermediate & Advanced SEO | | dsbud
/famous-dogs/snoopy/
/famous-dogs/scooby-doo/ But I never (and maybe never plan to) created a general **/famous-dogs/ **page. Sitemaps.xml does not link to it, nor does any page on my site. Is there any concerns with doing this? Am I missing out on any sort of value that might pass to a parent folder?0 -
Can't generate a sitemap with all my pages
I am trying to generate a site map for my site nationalcurrencyvalues.com but all the tools I have tried don't get all my 70000 html pages... I have found that the one at check-domains.com crawls all my pages but when it writes the xml file most of them are gone... seemingly randomly. I have used this same site before and it worked without a problem. Can anyone help me understand why this is or point me to a utility that will map all of the pages? Kindly, Greg
Intermediate & Advanced SEO | | Banknotes0 -
Splitting One Site Into Two Sites Best Practices Needed
Okay, working with a large site that, for business reasons beyond organic search, wants to split an existing site in two. So, the old domain name stays and a new one is born with some of the content from the old site, along with some new content of its own. The general idea, for more than just search reasons, is that it makes both the old site and new sites more purely about their respective subject matter. The existing content on the old site that is becoming part of the new site will be 301'd to the new site's domain. So, the old site will have a lot of 301s and links to the new site. No links coming back from the new site to the old site anticipated at this time. Would like any and all insights into any potential pitfalls and best practices for this to come off as well as it can under the circumstances. For instance, should all those links from the old site to the new site be nofollowed, kind of like a non-editorial link to an affiliate or advertiser? Is there weirdness for Google in 301ing to a new domain from some, but not all, content of the old site. Would you individually submit requests to remove from index for the hundreds and hundreds of old site pages moving to the new site or just figure that the 301 will eventually take care of that? Is there substantial organic search risk of any kind to the old site, beyond the obvious of just not having those pages to produce any more? Anything else? Any ideas about how long the new site can expect to wander the wilderness of no organic search traffic? The old site has a 45 domain authority. Thanks!
Intermediate & Advanced SEO | | 945010 -
Other domains hosted on same server showing up in SERP for 1st site's keywords
For the website in question, the first domain alphabetically on the shared hosting space, strange search results are appearing on the SERP for keywords associated with the site. Here is an example: A search for "unique company name" shows the results: www.uniquecompanyname.com as the top result. But on pages 2 and 3, we are getting results for the same content but for domains hosted on the same server. Here are some examples with the domain name replaced: UNIQUE DOMAIN NAME PAGE TITLE
Intermediate & Advanced SEO | | Motava
ftp.DOMAIN2.com/?action=news&id=63
META DESCRIPTION TEXT UNIQUE DOMAIN NAME PAGE TITLE 2
www.DOMAIN3.com/?action=news&id=120
META DESCRIPTION TEXT2 UNIQUE DOMAIN NAME PAGE TITLE 2
www.DOMAIN4.com/?action=news&id=120
META DESCRIPTION TEXT2 UNIQUE DOMAIN NAME PAGE TITLE 3
mail.DOMAIN5.com/?action=category&id=17
META DESCRIPTION TEXT3 ns5.DOMAIN6.com/?action=article&id=27 There are more but those are just some examples. These other domain names being listed are other customer domains on the same VPS shared server. When clicking the result the browser URL still shows the other customer domain name B but the content is usually the 404 page. The page title and meta description on that page is not displayed the same as on the SERP.As far as we can tell, this is the only domain this is occurring for.So far, no crawl errors detected in Webmaster Tools and moz crawl not completed yet.0 -
Splitting a Site into Two Sites for SEO Purposes
I have a client that owns a business that really could be easily divided into two separate business in terms of SEO. Right now his web site covers both divisions of his business. He gets about 5500 visitors a month. The majority go to one part of his business and around 600 each month go to the other. So about 11% I'm considering breaking off this 11% and putting it on an entirely different domain name. I think I could rank better for this 11%. The site would only be SEO'd for this particular division of the company. The keywords would not be in competition with each other. I would of course link the two web sites and watch that I don't run into any duplicate content issues. I worry about placing the redirects from the pages that I remove to the new pages. I know Google is not a fan of redirects. Then I also worry about the eventual drop in traffic to the main site now. How big of a factor is traffic in rankings? Other challenges include that the business services 4 major metropolitan areas. Would you do this? Have you done this? How did it work? Any suggestions?
Intermediate & Advanced SEO | | MSWD0 -
Is it possible to Spoof Analytics to give false Unique Visitor Data for Site A to Site B
Hi, We are working as a middle man between our client (website A) and another website (website B) where, website B is going to host a section around websites A products etc. The deal is that Website A (our client) will pay Website B based on the number of unique visitors they send them. As the middle man we are in charge of monitoring the number of Unique visitors sent though and are going to do this by monitoring Website A's analytics account and checking the number of Unique visitors sent. The deal is worth quite a lot of money, and as the middle man we are responsible for making sure that no funny business goes on (IE false visitors etc). So to make sure we have things covered - What I would like to know is 1/. Is it actually possible to fool analytics into reporting falsely high unique visitors from Webpage A to Site B (And if so how could they do it). 2/. What could we do to spot any potential abuse (IE is there an easy way to spot that these are spoofed visitors). Many thanks in advance
Intermediate & Advanced SEO | | James770