ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menus, not the product pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspx
Is it because the products are being loaded via JavaScript?
What's your recommendation?
All best,
Fred
-
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you to Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site.
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
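For anyone who wants to check this without switching JS off in the browser, here is a rough Python sketch of what a crawler that does not execute JavaScript can see. It parses raw HTML and collects the `<a href>` links, then filters for the /pi/ product pattern. The `SAMPLE` markup is a made-up stand-in for a category page, not the real Netspiren.dk source, and the names `LinkCollector`, `static_links` and `product_links` are just illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in raw HTML (no JavaScript is run)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def static_links(html, base_url):
    """Return every link present in the HTML as served, before any scripts run."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def product_links(html, base_url, marker="/pi/"):
    """Return only the links whose URL contains the product marker."""
    return [url for url in static_links(html, base_url) if marker in url]


# Hypothetical category page: the menu is plain HTML, but the product
# container is empty until a script fills it in the browser.
SAMPLE = """
<html><body>
<ul id="menu">
  <li><a href="/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx">Menu</a></li>
</ul>
<div id="products"></div>
<script>
  // products would be injected here by JavaScript at runtime
</script>
</body></html>
"""

if __name__ == "__main__":
    base = "http://www.netspiren.dk/"
    print("All static links:", static_links(SAMPLE, base))
    print("Product links:", product_links(SAMPLE, base))
```

If you feed it the HTML of a real category page (for example fetched with `urllib.request.urlopen`) and the product list comes back empty, the products are most likely being added by script after the page loads.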
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you, Fred. I'm sure he will be along presently.
-Andy
-
Hi there
It's crawling for me. Here is a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on the top right-hand side of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page-level ‘nofollow’ attribute. This could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
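Several of the items above can be checked outside the tool with a short script. The following is only a sketch using the Python standard library: it takes the robots.txt text, the response headers and the page HTML as plain inputs and reports which of the listed conditions apply. The function name `diagnose` and the wording of the findings are my own, and it does not cover the user agent, cookie or JavaScript items.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _PageScan(HTMLParser):
    """Record meta robots values, framesets and rel=nofollow links."""

    def __init__(self):
        super().__init__()
        self.meta_robots = []
        self.has_frameset = False
        self.nofollow_links = 0

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.meta_robots.append(a.get("content", "").lower())
        elif tag == "frameset":
            self.has_frameset = True
        elif tag == "a" and "nofollow" in a.get("rel", "").lower():
            self.nofollow_links += 1


def diagnose(url, robots_txt, headers, html, user_agent="*"):
    """Return a list of findings that could stop a crawler following links."""
    findings = []

    # robots.txt: parse the supplied text rather than fetching it.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        findings.append("blocked by robots.txt")

    # HTTP headers: Content-Type and X-Robots-Tag.
    lowered = {k.lower(): v for k, v in headers.items()}
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in ("text/html", "application/xhtml+xml"):
        findings.append("Content-Type is not HTML: %r" % content_type)
    if "nofollow" in lowered.get("x-robots-tag", "").lower():
        findings.append("page-level nofollow (X-Robots-Tag header)")

    # Page markup: meta robots, framesets, link-level nofollow.
    scan = _PageScan()
    scan.feed(html)
    if any("nofollow" in content for content in scan.meta_robots):
        findings.append("page-level nofollow (meta robots tag)")
    if scan.has_frameset:
        findings.append("page uses framesets")
    if scan.nofollow_links:
        findings.append("%d link(s) carry rel=nofollow" % scan.nofollow_links)
    return findings


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(diagnose(
        "http://example.com/pi/product.aspx",
        "User-agent: *\nDisallow: /pi/\n",
        {"Content-Type": "text/html; charset=utf-8"},
        '<html><head><meta name="robots" content="index, nofollow"></head></html>',
    ))
```

An empty list means none of these particular checks fired, which points back towards the user agent, cookie or JavaScript items.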
Run through your settings and check whether you have inadvertently turned something on. One thing you can try is to go to Configuration > Spider and then to the last option, Ignore robots.txt. Tick the checkbox and try running the crawl again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!