ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk, and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menus, not the product pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspx
Is it because the products are being loaded in JavaScript?
What's your recommendation?
All best,
Fred.
-
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you to Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site.
Patrick actually mentions the issue in one of his points above. Essentially, it looks like the site uses JavaScript to load the products on category pages, for example: http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
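The "disable JS and look for the products" check Dan describes can also be approximated in a script. The following is a minimal sketch, not how the SEO Spider itself works: it parses raw HTML without running any JavaScript and lists links containing /pi/. The names `LinkCollector`, `product_links` and `fetch_html` are made up for illustration, and the /pi/ marker is taken from the product URLs quoted in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in static HTML (no JavaScript is run)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def product_links(html, base_url, marker="/pi/"):
    """Return the sorted, de-duplicated links whose URL contains the marker."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return sorted({url for url in collector.links if marker in url})


def fetch_html(url):
    """Download a page as text, the way a non-JS crawler would see it."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example usage (needs network access, so it is left commented out):
# url = "http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx"
# print(product_links(fetch_html(url), url))
```

If the list comes back empty for a category page that shows products in a normal browser, that points to the products being injected by JavaScript.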
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here is a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on the top right-hand side of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page-level ‘nofollow’ attribute. This could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
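A few of the checks in the list above (robots.txt, page-level nofollow, Content-Type) can be tested outside the tool. This is a rough sketch using only the Python standard library; the function names are hypothetical, and it works on text and header values you have already downloaded rather than fetching anything itself.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Media types the list above names as acceptable for an HTML page.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def robots_allows(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def is_html_content_type(content_type):
    """Return True if a Content-Type header value names an HTML media type."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in HTML_TYPES


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            for item in (attr.get("content") or "").split(","):
                self.directives.add(item.strip().lower())


def page_level_nofollow(html, x_robots_tag=""):
    """Return True if the meta robots tag or X-Robots-Tag value says nofollow."""
    parser = MetaRobotsParser()
    parser.feed(html)
    header_directives = {d.strip().lower() for d in x_robots_tag.split(",")}
    return "nofollow" in parser.directives or "nofollow" in header_directives
```

For example, `robots_allows(robots_text, product_url)` returning False for a product URL would explain why those pages never appear in the crawl.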
Run through your settings and check whether you have turned something on inadvertently. One thing you can try is to go to Configuration > Spider and find the last option, Ignore robots.txt. Tick the checkbox and try running the crawl again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!