ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk, and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menus, not the product pages.
Examples:
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspx
Is it because the products are being loaded in JavaScript?
What's your recommendation?
All best,
Fred.
-
Hi,
Thank you for this question and the responses; we encountered the same issue. Screaming Frog was only crawling a handful of our hundreds of products because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure the changes will not cause any crawling errors before we deploy them to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already (and thank you to Dan for chiming in), but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site.
Patrick actually mentions the issue in one of his points above. Essentially, it looks like the site uses JavaScript to load the products on category pages, for example: http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that.
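To check what Dan describes without a browser, one option is to fetch a category page's raw HTML (which is all a non-rendering crawler sees) and list the product links in it. Below is a minimal Python sketch using only the standard library. It assumes, based on the example URLs above, that product pages have /pi/ in their path; the helper names and the User-Agent value are illustrative and not part of any tool. If a category page that shows products in the browser yields no /pi/ links here, the products are most likely being added by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in the raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def product_links(html, base_url, marker="/pi/"):
    """Return absolute URLs of links whose path contains `marker`.

    Only links written into the HTML source are found; anything that
    JavaScript adds after the page loads is invisible here, just as it
    is to a crawler that does not execute JS.
    """
    parser = LinkCollector()
    parser.feed(html)
    absolute = (urljoin(base_url, href) for href in parser.hrefs)
    return sorted({url for url in absolute if marker in urlparse(url).path})


def fetch_raw_html(url, user_agent="Mozilla/5.0"):
    """Download a page as plain HTML without running any JavaScript."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, calling product_links(fetch_raw_html(url), url) with url set to the menu page from the original question should list the product links a non-rendering crawler can see. The site may well have changed since this thread was written.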
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you, Fred. I'm sure he will be along presently.
-Andy
-
Hi there
It's crawling for me. Here is a list of reasons why Screaming Frog won't crawl a site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on the top right-hand side of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page-level ‘nofollow’ attribute. This could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
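Several of the checks in the list above can be run against a saved response without a crawler. The following is a hedged Python sketch that assumes you already have the page URL, its response headers (with lower-cased names), its HTML and the site's robots.txt text. The function name, the messages it returns, and the default robots.txt user-agent token "Screaming Frog SEO Spider" are assumptions, so match the token to whatever User Agent your crawler is configured to send. It does not cover the JavaScript, cookie or user-agent cases, which need a browser or a second request.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Content types the list above says a crawlable page should have.
HTML_TYPES = ("text/html", "application/xhtml+xml")


class DirectiveScanner(HTMLParser):
    """Finds meta robots directives and rel="nofollow" links in HTML."""

    def __init__(self):
        super().__init__()
        self.meta_robots = []
        self.nofollow_links = 0

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.meta_robots.append(a.get("content", "").lower())
        elif tag == "a" and "nofollow" in a.get("rel", "").lower().split():
            self.nofollow_links += 1


def diagnose(url, headers, html, robots_txt, agent="Screaming Frog SEO Spider"):
    """Return a list of reasons a crawler might not follow links from `url`."""
    problems = []

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        problems.append("blocked by robots.txt")

    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in HTML_TYPES:
        problems.append(f"content-type is {content_type or 'missing'}")

    if "nofollow" in headers.get("x-robots-tag", "").lower():
        problems.append("X-Robots-Tag: nofollow")

    scanner = DirectiveScanner()
    scanner.feed(html)
    if any("nofollow" in content for content in scanner.meta_robots):
        problems.append("meta robots nofollow")
    if scanner.nofollow_links:
        problems.append(f"{scanner.nofollow_links} link(s) with rel=nofollow")
    if "<frameset" in html.lower():  # crude substring test for framesets
        problems.append("uses framesets")

    return problems
```

An empty list only means none of these particular causes was detected.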
Run through your settings and check whether you may have turned something on inadvertently. One thing you can try is to go to Configuration > Spider and then to the last option, Ignore robots.txt. Tick the checkbox and try running the crawl again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Related Questions
-
My "search visibility" went from 3% to 0% and I don't know why.
My search visibility on here went from 3.5% to 3.7% to 0% to 0.03% and now 0.05% in the space of one month, and I do not know why. I make changes every week to see if I can get higher in Google results. I do well with one website, which is for a medical office that has been open for years. With this new one, where the office has only been open a few months, I am having trouble. We aren't getting the calls I was hoping for. In fact, the only one we did receive came, I believe, because we were closest to him on Google Maps. I am also having some trouble with the "Links" aspect of SEO. Everywhere I look to get a link, it seems you have to pay. We are a medical office, not selling products, so not many blogs would want to talk about us. Any help getting a higher rank on Google would be greatly appreciated, as would any help getting the search visibility up.
Intermediate & Advanced SEO | | benjaminleemd1 -
301 redirecting a site that currently links to the target site
I have a personal blog that has a good number of backlinks pointing at it from high-quality, relevant, authoritative sites in my niche. I also run a company in the same niche. From a personal blog article that has a bunch of relevant links pointing at it, I link to a page on the company site (as that page is highly relevant to the content of the blog post). Overview: the relevant personal blog post has a bunch of relevant external links pointing at it (completely organic), and it links externally to the relevant company page, which is helping that page rank. Question: if I do the work to 301 the personal blog to the company site, and then link internally from the blog page to the relevant company page, will this kill that backlink, or will the internal link help as much as the current external link does? For clarity:
External sites => External blog => External link to company page
VS
External sites => External blog 301 => Blog page (now on company blog) => Internal link to target page
I would love to hear from anyone who has done this in the past 🙂
Intermediate & Advanced SEO | | Keyword_NotProvided0 -
Site's disappearance from web rankings
I'm currently doing some work on a website: http://www.abetterdriveway.com.au. Upon starting, I detected a lot of spammy links pointing to this website and sought to remove them before submitting a disavow report. A few months later, the site completely disappeared from the rankings, with all keywords suddenly unranked. I realised that the test website (which was put up for review before the new site went live) was still up on another URL, and Google was now ranking that site instead. Hence, I ensured the test site was completely removed. Three weeks later, however, the site (www.abetterdriveway.com.au) still remains unranked for its keywords. Upon checking Webmaster Tools, I cannot see anything that stands out: there are no manual actions or crawl issues that I can detect. Would anyone know the reason for this persistent disappearance? Is it something I will just have to wait out until the rankings come back, or is there something I am missing? Help here would be much appreciated.
Intermediate & Advanced SEO | | Gavo0 -
Can't get auto-generated content de-indexed
Hello, and thanks in advance for any help you can offer me! Customgia.com, a costume jewelry e-commerce site, has two types of product pages: public pages that are internally linked, and private pages that are only accessible by entering the URL directly. Every item on Customgia is created online using a design tool. Users can register for a free account and save the designs they create, even if they don't purchase them. Before saving a design, the user is required to enter a product name and choose "public" or "private" for it. The page title and product description are auto-generated. Since launching in October '11, the number of products grew and grew as more users designed jewelry items. Most users chose to show their designs publicly, so the number of products in the store swelled to nearly 3000. I realized many of these designs were similar to each other and occasionally exact duplicates. So over the past 8 months, I've made 2300 of these designs "private", no longer accessible unless the designer logs into their account (these pages can also be linked to directly). When I realized that Google had indexed nearly all 3000 products, I entered URL removal requests in Webmaster Tools for the designs I had changed to "private". I started doing this about 4 months ago. At the time, I did not have NOINDEX meta tags on these product pages (obviously a mistake), so it appears that most of these product pages were never removed from the index, or, if they were removed, they were added back in after the 90 days were up. Of the 716 products currently showing (the ones I want Google to know about), 466 have unique, informative descriptions written by humans. The remaining 250 have auto-generated descriptions that read coherently but are somewhat similar to one another. I don't think these 250 descriptions are the big problem right now, but these product pages can be hidden if necessary.
I think the big problem is the 2000 product pages that are still in the Google index but shouldn't be. The following Google query tells me roughly how many product pages are in the index: site:Customgia.com inurl:shop-for
Ideally, it should return just over 716 results, but instead it's returning 2650 results. Most of these 1900 product pages have bad product names and highly similar, auto-generated descriptions and page titles. I wish Google had never crawled them. Last week, NOINDEX tags were added to all 1900 "private" designs, so currently the only product pages that should be indexed are the 716 showing on the site. Unfortunately, over the past ten days the number of product pages in the Google index hasn't changed. One solution I initially thought might work is to re-enter the removal requests, because now, with the NOINDEX tags, these pages should be removed permanently. But I can't determine which product pages need to be removed, because Google doesn't let me see that deep into the search results. The removal request history says "Expired" or "Removed", but these labels don't seem to correspond in any way to whether or not a page is currently indexed. Additionally, Google is unlikely to crawl these "private" pages, because they are orphaned: no public pages of the site link to them (and there are no external links either). Currently, Customgia.com averages 25 organic visits per month (branded and non-branded) and close to zero sales. Does anyone think de-indexing the entire site would be appropriate here? Start with a clean slate and then let Google re-crawl and index only the public pages: would that be easier than battling with Webmaster Tools for months on end? Back in August, I posted a similar problem that was solved using NOINDEX tags (de-indexing a different set of pages on Customgia): http://moz.com/community/q/does-this-site-have-a-duplicate-content-issue#reply_176813 Thanks for reading through all this!
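Before re-entering removal requests, it may help to confirm that every "private" URL actually serves a noindex directive, either as a robots meta tag or as an X-Robots-Tag header. A minimal Python sketch follows. has_noindex and the other names are hypothetical; it assumes you have each page's response headers (names lower-cased) and HTML, and it treats the "none" directive as equivalent to noindex. It cannot tell you whether Google has recrawled a page yet, and Google only sees the tag when it does.

```python
import re
from html.parser import HTMLParser


def _directives(value):
    """Split a directive string such as 'googlebot: noindex, follow' into tokens."""
    return {token for token in re.split(r"[,\s:]+", value.lower()) if token}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.directives |= _directives(a.get("content", ""))


def has_noindex(headers, html):
    """True if the X-Robots-Tag header or a robots meta tag says noindex (or none)."""
    found = _directives(headers.get("x-robots-tag", ""))
    parser = RobotsMetaParser()
    parser.feed(html)
    found |= parser.directives
    return bool(found & {"noindex", "none"})
```

Running this over the list of "private" URLs (fetched however you prefer) would show which ones still lack the tag.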
Intermediate & Advanced SEO | | rja2140 -
Is 301 redirecting your index page to the root '/' safe to do or do you end up in an endless loop?
Hi, I need to tidy up my home page a little. I have some links to our index.html page, but I just want them to go to the root '/', so I thought I could 301 redirect it. However, is this safe to do? I'm getting duplicate page notifications about the home page in my analytics reporting tools and need a quick way to fix this issue. Many thanks in advance, David
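One way to answer this empirically is to trace the redirect chain one hop at a time and stop if a URL repeats. A redirect from /index.html to / is only a problem if the request for / is itself redirected back to /index.html (for example, by a rewrite rule that also matches the server's internal directory-index lookup). The Python sketch below uses only the standard library; trace_redirects, fetch_head and the 10-hop limit are illustrative assumptions, and it sends HEAD requests, which a few servers answer differently from GET.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_head(url):
    """Send one HEAD request and return (status, Location header or None).

    Redirects are NOT followed, so each hop can be inspected.
    """
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=15)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_head, max_hops=10):
    """Follow redirects hop by hop; raise RuntimeError if a URL repeats."""
    hops = [url]
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return hops
        url = urljoin(url, location)  # Location may be relative, e.g. "/"
        if url in hops:
            raise RuntimeError("redirect loop: " + " -> ".join(hops + [url]))
        hops.append(url)
    raise RuntimeError(f"more than {max_hops} redirects")
```

With a safe setup, tracing http://www.example.com/index.html (a placeholder domain) should return a two-entry chain ending at the root URL; a loop raises an error naming every hop.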
Intermediate & Advanced SEO | | David-E-Carey0 -
1 ecommerce site for several product segments, or 1 ecommerce site for each product segment?
I am currently struggling with the decision whether to create individual ecommerce sites for each of 3 consumer product segments or to integrate them all under one umbrella domain. Obviously, integrating under 1 domain makes link building easier, but I am not sure how much Google favors, in its rankings, websites focused on one topic (i.e. one product segment). The product segments are moderately competitive. They are not directly related, but there may be some overlap in customer demographics. Any thoughts?
Intermediate & Advanced SEO | | lcourse1 -
Lots of city pages - How do I ensure we don't get penalized
We are planning on having a job posting page for each city where we are looking to hire new CFO partners. But the problem is, we have LOTS of locations. What would be the best way to have similar content on each page (since the job description and requirements are the same for each posting) without being hit by Google for duplicate content? One of the main reasons we decided to have location-based pages is that we noticed visitors to our site are searching for "cfo job in [location]", but most of these visitors then leave. We believe this is because the pages they land on make no mention of the location they were looking for, which is a little incongruent with what they were expecting. We are looking to use the following URL, title and description as an example:
URL: http://careers.b2bcfo.com/cfo-jobs/Alabama/Birmingham
Title: CFO Careers in Birmingham, AL
Description: Are you looking for a CFO Career in Birmingham, Alabama? We're looking for partners there. Apply today!
Any advice you have for this would be greatly appreciated. Thank you.
Intermediate & Advanced SEO | | B2B.CFO0 -
Google Maps results doesn't show my site url but rather the maps url, why is this?
For several of my clients' landing pages that show up in the Maps results, the website URL has been overwritten by the Maps URL (maps.google.com), even though the correct website is set up on my Places page. Does anyone have any idea why this would happen and how I can correct it? Thanks kindly in advance, Aaron.
Intermediate & Advanced SEO | | afranklin0