Can't crawl website with Screaming Frog... what is wrong?
-
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage. I've done the usual checks (turning off JS and so on) and found no problems with the navigation. The site's other pages are indexed in Google, by the way.
Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...). Are there any issues here? [Edit: I've just checked, and there aren't!]
# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
#
# For syntax checking, see:
# http://tool.motoricerca.info/robots-checker.phtml
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
-
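A quick way to double-check that these rules are not what stops the crawl is to run them through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`. The sample paths and the user-agent string are assumptions made for illustration (they are not taken from the site in question), and since the rules apply to `*`, the exact user-agent name should not change the result.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
"""


def blocked_paths(robots_txt, paths, user_agent="Screaming Frog SEO Spider"):
    """Return the subset of paths that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical paths: two ordinary content pages and two Joomla system paths.
sample_paths = ["/", "/about-us", "/administrator/index.php", "/tmp/cache.html"]
blocked = blocked_paths(ROBOTS_TXT, sample_paths)
print(blocked)
```

Only the Joomla system folders come back as blocked; the homepage and ordinary content URLs are allowed, which fits the conclusion that the robots.txt file is not the cause.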
For anyone wondering: the answer above by Ecommerce Site (odd name, btw) works - 21-Nov-2016.
-
This is the best I could find, addressed to someone who had a similar problem with Joomla:
"In the premium version you can slow down the crawl rate under 'speed' in the configuration. In the free lite version, you can crawl the site and then right click on any URLs with a 403 response and press 're-spider'. The server will generally then allow you to crawl these pages (and return a 200 ok response) as you're not requesting too many at once, so you might have to re-spider them individually."
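The idea in that quote (request the blocked pages again, one at a time and slowly, so the server stops answering 403) can also be tried outside Screaming Frog. The following is only a minimal sketch of that idea in Python, not how Screaming Frog itself works: the function names, the 2-second default delay, and the `example-crawl-check` user-agent string are all hypothetical choices.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a single GET request to url."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-crawl-check"}  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses (such as 403) are raised as HTTPError.
        return error.code


def respider(urls, delay_seconds=2.0, fetch=fetch_status, sleep=time.sleep):
    """Re-request each URL in turn, pausing between consecutive requests."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # throttle so the server is not hit in bursts
        results[url] = fetch(url)
    return results
```

Calling `respider()` with the list of URLs that returned 403 in the crawl gives a mapping of URL to status code. If most of them now come back as 200, rate limiting on the server is the likely cause, and a slower crawl speed should help.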