Screaming Frog, Moz and other crawlers
-
Hi
Ignorant question, but is it possible to use Screaming Frog, the Moz crawler, or any other reputable crawler on a site that is still in development, i.e. one that has yet to be indexed? If so, could someone provide some quick instructions on how to do it?
Thanks in advance for any support.
Neil
-
Thanks for the answer, much appreciated.
-
Yes, thanks for the answer.
-
Tim,
As long as you can get to the site remotely via a website address, you should be good to go. However, if the site blocks crawlers via a robots.txt file or a meta robots tag, rogerbot won't access it. Screaming Frog, on the other hand, has a setting that tells it to ignore the robots.txt file if one exists.
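If you want to check in advance whether a staging site's robots.txt would turn rogerbot away, something like the following may help. It is only a rough sketch using Python's standard `urllib.robotparser`; the robots.txt contents and the `staging.example.com` address are made-up placeholders, and it does not look at meta robots tags at all.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt often seen on development sites: block everything.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant: block everyone except rogerbot.
ALLOW_ROGERBOT = """\
User-agent: rogerbot
Allow: /

User-agent: *
Disallow: /
"""

STAGING_URL = "https://staging.example.com/"  # placeholder address

if __name__ == "__main__":
    print(is_allowed(BLOCK_ALL, "rogerbot", STAGING_URL))
    print(is_allowed(ALLOW_ROGERBOT, "rogerbot", STAGING_URL))
```

To test a real site, you could instead point the parser at the live file with `set_url(...)` followed by `read()`.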
-
Yes. If the site is online, on a domain or anywhere else Screaming Frog can reach, you can spider it with the software.
It does have to be online somewhere, though, so the software can reach it.
Did this answer your question?
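As a rough illustration of the "has to be reachable" point: a hosted crawler such as rogerbot runs on someone else's servers, so it cannot see a site that only lives on localhost or a private network address, whereas a desktop crawler running on the same machine or network can. The sketch below is a guess based on the URL alone; the function name and example addresses are hypothetical, and it does not do any DNS lookup or actual request.

```python
import ipaddress
from urllib.parse import urlsplit


def reachable_by_hosted_crawler(url: str) -> bool:
    """Guess from the URL alone whether a remote crawler could reach the site."""
    host = urlsplit(url).hostname
    if host is None:
        # No scheme or host in the URL, so there is nothing to crawl remotely.
        return False
    if host == "localhost" or host.endswith((".localhost", ".local")):
        # Names that only resolve on the local machine or local network.
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # An ordinary domain name: assumed public here, DNS is not checked.
        return True
    # Loopback and private ranges (127.0.0.1, 192.168.x.x, ...) are not global.
    return ip.is_global


if __name__ == "__main__":
    for candidate in (
        "http://localhost:8080/",
        "http://192.168.1.10/",
        "https://staging.example.com/",
    ):
        print(candidate, reachable_by_hosted_crawler(candidate))
```

A `True` here only means the address looks public; the site could still be behind a login or a firewall.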