Mozbot Cannot Crawl Entire Domain
-
I'm trying to crawl Redken.com in Moz Analytics, and the Search Diagnostics is only crawling 4 pages. The domain shows a "select your country" page the first time you visit, and it seems the bot is not getting beyond that (i.e., not clicking on "USA") and is therefore not crawling the rest of the domain. There is no country-specific URL other than redken.com.
I've tried entering both "redken.com" and "www.redken.com" as the URL, but no luck.
Any tips?
-
It's caused by the way your site is built. If you go to redken.com, you get a choice of country. If you select "USA", you're redirected with a 302 to redken.com/USA, then with a 302 to redken.com/?country=USA, then with a 302 back to redken.com. I guess for browsers you store the choice somewhere (a cookie?), but a simple bot (like Moz's; I get the same result with Screaming Frog) just ends up back where it started, at redken.com, which starts the same loop again.
So only 4 URLs can be crawled. The other countries are on different URLs, so they will not be included in the crawl.
Googlebot is smarter and behaves more like a real browser, so it will crawl the site, but Mozbot can't do that.
rgds
Dirk
Update: I actually forgot one redirect. redken.com is first redirected with a 302 to redken.com/international.
PS: The site is also horribly slow, and the redirect chain certainly isn't helping.
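The loop described above can be reproduced with a toy model. This is a hypothetical sketch: the paths come from this thread, but the `serve()` function, the cookie it sets on the /?country=USA hop (Dirk's own guess), and the `/products/` and `/salon-locator/` pages are invented placeholders, not verified behaviour of redken.com. It shows that a crawler that discards cookies only ever reaches the four URLs in the loop, while a client that keeps the cookie gets through to the real home page.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# (status, Location header, links on the page, cookie value the response sets)
Response = Tuple[int, Optional[str], List[str], Optional[str]]


def serve(path: str, jar: Dict[str, str]) -> Response:
    """Hypothetical server modelled on the redirect chain described in the thread."""
    if path == "/":
        if jar.get("country") == "USA":
            # Assumption: once the country is remembered, the real home page is served.
            return 200, None, ["/products/", "/salon-locator/"], None
        return 302, "/international/", [], None
    if path == "/international/":
        # Country chooser; other countries live on other domains, so only /USA is on-site.
        return 200, None, ["/USA"], None
    if path == "/USA":
        return 302, "/?country=USA", [], None
    if path == "/?country=USA":
        # Assumption: this hop sets the cookie a browser keeps and a simple bot discards.
        return 302, "/", [], "USA"
    return 200, None, [], None  # any deeper content page


def fetch(path: str, jar: Optional[Dict[str, str]], max_hops: int = 10) -> Tuple[List[str], List[str]]:
    """Request a URL and follow 302s; return every URL requested and the final page's links."""
    chain: List[str] = []
    for _ in range(max_hops):
        chain.append(path)
        status, location, links, cookie = serve(path, jar if jar is not None else {})
        if cookie is not None and jar is not None:
            jar["country"] = cookie  # only a cookie-aware client remembers the choice
        if status != 302 or location is None:
            return chain, links
        path = location
    return chain, []  # give up on an over-long redirect chain


def crawl(keep_cookies: bool) -> List[str]:
    """Breadth-first crawl from the home page; jar=None models a bot without cookie support."""
    jar: Optional[Dict[str, str]] = {} if keep_cookies else None
    seen: List[str] = []
    queue = deque(["/"])
    while queue:
        chain, links = fetch(queue.popleft(), jar)
        for url in chain:
            if url not in seen:
                seen.append(url)
        for link in links:
            if link not in seen and link not in queue:
                queue.append(link)
    return seen


if __name__ == "__main__":
    print(crawl(keep_cookies=False))  # ['/', '/international/', '/USA', '/?country=USA']
    print(len(crawl(keep_cookies=True)))  # 6
```

The cookieless crawl requests /USA, gets bounced through /?country=USA back to / and then to /international/ again, finds only the /USA link it has already seen, and stops after four URLs.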
-
Well, I just noticed that the website is in Flash! I believe none of the crawl bots are able to crawl Flash websites.
It seems that if I try to access redken.com, it redirects me to the Flash version (/international).
Actually, now I can't recreate that. Super weird. Is there something "special" going on with automatic redirects? Look into that.
-
Thanks for the response!
These are the pages it crawled.
| Crawled URL |
|---|
| http://redken.com |
| http://www.redken.com/ |
| http://www.redken.com/international/ |
| http://www.redken.com/USA |
| http://www.redken.com/?country=USA |

Robots.txt looks clean; there's nothing in it that should have stopped it from crawling more.
-
Hi there.
Which are those 4 pages? Could your robots.txt be blocking it for some reason?
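One way to rule robots.txt in or out is Python's standard-library `urllib.robotparser`, which answers whether a given user agent may fetch a given URL. This is a minimal sketch assuming Moz's site crawler identifies itself as `rogerbot`; the robots.txt content and its `/checkout/` rule are placeholders, not Redken's actual file. Note that an empty `Disallow:` line means "allow everything" for that user agent.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt; paste the site's real file here to test it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/

User-agent: rogerbot
Disallow:
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("rogerbot", "Googlebot"):
        print(agent, allowed(agent, "http://www.redken.com/checkout/cart"))
```

Against a live site, `RobotFileParser`'s `set_url()` and `read()` methods fetch the file directly instead of parsing a pasted string. If this check passes for every URL the crawler should reach, the problem lies elsewhere, such as the redirect loop described earlier in the thread.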