Mozbot Cannot Crawl Entire Domain
-
I'm trying to crawl Redken.com in Moz Analytics and the Search Diagnostics is only crawling 4 pages. The domain shows a "select your country" page the first time you visit, and it seems as though the bot is not getting beyond that (i.e., not clicking on "USA") and is therefore not crawling the rest of the domain. There is no country-specific URL other than redken.com.
I've tried entering both "redken.com" and "www.redken.com" as the URL, but no luck.
Any tips?
-
It's caused by the way you have built your site. If you go to redken.com, you get the choice of country. If you select "USA" you're redirected with a 302 to redken.com/USA, then with a 302 to redken.com/?country=USA, then with a 302 back to redken.com. I guess for browsers you store the choice somewhere (a cookie?). A simple bot (like Moz's, and I see the same with Screaming Frog) does not keep that state, so it just ends up back where it started at redken.com, which starts the same loop again.
So only 4 URLs can be crawled. The other countries are on different URLs, so they will not be included in the crawl.
Googlebot is smarter and acts more like a real browser, so it will crawl the site, but Mozbot can't do that.
rgds
Dirk
Update: I actually forgot one redirect. redken.com is first redirected with a 302 to redken.com/international.
PS: The site is horribly slow as well, and the redirect chain is certainly not helping.
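To make the loop concrete, here is a minimal Python sketch of the redirect chain described above. It is a toy model, not a capture of the real site: `fake_site` and `follow` are hypothetical helpers, paths are used instead of full URLs, and the idea that the country is remembered in a cookie set on the `?country=USA` step is only the guess made above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Optional[str], Optional[Dict[str, str]]]


def fake_site(path: str, cookies: Dict[str, str]) -> Response:
    """Toy model of the site: returns (status, Location, cookies to set).

    Assumption: the country choice is stored in a cookie named "country".
    """
    if path == "/":
        # Without a remembered country, the homepage bounces to the selector.
        if cookies.get("country"):
            return 200, None, None
        return 302, "/international/", None
    if path == "/USA":
        return 302, "/?country=USA", None
    if path == "/?country=USA":
        return 302, "/", {"country": "USA"}
    # Everything else, including the country selector, is a normal page.
    return 200, None, None


def follow(start: str,
           fetch: Callable[[str, Dict[str, str]], Response],
           keep_cookies: bool,
           max_hops: int = 10) -> List[str]:
    """Follow redirects from `start` and return every path visited."""
    cookies: Dict[str, str] = {}
    chain = [start]
    path = start
    for _ in range(max_hops):
        status, location, set_cookie = fetch(path, dict(cookies))
        if keep_cookies and set_cookie:
            cookies.update(set_cookie)
        if status not in (301, 302, 303, 307, 308) or location is None:
            return chain
        path = location
        chain.append(path)
    raise RuntimeError("too many redirects")


# A cookie-less bot ends up back on the country selector.
bot_chain = follow("/USA", fake_site, keep_cookies=False)
# A browser-like client that keeps cookies reaches the real homepage.
browser_chain = follow("/USA", fake_site, keep_cookies=True)
print(bot_chain)
print(browser_chain)
```

Under these assumptions the cookie-less client always lands on `/international/` again, which would explain why the crawl stops after a handful of URLs.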
-
Well, I just noticed that the website is in Flash! I believe none of the crawl bots are able to crawl Flash websites.
It seems that if I try to access redken.com, it redirects me to the Flash version (/international).
Actually, now I can't recreate that. Super weird. Is there something "special" going on with automatic redirects? Look into that.
-
Thanks for the response!
These are the pages it crawled.
| http://redken.com |
| http://www.redken.com/ |
| http://www.redken.com/international/ |
| http://www.redken.com/USA |
| http://www.redken.com/?country=USA |

Robots.txt looks clean, nothing that should have stopped it from crawling more.
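For anyone who wants to double-check a robots.txt rather than eyeball it, a rough sketch with Python's standard `urllib.robotparser` follows. The `SAMPLE` rules are made up for illustration and are not the real redken.com file, `is_allowed` is a hypothetical helper, and `rogerbot` is used as the user agent of Moz's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only to show the check.
SAMPLE = """\
User-agent: rogerbot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("http://www.redken.com/USA",
            "http://www.redken.com/private/page"):
    print(url, is_allowed(SAMPLE, "rogerbot", url))
```

To test a live file instead of a pasted string, `RobotFileParser` also offers `set_url()` and `read()`. If every crawled URL comes back as allowed, robots.txt is unlikely to be what limits the crawl.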
-
Hi there.
Which pages are those 4 pages? Is your robots.txt blocking it for some reason maybe?