Roger bot taking a long time to crawl site
-
Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?
thanks a lot, Mark.
-
Hi Peter
thanks for your reply. The crawl has now completed and given me some more areas to work on, it's a great tool.
I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:
User-agent: * Disallow: /
I hadn't thought beyond this.
I've noticed Google has now recognised the new robots.txt which has allowed the sitemap to be accepted..
I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.
I know (well think) I have to get noindex, follow for 'sorted' category pages...
all the best, Mark.
-
Hi Mike
The crawl has now completed, thank you. I think the results will keep me occupied
all the best, Mark.
-
Hi Mark,
Sorry it's taking a while to crawl your new site.
While I'm not exactly sure what the delay is, one of the possible reasons is through your robots.txt. Here's what I see in a short snippet from your robots.txt:
# Crawlers Setup User-agent: * Crawl-delay: 30 # Allowable Index Allow: /*?p= Allow: /index.php/blog/ Allow: /catalog/seo_sitemap/category/ Allow: /catalogsearch/result/ Allow: /media/ # Directories Disallow: /404/ Disallow: /app/ Disallow: /cgi-bin/ Disallow: /downloader/ Disallow: /errors/ Disallow: /includes/ Disallow: /js/ Disallow: /lib/ Disallow: /magento/ Disallow: /pkginfo/ Disallow: /report/ From here, the formatting looks a little awkward. What's going on is that you're telling Roger bot to only look at these:
Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/While the syntax is OK, not every crawler out there will follow the allow directive. Here's an example something you can use.
# Crawlers Setup User-agent: * Crawl-delay: 30 Disallow: / Disallow: /404/ Disallow: /app/ Disallow: /cgi-bin/ Disallow: /downloader/ Disallow: /errors/ Disallow: /includes/ Disallow: /js/ From here you're telling the crawler to disallow nothing except these directories. Please let us know once you implement this method is that will actually fix the crawl. Thanks for reaching out! Best, Peter Li SEOmoz Help Team ```
-
Hi Mark,
This sounds like a bug or issue with the SEOmoz software.
Contact help@seomoz.org and ask one of the help associates to look into this for you.
If you do not have many pages, it definitely shouldn't take that long.
The help team responds extremely quickly!
Good luck.
Mike
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl at a stand still
Hello Moz'ers, More questions about my Shopify migration...it seems that I'm not getting indexed very quickly (it's been over a month since I completed the migration) - I have done the following: used an Seo app to find and complete redirects (right away) used the same app to straighten out title tags, metas and alt tags submitted the sitemap re-submitted my main product URL's via Fetch checked the Console - no reported blocks or crawl errors I will mention that I had to assign my blog to a sub-domain because Shopify's blog platform is awful. I had a lot of 404's on the blog, but fixed those. The blog was not a big source of traffic (I'm an ecomm business) Also, I didn't have a lot of backlinks, and most of those came along anyway. I did have a number of 8XX and 9XX errors, but I spoke to Shopify about them and they found no issues. In the meantime, those issues pretty much disappeared in the MOZ reporting. Any duplicate page issues now have a 200 code since I straightened out the title tags. So what am I missing here? Thanks in advance, Sharon
Technical SEO | | Sharon2016
www.zeldassong.com0 -
Google Bot Noindex
If a site has the tag, can it still be flagged for duplicate content?
Technical SEO | | MayflyInternet0 -
Site Link Issues
For several search terms I get site links for the page http://www.waikoloavacationrentals.com/kolea-rentals/kolea-condos/ It makes sense that that page be a site link as it is one of my most used pages, but the problem is google gave it the site link "Kolea 10A". I am having 0 luck making any sense of why that was chosen. It should be something like "Kolea Condos" or something of that nature. Does anyone have any thoughts on where google is coming up with this?
Technical SEO | | RobDalton0 -
Trying to get google to know my site is a magazine site is this wrong
Hi, i have put a line to describe what my site is at the top of my site and i want to know if this is wrong or not. We have dropped frok being number one in google for lifestyle magazine to now number seven. Before we had to redo our site we were number one and then we dropepd to around number four when we finished the site and now we are number seven and i need to try and get back up there. To help google know we are a lifestyle magazine i have put a line at the top of the site and i want to know if this looks out of place and if i should take it down. i need advice on how to get google to know we are a lifestyle magazine and get back in the top five of google my site is www.in2town.co.uk any help would be great
Technical SEO | | ClaireH-1848860 -
Remove Site from Google
How can I get my website out of google? I want all pages completely gone. Thanks!
Technical SEO | | tylerfraser0 -
Does this page crawl well?
I just put up a page that uses an image map to illustrate a national currency note. http://www.antiquebanknotes.com/NationalCurrency/National-Bank-Note-Information.aspx My goal with this page is get results for National Bank Note. But I know image maps are wierd creatures and not good for linking. My question is, will Google index my tooltips and find this page useful and therefore worthy? I think the content is useful for my users but I just don't know if the implementation will work well. This screen will eventually have 5 or 6 notes on it and I don't want to do it the concensus is negative... Thanks for any advice.
Technical SEO | | Banknotes0 -
Best SEO strategy for a site that has been down
Because of hosting problems we're trying to work out, our domain was down all weekend, and we have lost all of our rankings. Doe anyone have any experience with this kind of thing in terms of how long it takes to figure out where you stand once you have the site back up? what the best SEO strategy is for immediately addressing this problem? Besides just plugging away at getting links like normal, is there anything specific we should do right away when the site goes back up? Resubmit a site map, etc? Thanks!
Technical SEO | | OneClickVentures0 -
Blocking Google from Crawling Parameters
Hi guys: What is the best way to keep Google from crawling certain urls with parameters? I used the setting in Webmaster Tools, but that doesn't seem to be helping at all. Can I use robots.txt or some other method? Thanks! Some examples are: <colgroup><col width="797"></colgroup> www.mayer-johnson.com/category/assistive-technology?manufacturer=179 www.mayer-johnson.com/category/assistive-technology?manufacturer=226 www.mayer-johnson.com/category/assistive-technology?manufacturer=227 <colgroup><col width="797"></colgroup> www.mayer-johnson.com/category/english-language-learners?condition=212 www.mayer-johnson.com/category/english-language-learners?condition=213 www.mayer-johnson.com/category/english-language-learners?condition=214 <colgroup><col width="797"></colgroup>
Technical SEO | | DanaDV
| www.mayer-johnson.com/category/english-language-learners?roles=164 |
| www.mayer-johnson.com/category/english-language-learners?roles=165 |
| www.mayer-johnson.com/category/english-language-learners?roles=197 | | |0