Roger bot taking a long time to crawl site
-
Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?
thanks a lot, Mark.
-
Hi Peter
Thanks for your reply. The crawl has now completed and given me some more areas to work on. It's a great tool.
I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:
User-agent: *
Disallow: /
I hadn't thought beyond this.
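For anyone who wants to confirm what that two-line rule does, here is a minimal sketch using Python's standard `urllib.robotparser`. The `rogerbot` user-agent string, the `is_allowed` helper, and the example.com URLs are placeholders for illustration, not details taken from this thread.

```python
from urllib.robotparser import RobotFileParser

# The two-line "hide everything" rule quoted above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder user agent and URLs (assumptions, not from the thread).
home_allowed = is_allowed(BLOCK_ALL, "rogerbot", "http://www.example.com/")
page_allowed = is_allowed(BLOCK_ALL, "rogerbot", "http://www.example.com/some-page.html")
print(home_allowed, page_allowed)  # False False
```

With `Disallow: /` under `User-agent: *`, every path is blocked for every crawler that obeys robots.txt, which is why nothing could be crawled while the rule was in place.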
I've noticed Google has now recognised the new robots.txt, which has allowed the sitemap to be accepted.
I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.
I know (well, I think) I have to set noindex, follow on the 'sorted' category pages...
all the best, Mark.
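As a rough way to check whether a page really carries a noindex, follow robots meta tag, the sketch below parses a page's HTML with Python's standard `html.parser`. The sample markup and the names `RobotsMetaParser` and `robots_directives` are hypothetical. How the tag gets added to the 'sorted' category pages depends on the site's platform and is not covered here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into individual lowercase directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical markup standing in for a sorted category page.
SAMPLE = (
    '<html><head><meta name="robots" content="noindex, follow">'
    "</head><body></body></html>"
)
directives = robots_directives(SAMPLE)
print(sorted(directives))  # ['follow', 'noindex']
```

A page checked this way should report both `noindex` and `follow` if the tag is in place. An empty set means no robots meta tag was found in the markup.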
-
Hi Mike
The crawl has now completed, thank you. I think the results will keep me occupied.
all the best, Mark.
-
Hi Mark,
Sorry it's taking a while to crawl your new site.
I'm not sure exactly what is causing the delay, but one possible reason is your robots.txt. Here's a short snippet of what I see in it:
# Crawlers Setup
User-agent: *
Crawl-delay: 30
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
From here, the formatting looks a little awkward. What's going on is that you're telling Roger bot to only look at these:
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
While the syntax is OK, not every crawler out there will follow the Allow directive. Here's an example of something you can use:
# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Here you're telling the crawler to disallow nothing except these directories. Once you've implemented this, please let us know whether it fixes the crawl. Thanks for reaching out!
Best,
Peter Li
SEOmoz Help Team
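A quick way to sanity-check a rule set like this before uploading it is Python's standard `urllib.robotparser`. The sketch below is only an illustration. Its rule set lists just the directory rules, matching the stated intent of blocking nothing except those directories. The `rogerbot` user-agent string, the example.com URLs, and the 100-page figure are assumptions rather than details from this thread. The time estimate also assumes a crawler treats Crawl-delay as a pause in seconds between requests, which not every crawler does.

```python
from urllib.robotparser import RobotFileParser

# Directory-only rules, following the intent described above.
SUGGESTED_ROBOTS_TXT = """\
# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def min_crawl_minutes(page_count: int, delay_seconds: int) -> float:
    """Lower bound on crawl time if one page is fetched per delay interval."""
    return page_count * delay_seconds / 60


rules = load_rules(SUGGESTED_ROBOTS_TXT)

# Placeholder user agent and URLs (assumptions, not from the thread).
product_ok = rules.can_fetch("rogerbot", "http://www.example.com/some-product.html")
app_ok = rules.can_fetch("rogerbot", "http://www.example.com/app/etc/")
delay = rules.crawl_delay("rogerbot")

print(product_ok, app_ok, delay)      # True False 30
print(min_crawl_minutes(100, delay))  # 50.0 minutes for a hypothetical 100 pages
```

Note that the standard-library parser matches rules by plain path prefix, so it is not a reliable test for wildcard patterns such as `/*?p=`.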
-
Hi Mark,
This sounds like a bug or issue with the SEOmoz software.
Contact help@seomoz.org and ask one of the help associates to look into this for you.
If you do not have many pages, it definitely shouldn't take that long.
The help team responds extremely quickly!
Good luck.
Mike