GWT False Reporting or GoogleBot has weird crawling ability?
-
Hi, I hope someone can help me.
I have launched a new website and am trying hard to make everything perfect. I have been using Google Webmaster Tools (GWT) to ensure everything is as it should be, but the crawl errors being reported do not match my site. I mark them as fixed, and when I check again the next day it reports the same or similar errors.
Example:
http://www.mydomain.com/category/article/ (this would be a correct structure for the site).
GWT reports:
http://www.mydomain.com/category/article/category/article/ 404 (It does not exist, never has and never will.) I have been to the pages listed as linking to this URL and they do not contain links in this form. I have checked the page source code and all links from the given pages have the correct structure, so I cannot replicate this type of crawl.
This happens across most of the site. I have a few hundred pages, all ending in a trailing slash, and most of them are reported in this manner, making it look like I have close to 1000 404 errors, yet I am not able to replicate this crawl using many different methods.
The site is using an .htaccess file with redirects and a rewrite condition.
Rewrite Condition:
# Need to redirect when no trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !.(html|shtml)$
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ /$1/ [L,R=301]
The above condition forces the trailing slash on folders.
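To reason about what this rule can and cannot produce, here is a minimal Python sketch of its logic. It is only an approximation: it assumes the wildcard patterns are `(.*)`, it uses the request URI as a stand-in for `REQUEST_FILENAME`, and the function name `trailing_slash_redirect` and the `is_existing_file` flag are made up for illustration. It does not model the rest of Apache's processing.

```python
import re


def trailing_slash_redirect(request_uri, is_existing_file=False):
    """Return the 301 target path, or None if the rule would not fire.

    Hypothetical helper: mirrors the three RewriteCond lines and the
    RewriteRule, under the assumptions stated in the text above.
    """
    # RewriteCond %{REQUEST_FILENAME} !-f  (skip real files)
    if is_existing_file:
        return None
    # RewriteCond %{REQUEST_FILENAME} !.(html|shtml)$  (skip .html/.shtml)
    if re.search(r".(html|shtml)$", request_uri):
        return None
    # RewriteCond %{REQUEST_URI} !(.*)/$  (skip URLs already ending in /)
    if re.search(r"(.*)/$", request_uri):
        return None
    # RewriteRule ^(.*)$ /$1/  (in .htaccess the pattern sees the path
    # without its leading slash)
    match = re.match(r"^(.*)$", request_uri.lstrip("/"))
    return "/" + match.group(1) + "/"


print(trailing_slash_redirect("/category/article"))
print(trailing_slash_redirect("/category/article/"))
```

Under these assumptions the rule only ever appends a single slash and never repeats path segments, so on its own it would not turn /category/article/ into /category/article/category/article/.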
Then we are using redirects in this manner:
Redirect 301 /article.html http://www.domain.com/article/
In addition to the above, we had a development site while I was building the new site, which was http://dev.slimandsave.co.uk. It had been spidered without my knowledge, and I only found out when it was too late. So when I put the site live I left the development domain in place (http://dev.domain.com) and redirected it like so:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^ - [E=protossl]
RewriteCond %{HTTPS} on
RewriteRule ^ - [E=protossl:s]
RewriteRule ^ http%{ENV:protossl}://www.domain.com%{REQUEST_URI} [L,R=301]
</IfModule>
Is there anything that I have done that would cause this type of redirect 'loop'?
Any help greatly appreciated.
-
Yeah - do this!
-
Anyone any thoughts on this?
-
Sorry, I should also add that the URL structure that Google generates is like this:
http://www.domain.com/category/article/
http://www.domain.com/category/article/same-category/differentarticle/
http://www.domain.com/category/article/same-category/another-different-article/
http://www.domain.com/category/article/another-different-category/differentarticle/
etc. It is as if it gets to a category article, then moves sideways to another article and somehow appends that move onto the current URL instead of starting again from the site root.
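One common way a crawler ends up with URLs shaped like this is a relative href (one without a leading slash) being resolved against the current page URL. Whether that is what is happening on this site is an assumption, not something confirmed in this thread. A minimal Python sketch using the standard library's `urljoin`, with the example paths from the list above and a hypothetical helper name `resolve`:

```python
from urllib.parse import urljoin

# The page the crawler is currently on (example URL from the post).
PAGE_URL = "http://www.domain.com/category/article/"


def resolve(href, page_url=PAGE_URL):
    """Resolve an href the way a crawler or browser would (hypothetical helper)."""
    return urljoin(page_url, href)


# Relative href: resolved against the current directory, so the new path
# is appended to /category/article/.
print(resolve("same-category/differentarticle/"))

# Root-relative href: resolved from the site root, which is the intended URL.
print(resolve("/same-category/differentarticle/"))
```

If any template, script, or redirect emits the first form, the nested URLs in the report are exactly what a crawler would request.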
-
It doesn't sound like GWT is false reporting. You may want to check your trailing slash URL rewrite. What you are describing sounds like URLs are being written incorrectly somewhere, which generates the incorrect URLs that then show up in GWT.
Your 301 looks OK. If the dev site was spidered and indexed, you should add the dev site to GWT and use the URL removal tool to remove it from the index, then remove the site and redirect.
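To check whether any page is writing URLs incorrectly, a small script can list every link that is neither absolute nor root-relative. This is only a sketch using Python's standard library; the class and function names (`RelativeLinkFinder`, `find_relative_links`) and the sample HTML are made up for illustration, and it looks at `<a href>` only, not at links added by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class RelativeLinkFinder(HTMLParser):
    """Collects <a href> values that would resolve against the current path."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.suspects = []  # (href as written, URL a crawler would request)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(href)
        # Skip absolute URLs, protocol-relative URLs, root-relative paths,
        # and fragment-only or query-only links.
        if parts.scheme or parts.netloc or href.startswith(("/", "#", "?")):
            return
        self.suspects.append((href, urljoin(self.page_url, href)))


def find_relative_links(html, page_url):
    finder = RelativeLinkFinder(page_url)
    finder.feed(html)
    return finder.suspects


sample_html = """
<a href="/category/article/">root-relative</a>
<a href="same-category/differentarticle/">relative</a>
<a href="http://www.domain.com/category/">absolute</a>
"""
for href, resolved in find_relative_links(
    sample_html, "http://www.domain.com/category/article/"
):
    print(href, "->", resolved)
```

Running this against the saved source of the pages GWT lists as linking to the 404s would show whether any link resolves to one of the nested URLs.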