Crawl Diagnostics - unexpected results
-
I received my first Crawl Diagnostics report last night on my dynamic ecommerce site.
It showed errors on generated URLs that simply are not produced anywhere on my live site, only on my local development server.
It appears that the Crawler doesn't think that it's running on the live site.
For example
http://www.nordichouse.co.uk/candlestick-centrepiece-p-1140.html
will go to a Product Not Found page, and therefore Duplicate Content errors are produced.
Running
http://www.nhlocal.co.uk/candlestick-centrepiece-p-1140.html
produces the correct product page and not a Product Not Found page.
Any thoughts?
-
Hi Nordichouse,
Sorry it took a while for me to get back to you on this.
I agree with the SEOmoz techs: it doesn't matter whether the visitor is a crawler or an actual person. If you go to an invalid URL, you should be 301-redirected to the actual page. If the product doesn't exist, the site should not allow superfluous URLs.
So basically, if the product exists, the site should redirect to the correct URL. If it doesn't exist, send any query for that product to the same page and display the osCommerce product not found message. By doing this you prevent the system from creating umpteen thousand URLs for each product.
If you want to test what I mean, you can visit our store at www.rubberstore.com/catalog and try a few URLs like:
catalog/nipple-clips-p-1000.html
We don't have a product with the ID 1000, so you'll get redirected to the not found message and the root page -p-1000.html.
However, if you try:
catalog/a-fake-url-p-29.html
you'll get redirected to our actual product page matching this product ID. Hope that makes sense. All this is done with the .htaccess URL rewriter I posted in this thread.
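The behaviour described in this post can be sketched in a few lines of Python. This is only an illustration of the decision logic, not osCommerce's actual code: the `PRODUCTS` table, the `NOT_FOUND_PATH` value, and the `resolve` function are all hypothetical names, and whether the not found case should be a redirect or a plain 404 response is left open.

```python
import re

# Hypothetical catalogue: product ID -> canonical slug (illustrative data only).
PRODUCTS = {29: "example-product"}

# Hypothetical single page that every unknown product ID is sent to.
NOT_FOUND_PATH = "/product-not-found.html"

# Matches "/<anything>-p-<digits>.html"; the slug is captured but never trusted.
PRODUCT_URL = re.compile(r"^/(.*)-p-([0-9]+)\.html$")


def resolve(path):
    """Return (action, target) for a requested path."""
    match = PRODUCT_URL.match(path)
    if match is None:
        # Not a product URL at all: leave it to the rest of the site.
        return ("pass", path)
    slug, product_id = match.group(1), int(match.group(2))
    canonical_slug = PRODUCTS.get(product_id)
    if canonical_slug is None:
        # Unknown product: always the same page, never a new URL per request.
        return ("not_found", NOT_FOUND_PATH)
    if slug != canonical_slug:
        # Known product under a wrong slug: redirect to the one real URL.
        return ("redirect", f"/{canonical_slug}-p-{product_id}.html")
    # Known product under its canonical URL: serve it as-is.
    return ("serve", path)
```

With this logic, any made-up slug in front of a valid ID collapses onto one canonical URL, and any invalid ID collapses onto one not found page, so a crawler never sees many distinct URLs with the same content.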
-
Don
Yes, that is how it is done and there is no problem with that. The above is just how inbound URLs get processed.
The issue here is how the crawler works. The only possible way for this particular URL to be generated is for a certain parameter to be appended to the URL - and that would be unusual (unless the SEOmoz techies tell me differently).
Alan
-
Did you ever have a product with the ID 1140? If you look at your products table, just check the auto number in the product_id column.
If you did, and it was live at some point, the crawler could be finding the old product based on the URL it used to have.
If you never made that product live, then I don't know how a crawler could have found a product that doesn't exist, unless they started using some technology that I'm unaware of.
Since you said you use OSC, this is what we use to deal with the problem I outlined above.
# Begin Ultimate SEO V2.2d
Options +FollowSymLinks
RewriteEngine On

# RewriteBase instructions
# Change RewriteBase dependent on how your shop is accessed as below.
# http://www.mysite.com = RewriteBase /
# http://www.mysite.com/catalog/ = RewriteBase /catalog/
# http://www.mysite.com/catalog/shop/ = RewriteBase /catalog/shop/

# Change the following line using the instructions above
RewriteBase /catalog/

RewriteRule ^(.*)-p-(.*).html$ product_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-c-(.*).html$ index.php?cPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-m-(.*).html$ index.php?manufacturers_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pi-(.*).html$ popup_image.php?pID=$2&%{QUERY_STRING}
RewriteRule ^(.*)-by-(.*).html$ all-products.php?fl=$2&%{QUERY_STRING}
RewriteRule ^(.*)-t-(.*).html$ articles.php?tPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-a-(.*).html$ article_info.php?articles_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-au-(.*).html$ articles.php?authors_id=$2&%{QUERY_STRING}
#RewriteRule ^(.*)-pr-(.*).html$ product_reviews.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pri-(.*).html$ product_reviews_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-f-(.*).html$ faqdesk_info.php?faqdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-fc-(.*).html$ faqdesk_index.php?faqPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-fri-(.*).html$ faqdesk_reviews_info.php?faqdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-fra-(.*).html$ faqdesk_reviews_article.php?faqdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-i-(.*).html$ information.php?info_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-links-(.*).html$ links.php?lPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pm-([0-9]+).html$ info_pages.php?pages_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-n-(.*).html$ newsdesk_info.php?newsdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nc-(.*).html$ newsdesk_index.php?newsPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nri-(.*).html$ newsdesk_reviews_info.php?newsdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nra-(.*).html$ newsdesk_reviews_article.php?newsdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-po-([0-9]+).html$ pollbooth.php?pollid=$2&%{QUERY_STRING}
# End Ultimate SEO V2.2d
You may try it to see if it helps fix your issue.
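To see why the text in front of -p- makes no difference to a rule like the product one, here is a rough Python imitation of it. It assumes the product pattern is `^(.*)-p-(.*).html$` and that the rule is matched against the path relative to RewriteBase. The `rewrite` function is a hypothetical helper, and the appended query string is left out for brevity.

```python
import re

# Product pattern as assumed above. The dot before "html" is unescaped,
# so it matches any single character, not only a literal ".".
PRODUCT_RULE = re.compile(r"^(.*)-p-(.*).html$")


def rewrite(path):
    """Mimic the product rule: return the internal target, or None if no match."""
    match = PRODUCT_RULE.match(path)
    if match is None:
        return None
    # $2 in the rule is the second capture group; $1 (the slug) is discarded.
    return "product_info.php?products_id=" + match.group(2)


for path in ("candlestick-centrepiece-p-1140.html",
             "a-fake-url-p-1140.html",
             "-p-1140.html"):
    print(path, "->", rewrite(path))
```

All three paths map to the same internal target, because only the ID after -p- is passed on. The rewrite on its own therefore accepts any slug; redirecting wrong slugs to the correct URL has to happen in the shop code afterwards.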
-
Thanks, Don
You are right in your analysis - it is osC, but highly modified by myself. Yes, it does redirect.
That, however, is not the point. On the live site, the URL containing 1140 (for example) is never generated.
The mystery is how the Crawler can find something that isn't there! Magic.
Alan
-
Hi nordichouse,
You may want to check with your CMS provider. The URLs are similar to osCommerce, which I'm experienced with, but I can see that it isn't an osCommerce setup. The system should have some sort of URL rewriter to deal with this problem.
The issue that I see is that the system doesn't actually care what you type between .co.uk/ and -p-1140.html.
For example, try this URL to get a valid product:
http://www.nordichouse.co.uk/nipple-clips-p-1000.html
which is the same as
http://www.nordichouse.co.uk/-p-1000.html
But both should 301 redirect to: http://www.nordichouse.co.uk/linen-style-collection-p-1000.html
osCommerce has a 301 URL rewriter that prevents the system from using incorrect URLs. I would hope your system does as well.
I'm not trying to avoid helping you, but without exact knowledge of how the system handles the URLs it generates, it is hard to troubleshoot. However, since it is a CMS, somebody who works on it should already have this knowledge.
My best,
Don