Googlebot crawling partial URLs
-
Hi guys,
I've checked my email this morning and I've got a number of 404 errors over the weekend where Google has tried to crawl some of my existing pages but not found the full URL.
Instead of hitting 'domain.com/folder/complete-pagename.php' it's hit 'domain.com/folder/comp'.
This is definitely Googlebot/2.1; http://www.google.com/bot.html (66.249.72.53) but I can't find where it would have found only the partial URL. It certainly wasn't on the domain it's crawling and I can't find any links from external sites pointing to us with the incorrect URL. Googlebot is doing the same thing across a single domain but in different sub-folders.
Having checked Webmaster Tools, there aren't any hard 404s, and the soft ones aren't related and haven't occurred since August. I'm really confused as to how this is happening.
Thanks!
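One way to double-check the "definitely Googlebot" part is the reverse-then-forward DNS check that Google describes for verifying its crawlers: look up the hostname for the IP, confirm it sits under a Google crawler domain, then resolve that hostname and confirm it maps back to the same IP. The following is only a minimal sketch of that check. The function names, the injectable `reverse`/`forward` parameters, and the two domain suffixes listed are assumptions made for illustration, not something taken from the logs in this thread.

```python
import socket
from typing import Callable, List

# Assumed crawler domains for the reverse-DNS hostname check.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def _reverse_lookup(ip: str) -> str:
    """Return the primary hostname that the IP address resolves to."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> List[str]:
    """Return the IPv4 addresses that the hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_CRAWLER_DOMAINS):
            return False
        # The hostname must resolve back to the IP we started from.
        return ip in forward(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return False
```

Calling `is_verified_googlebot("66.249.72.53")` would perform live DNS lookups, so the result depends on the network at the time it is run. The resolver parameters exist only so the logic can be exercised without network access.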
-
This is why I love this forum. We recently started seeing these URLs in our GWT report. We have hundreds of truncated URLs that end in "..." and go nowhere, and we can't figure out where they are coming from. We thought it could be Google's relatively new privacy policy of not passing along the data, but we're not sure. Anyone have any thoughts on that?
Thanks!
-
@vitalscom - it's at least good to know someone else has experienced this!
Due to the volume I don't consider doing 301s a permanent solution. Fortunately there is a noindex on our 404 page so Google et al shouldn't take these errors into consideration.
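If redirects were ever wanted despite the volume, the mapping could in principle be generated from the 404 log instead of written by hand. The sketch below is just one possible approach and not something anyone in this thread has tested: it treats each 404 path as a prefix and only maps it when exactly one known page starts with it. The function name `map_truncated_paths` and the example paths in the usage note are hypothetical.

```python
from typing import Dict, Iterable


def map_truncated_paths(
    not_found: Iterable[str], known_paths: Iterable[str]
) -> Dict[str, str]:
    """Map each truncated 404 path to the single known page it is a prefix of."""
    known = list(known_paths)
    redirects: Dict[str, str] = {}
    for path in not_found:
        # Drop a trailing ".." or "..." so "/folder/comp..." is treated like "/folder/comp".
        stem = path.rstrip(".")
        if not stem:
            continue
        matches = [k for k in known if k.startswith(stem) and k != stem]
        # Only redirect when the prefix is unambiguous.
        if len(matches) == 1:
            redirects[path] = matches[0]
    return redirects
```

For example, given a known page `/folder/complete-pagename.php`, a 404 for `/folder/comp` would be mapped to it only if no other known page starts with `/folder/comp`. Ambiguous or unmatched paths are left out and would still return a 404.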
-
I'm seeing it too. It looks like it's coming from Superpages, but the truncated URLs are not actually hyperlinks, so why Google is following them is a good question.
http://swbd-out.superpages.com/webresults.htm?qkw=Find+A+Physician&qcat=web
I'm fixing this on my end with a mod_rewrite rule in .htaccess. All of my sites' truncated URL problems end in either ".." or "...", so any URL that ends in one of those will get 301 redirected to the homepage.
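For anyone handling this in application code rather than Apache, the rule described above could be sketched roughly as follows. This is not the poster's actual rewrite rule, only an illustration of the same idea in Python. The name `redirect_target` and the `homepage` default are made up for the example, and the pattern matches two or more trailing dots, which covers both the ".." and "..." cases.

```python
import re
from typing import Optional

# Matches a path that ends in two or more literal dots, e.g. "/folder/comp.." or "/folder/comp...".
TRUNCATED_SUFFIX = re.compile(r"\.{2,}$")


def redirect_target(path: str, homepage: str = "/") -> Optional[str]:
    """Return the 301 target for a truncated path, or None to handle the request normally."""
    if TRUNCATED_SUFFIX.search(path):
        return homepage
    return None
```

A request handler would issue a 301 to the returned value when it is not `None`. Note that this only catches the dotted variants; a bare truncation like `domain.com/folder/comp` from the original post would pass through untouched.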
Related Questions
-
Crawled page count in Search console
Hi guys, I'm working on a project (premium-hookahs.nl) where I've stumbled upon a situation I can't address. Attached is a screenshot of the crawled pages in Search Console. History: Due to technical difficulties, this webshop didn't always noindex its filter pages, resulting in thousands of duplicate pages. In reality the webshop has fewer than 1000 individual pages. At this point we took the following steps to resolve this: Noindex the filter pages. Exclude those filter pages in Search Console and robots.txt. Canonicalize the filter pages to the relevant category pages. This, however, didn't result in Google crawling fewer pages. Although the implementation wasn't always sound (technical problems during updates), I'm sure this setup has been the same for the last two weeks. Personally I expected a drop in crawled pages, but they are still sky high. I can't imagine Google visits this site 40 times a day. To complicate the situation: We're running an experiment to gain positions on around 250 long term searches. A few filters will be indexed (size, color, number of hoses and flavors) and three of them can be combined. This results in around 250 extra pages. Meta titles, descriptions, h1 and texts are unique as well. Questions: - Should excluding pages in robots.txt result in Google not crawling them? - Is this number of crawled pages normal for a website with around 1000 unique pages? - What am I missing?
Intermediate & Advanced SEO | | Bob_van_Biezen0 -
Best SEO url woocommerce, what to do?
Hi! Today we have our product categories indexed (by mistake), and for one of our desired keywords a category has the number 1 rank. By mistake, we didn't set nofollow, noindex on our categories, just on tags, archives, etc. We are now migrating from iThemes Exchange to WooCommerce and I'm looking at improving our SEO URLs for the categories. For keyword "Key1" we rank with this URL: http://site/product-category/Key1. The SEO meta title and description were untouched when we launched the site last spring, so it doesn't look so good. The plan is to strip out product-category, add a description to that particular category (I have a newly written text of 95 words, 519 letters without spaces, with the keyword present 5 times in a natural way), and use the URL http://site/key1, with a 301 redirect from the old http://site/product-category/Key1. What do you think of this? What should I consider? Am I on the right track? Grateful for any help! // Jonas
Intermediate & Advanced SEO | | knubbz0 -
URL strategy mobile website
Hello everyone, We are facing a challenging decision about where our website (a Flash gaming website) is going. We are in the process of creating HTML5 games in the same theme as the Flash games that we provide to our users. Our main concern is how to show this new content to the user. Shall we create a brand new set of URLs such as http://www.mydomain.com/games/mobile/kids/ ? Or shall we adapt the main desktop URL, http://www.mydomain.com/games/kids/, and show users two different versions of the page depending on whether they are using a mobile device (so they see a mobile version) or a PC/laptop (so they see a desktop version)? Or even redirect people to a sub-domain, http://m.mydomain.com/ ? The main idea we had is to keep the same URL structure, as it seems that Google gives the same search results whether or not you are using a mobile device. Creating a new set of URLs or even a sub-domain may involve a lot of work to get those new links to the same PA as the desktop URL, which has been live and known for a while now. Also, the desktop game page should not be accessible to mobile devices, so should it be redirected (301?) to the mobile homepage of the site? But how will Google look at the fact that one URL serves two different sets of content, CSS, etc.? All those redirects might also look strange. We are worried that doing so will hurt the page authority and its ranking, but we are trying to find the best way to combine SEO and user experience. Any input on this will be really appreciated. Cheers,
Intermediate & Advanced SEO | | drimlike0 -
Moving Code for Faster Crawl Through?
What are best practices for moving code into other folders to help speed up crawling for bots? We once moved some JavaScript at an SEO's suggestion and the site suddenly looked like crap until we undid the changes. How do you figure out what code should be consolidated? What code do you use to indicate what has been moved and to where?
Intermediate & Advanced SEO | | siteoptimized0 -
Canonical URLs and Sitemaps
We are using canonical link tags for product pages in a scenario where the URLs on the site contain category names, and the canonical URL points to a URL which does not contain the category names. So, the product page on the site is like www.example.com/clothes/skirts/skater-skirt-12345, and also like www.example.com/sale/clearance/skater-skirt-12345 in another category. And on both of these pages, the canonical link tag references a 3rd URL like www.example.com/skater-skirt-12345. This 3rd URL, used in the canonical link tag is a valid page, and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (nor external). Questions: 1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags? 2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet a 3rd variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
Intermediate & Advanced SEO | | 379seo0 -
URL Error or Penguin Penalty?
I am currently having a major panic as our website www.uksoccershop.com has been largely dropped from Google. We have not made any changes recently and I am not sure why this is happening, but having heard all sorts of horror stories about the Penguin update, I am fearing the worst. If you google "uksoccershop" you will see that the homepage does not rank. We previously ranked in the top 3 for "football shirts" but now we don't, although on pages 2, 3 and 4 you will see one of our category pages ranking (this didn't use to happen). Some rankings are intact, but many have disappeared completely and in some cases been replaced by other pages on our site. I should point out our existing rankings had been consistently there for 5-6 years until today. I logged into Webmaster Tools and thankfully there is no warning message from Google about spam, etc., but what we do have is 35,000 URL errors for pages which are accessible. An example of this is: URL: http://www.uksoccershop.com/categories/5_295_327.html. Error details, In Sitemaps, Linked from. Last crawled: 6/20/12. First detected: 6/15/12. Googlebot couldn't access the contents of this URL because the server had an internal error when trying to process the request. These errors tend to be with the server itself, not with the request. Is it possible this is the cause of the issue (we are not currently sure why the URLs are being blocked), and if so, how severe is it and how recoverable? If that is unlikely to cause the issue, what would you recommend as our next move? All help is REALLY REALLY appreciated 🙂
Intermediate & Advanced SEO | | ukss19840 -
Mobile URLs stolen and I need them back!
Hi guys, Mobile SEO question. Some time in the past, my client accidentally got a whole bunch of m.example.co.nz URLs indexed due to a link on another website and the awesome relative URL links on my client's website. However, now they're building a mobile website and they want all those m.example.co.nz URLs. My question is, if we build a new mobile website and use those mobile website URLs, including those already indexed by Google, will Google automatically know after crawling those URLs that they are now for mobile users? Will it move the pages to its mobile index? Or will it be a case of duplicate content? Thanks Kim
Intermediate & Advanced SEO | | Voonie0 -
Is it OK to have a site that has some URLs with hyphens and other, older, legacy URLs that use underscores?
I'm working with a VERY large site that has recently been redesigned/recategorized. They kept only about 20% of the URLs from the legacy site (the URLs that had revenue tied to them), and these URLs use underscores, whereas the new URLs created for the site use hyphens. I don't think that this would be an issue for Google, as long as the pages are of quality, but I wanted to get everyone's opinion on this. Will it hurt me to have two different sets of URLs, those using hyphens and those using underscores?
Intermediate & Advanced SEO | | Business.com0