Is there any benefit to optimising Google's crawl?
-
Hello,
I have an ecommerce site with all pages crawled and indexed by Google.
But some pages are reachable through multiple URLs, such as www.sitename.com/product-name.html and www.sitename.com/category/product-name.html.
All of these pages carry a canonical pointing to the simplest URL, so Google indexes only one version. The duplicate URLs are not indexed, but Google still crawls them.
My question is: is there any benefit in stopping Google from crawling these pages?
Google crawls around 1,500 pages a day on my site, but there are only 800 real pages, and they are all indexed. There is no particular issue, so is it worth changing anything?
Thanks
-
Hi!
Have you noindexed the pages too? If the crawling concerns you, that may help; at the very least it gives Google another signal not to crawl those pages.
Obviously it's not a catch-all, as there's only so much you can do to tell Google not to crawl a page. If the alternative page is linked to internally (which it sounds like it is), Google will sometimes crawl it anyway, canonical or not, because the internal links suggest the page is important to your site.
It may be worth testing on a few pages to see if it has an impact.
-
Hi there!
From my experience, the best results I've ever achieved for a client came when we consolidated all URLs into a single URL per page. Canonicals are amazing, no doubt. But I've seen canonicals ignored when the canonical setup isn't 100% 'correct.'
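One rough way to spot a canonical setup that isn't fully 'correct' is to check each duplicate page's HTML for exactly one canonical tag that points at the preferred URL. The sketch below is only an illustration: the class and function names are made up for this example, and it assumes you already have the page HTML as a string.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_problems(html, preferred_url):
    """Return a list of problems found; an empty list means the tag looks fine."""
    finder = CanonicalFinder()
    finder.feed(html)
    found = finder.canonicals
    if not found:
        return ["no canonical tag"]
    problems = []
    if len(found) > 1:
        problems.append("multiple canonical tags")
    if any(href != preferred_url for href in found):
        problems.append("canonical does not match preferred URL")
    return problems


# Example using the URL pattern from the question.
page = (
    '<html><head><link rel="canonical" '
    'href="https://www.sitename.com/product-name.html"></head></html>'
)
print(canonical_problems(page, "https://www.sitename.com/product-name.html"))
```

This only compares strings, so a relative href or a trailing-slash difference would be reported as a mismatch; whether that matters depends on how your platform writes the tag.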
If you can make your website navigation and your internal/XML sitemaps reinforce your preferred URL, that would certainly reduce the number of URLs Google crawls. Then, if you permanently (301) redirect all the now non-navigable URLs to the single preferred URL, you should see a significant boost in traffic (from consolidating all of the authority into a single page, now reinforced throughout your entire website).
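As a minimal sketch of the redirect step, the mapping from a duplicate URL to its preferred URL could look like the code below. It assumes the pattern from the question holds everywhere (the preferred URL is the product's .html filename at the site root), and the function names are hypothetical. How the 301 is actually served depends on your platform or server.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url):
    """Map /category/product-name.html to /product-name.html.

    Assumption: every product page ends in ".html" and its preferred
    URL is that filename at the site root. Other URLs are returned as-is.
    """
    parts = urlsplit(url)
    filename = parts.path.rsplit("/", 1)[-1]
    if not filename.endswith(".html"):
        return url
    return urlunsplit(
        (parts.scheme, parts.netloc, "/" + filename, parts.query, parts.fragment)
    )


def redirect_for(url):
    """Return (301, target) for a duplicate URL, or None if no redirect is needed."""
    target = preferred_url(url)
    return None if target == url else (301, target)


print(redirect_for("https://www.sitename.com/category/product-name.html"))
print(redirect_for("https://www.sitename.com/product-name.html"))
```

Returning None for URLs that are already preferred avoids redirect loops, and leaving non-product URLs untouched keeps category pages reachable.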
If that's not possible, and you have to keep multiple URLs on your site because of budget/platform constraints, then yes, let Google crawl them. Otherwise the algo won't be able to see the canonical tag on them.
So in short: if you have a means to reduce the number of duplicates and redirect them, awesome. If you don't, opening them up to Google is good, too.
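To illustrate the point about visibility: if the duplicate paths were disallowed in robots.txt, a compliant crawler would never fetch those pages and so could not read the canonical tag on them. The sketch below uses Python's standard `urllib.robotparser` with a hypothetical rule set; the `Disallow: /category/` line is an assumption for the example, not something taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the duplicate category paths.
robots_lines = [
    "User-agent: *",
    "Disallow: /category/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

duplicate = "https://www.sitename.com/category/product-name.html"
preferred = "https://www.sitename.com/product-name.html"

# A blocked page is never fetched, so its canonical tag is never seen.
duplicate_crawlable = parser.can_fetch("Googlebot", duplicate)
preferred_crawlable = parser.can_fetch("Googlebot", preferred)

print("duplicate crawlable:", duplicate_crawlable)
print("preferred crawlable:", preferred_crawlable)
```

With this rule in place the duplicate URL is blocked while the preferred one stays crawlable, which is the situation where the canonical can no longer be read.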
For more information on making sure your canonical structure is set up properly, check out this Moz blog post: https://moz.com/blog/rel-confused-answers-to-your-rel-canonical-questions