Can I, in Google's good graces, check for Googlebot to turn on/off tracking parameters in URLs?
-
Basically, we use a number of parameters in our URLs for event tracking. Google could be crawling an infinite number of these URLs. I'm already using the canonical tag to point at the non-tracking versions of those URLs....that doesn't stop the crawling tho.
I want to know if I can do conditional 301s or just detect the user agent as a way to know when to NOT append those parameters.
Just trying to follow their guidelines about allowing bots to crawl w/out things like sessionID...but they don't tell you HOW to do this.
Thanks!
-
No problem Ashley!
It sounds like that would fall under cloaking, albeit pretty benign as far as cloaking goes. There's some more info here. The Matt Cutts video on that page has a lot of good information. Apparently any cloaking is against Google's guidelines. I would suspect you could get away with it, but I'd be worried everyday about a Google penalty getting handed down.
-
The syntax is correct. Assuming the site: and inurl: operators work in Bing, as they do in Google, then Bing is not indexing URLs with the parameters.
That article you've referred to only tells how to sniff out Google...one of a couple. What it doesn't tell me, unfortunately, is if there are any consequences of doing so and taking some kind of action...like shutting off the event tracking parameters in this case.
Just to be clear...thanks a bunch for helping out!
-
My sense from what you told me is that canonicals should be working in your case. What you're trying to use them for is what they're intended to do. You're sure the syntax is correct, and they're in the of the page or being set in the HTTP header?
Google does set it up so you can sniff out Googlebot and return different content (see here), but that would be unusual to do given the circumstances. I doubt you'd get penalized for cloaking for redirecting parameterized URLs to canonical ones for only Googlebot, but I'd still be nervous about doing it.
Just curious, is Bing respecting the canonicals?
-
Yeah, we can't noindex anything because there literally is NO way to crawl the site without picking up tracking parameters.
So we're saying that there is literally no good/approved way to say "oh look, it's google. let's make sure we don't put any of these params on the URL."? Is that the consensus?
-
If these duplicate pages have URLs that are appearing in search results, then the canonicals aren't working or Google just hasn't tried to reindex those pages yet. If the pages are duplicates, and you've set the canonical correctly, and entered them in Google Webmaster Tools, over time those pages should drop out of the index as Google reindexes them. You could try submitting a few of these URLs with parameters to Google to reindex manually in Google Webmaster Tools, and see if afterward they disappear from the results pages. If they do, then it's just a matter of waiting for Googlebot to find them all.
If that doesn't work, you could try something tricky like adding meta noindex tags to the pages with URL parameters, wait until they fall out of the index, and then add canonical tags back on, and see if those pages come back into the SERPs. If they do, then Google is ignoring your canonical tags. I hate to temporarily noindex any pages like this... but if they're all appearing separately in the SERPs anyhow, then they're not pooling their link juice properly anyway.
-
Thank you for your response. Even if I tell them that the parameters don't alter content, which I have, that doesn't stop how many pages google has to crawl. That's my main concern...that googlebot is spending too much time on these alternate URLs.
Plus there are millions of these param-laden URLs in the index, regardless of the canonical tag. There is currently no way for google to crawl the site without parameters that change constantly throughout each visit. This can't be optimal.
-
You're doing the right thing by adding canonicals to those pages. You can also go into Google Webmaster Tools and let them know that those URL parameters don't change the content of the pages. This really is the bread and butter of canonical tags. This is the problem they're supposed to solve.
I wouldn't sniff out Googlebot just to 301 those URLs with parameters to the canonical versions. The canonicals should be sufficient. If you do want to sniff out Googlebot, Google's directions are here. You don't do it by user agent, you do a reverse DNS lookup. Again, I would not do this in your case.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does Google ignore content styled with 'display:none'?
Do you know if an H1 within a div that has a 'display: none' style applied will still be crawled and evaluated by Google? We have that situation on this page on line 136: view-source:https://www.junk-king.com/services/items-we-take/foreclosure-cleanouts Of course we also have an H1 up at the top of the page and are concerned that the second one will cause interference with our SEO efforts. I've seen conflicting and inconclusive information on line - not sure. Thanks for any help.
Intermediate & Advanced SEO | | rastellop0 -
A client rebranded a few years ago and doesn't want to be associated with it's old brand name. He wishes not to appear when the old brand is searched in Google, is there something we can do?
The problem is there was redirection between the old branded site and the new one, and now when you type in the name of the old brand, the new one comes up. I have desperately tried to convince this client there is nothing we can do about it, dozens of news articles crop up with the two brands together as this was a hot topic a few years ago, but just in case I missed something I thought I'd ask the community of experts here on Moz. An example for this would be Tyco Healthcare that became covidien in 2007. When you type tyco healthcare, covidien crops up here and there. Any ideas? Thanks!
Intermediate & Advanced SEO | | Netsociety0 -
Does content revealed by a 'show more' button get crawled by Google?
I have a div on my website with around 500 words of unique content in, automatically when the page is first visited the div has a fixed height of 100px, showing a couple of hundred words and fading out to white, with a show more button, which when clicked, increases the height to show the full content. My question is, does Google crawl the content in that div when it renders the page? Or disregard it? Its all in the source code. Or worse, do they consider this cloaking or hidden content? It is only there to make the site more useable for customers, so i don't want to get penalised for it. Cheers
Intermediate & Advanced SEO | | SEOhmygod0 -
Partial Match or RegEx in Search Console's URL Parameters Tool?
So I currently have approximately 1000 of these URLs indexed, when I only want roughly 100 of them. Let's say the URL is www.example.com/page.php?par1=ABC123=&par2=DEF456=&par3=GHI789= All the indexed URLs follow that same kinda format, but I only want to index the URLs that have a par1 of ABC (but that could be ABC123 or ABC456 or whatever). Using URL Parameters tool in Search Console, I can ask Googlebot to only crawl URLs with a specific value. But is there any way to get a partial match, using regex maybe? Am I wasting my time with Search Console, and should I just disallow any page.php without par1=ABC in robots.txt?
Intermediate & Advanced SEO | | Ria_0 -
How is Google's algorithm evolving in terms of DA vs PA value?
how is Google evolving in terms of value for DA vs PA? Is having a link from a DA 75 + PA 25 better than having a link from a DA 50 + PA 50, assuming such 2 websites are otherwise identical? I have a couple of .EDU backlinks where DA is around 80, though PA 1. Would be DA 40 with a PA 40 be more valuable? I hear Google is placing increasing value on the domain and less on the page authority.
Intermediate & Advanced SEO | | knielsen
Any insight appreciated thank you0 -
Robots.txt: Can you put a /* wildcard in the middle of a URL?
We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example: Disallow: /images/ is blocked just fine However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: Can I put a wildcard in the middle of the string, ex. /en/*/images/, or do I need to list out every single country for every language in the robots file. Anyone know of any workarounds?
Intermediate & Advanced SEO | | IHSwebsite0 -
What's going on with my organic traffic from Google?
I am working on eCommerce website Vista Stores. My website's traffic is going down due to certain reason. I have done R & D and have assumption with auto generated content which I have added on few product pages. You can find out attachment to know more about current situation of traffic. 6789134845_d1a1578960_b.jpg
Intermediate & Advanced SEO | | CommercePundit0 -
Charity project for local women's shelter - need help: will Google notice if you alter the document title with Javascript after the page loads?
I am doing some pro-bono work with a local shelter for female victims of domestic abuse. I am trying to help visitors to the site cover their tracks by employing a document.title change when the page loads using JavaScript. This shelter receives a lot of traffic from Google. I worry that the Google bots will see this javascript change and somehow penalize this site or modify the title in the SERPs. Has anyone had any experience with this kind of javascript maneuver? All help would be greatly appreciated!
Intermediate & Advanced SEO | | jkonowitch0