Blocking Google from telemetry requests
-
At Magnet.me we track the items people view in order to optimize our recommendations. To do so, we fire POST requests to our backends every few seconds once enough user-initiated actions have happened (scrolling, for example). To keep bots from distorting our statistics, we ignore their data server-side.
Based on some internal logging, we see that Googlebot also performs these POST requests during its JavaScript crawling. Over a 7-day period, that amounts to around 800k POST requests. As we ignore that data anyway, and it is quite a large number, we are considering reducing these requests for bots.
We have a few questions about this, though:
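A minimal sketch of what ignoring bot data server-side could look like, assuming the backend can see the `User-Agent` header of each telemetry POST. The names `BOT_PATTERN`, `is_bot`, and `accept_events` are hypothetical, and the token list is illustrative rather than complete:

```python
import re
from typing import Iterable, List, Optional

# Hypothetical, incomplete token list; a real deployment would more likely
# rely on a maintained bot list or on verifying the crawler's IP range.
BOT_PATTERN = re.compile(r"googlebot|bingbot|crawler|spider", re.IGNORECASE)


def is_bot(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent header matches a known crawler token."""
    if not user_agent:
        # A missing header is treated as "not a bot" in this sketch.
        return False
    return BOT_PATTERN.search(user_agent) is not None


def accept_events(user_agent: Optional[str], events: Iterable[dict]) -> List[dict]:
    """Drop all telemetry events sent by bots; keep everything else unchanged."""
    return [] if is_bot(user_agent) else list(events)
```

This only discards the data after the request has already been made, which is why the request volume itself is unaffected by such a filter.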
1. Do these requests count towards crawl budgets?
2. If they do, and we want to prevent this from happening, what would be the preferred option: preventing the request in the frontend code, or blocking the request with a robots.txt line?
We ask the second question because an in-app block could lead to different behaviour for users and bots, and Google might penalize that as cloaking. An in-app block is also slightly less convenient from a development perspective, as all logic is spread throughout the application.
I'm aware one should not cloak, or make pages appear differently to search engine crawlers. However, these requests do not change anything in the page's behaviour; they purely send some anonymous data so we can improve future recommendations.
-
Hi Rogier,
- Yes, these usually count towards crawl budget, as Googlebot makes each of them as its own request.
- It obviously depends on how your request is set up, but otherwise I would advise going with the robots.txt exclusion that you're already leaning towards.
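As a rough illustration of the robots.txt option, the sketch below defines a disallow rule and checks it with Python's standard `urllib.robotparser`. The path `/api/telemetry` and the host `example.com` are assumptions made for the example, not the actual Magnet.me endpoint, and Python's parser only approximates how Googlebot interprets robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the telemetry path is an assumed name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/telemetry
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

# The telemetry endpoint is blocked for crawlers that honour robots.txt ...
telemetry_allowed = parser.can_fetch("Googlebot", "https://example.com/api/telemetry")
# ... while ordinary pages remain crawlable.
page_allowed = parser.can_fetch("Googlebot", "https://example.com/jobs/123")

print(telemetry_allowed, page_allowed)
```

Because the rule lives in robots.txt, the frontend code stays identical for users and bots, which sidesteps the cloaking concern raised in the question.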
Hope this helps!