Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Prevent Google from crawling Ajax
-
With Google figuring out how to make Ajax and JS more searchable/indexable, I am curious on thoughts or techniques to prevent this.
Here's my Situation, we have a page that we do not ever want to be indexed/crawled or other. Currently we have the nofollow/noindex command, but due to technical changes for our site the method in which this information is being implemented if it is ever displayed it will not have the ability to block the content from search. It is also the decision of the business to not list the file in robots.txt due to the sensitivity of the content. Basically, this content doesn't exist unless something super important happens, and even if something super important happens, we do not want Google to know of its existence.
Since the Dev team is planning on using Ajax/JS to pull in this content if the business turns it on, the concern is that it will be on the homepage and Google could index it. So the questions that I was asked; if Google can/does index, how long would that piece of content potentially appear in the SERPs? Can we block Google from caring about and indexing this section of content on the homepage?
Sorry for the vagueness of this question, it's very sensitive in nature and I am trying to avoid too many specifics. I am able to discuss this in a more private way if necessary.
Thanks!
-
Toby, thanks for the suggestion! I believe that this will help accomplish what we need. My Dev gave the "oh S" I should've thought of that response.
-
You may find that you have to wrap the code that gets called when Ajax fires in something to catch the user agent. I.e. if your making an Ajax request to a php script in order to return data, you could wrap that php code in something like this (please excuse the Sudo code):
if(in_array($_SERVER['HTTP_USER_AGENT'], $knownagents){
//known webspider, or blocked agent, return nothing.
return "";
} else {
//not a known spider so continue.
}
?>
Thats very generalised but you get the idea. I put a short list together in JSON format a while back, you can find it here if its of any use: https://www.source-control.co.uk/knownspiders/spiders.php
PM me if you need any more specific help than that with development, hopefully someone else will have a slightly easier way of dealing with this though heh
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can Google Crawl & Index my Schema in CSR JavaScript
We currently only have one option for implementing our Schema. It is populated in the JSON which is rendered by JavaScript on the CLIENT side. I've heard tons of mixed reviews about if this will work or not. So, does anyone know for sure if this will or will not work. Also, how can I build a test to see if it does or does not work?
Intermediate & Advanced SEO | | MJTrevens0 -
Google Penalty Checker Tool
What is the best tool to check for the google penalty, What penalty hit the website. ?
Intermediate & Advanced SEO | | Michael.Leonard0 -
How important is admin-ajax.php?
Hi there! It's been a long time since I last did a technical audit of a site. I've currently playing with the 'fetch as google' tool to find out if we're blocking anything vital. The site is based on Wordpress, and after a recent hacking incident, a previous SEO moved the login portal from domain.com/wp-admin/ to domain.com/pr3ss/wp-admin/ - to stop people finding it. Fair enough. But they then updated the robots.txt file to look like this: User-agent: *
Intermediate & Advanced SEO | | Muhammad-Isap
Disallow: /pr3ss/wp-admin/ Now, some pages are trying to draw on theme elements like: http://www.domain.com/pr3ss/wp-admin/admin-ajax.php
http://www.domain.com/pr3ss/wp-content/themes/bestpracticegroup/images/column_wrapper_bg.png And are naturally being blocked (not that this seems to affect the way pages are rendering in Google's eyes) A good SEO friend of mine has suggested allowing the theme folder, and any sub folders where this becomes an issue. What are your thoughts? Is it even worth disallowing the /pr3ss/wp-admin/ path? Cheers guys and gals! All the best, John. I've found a couple of the theme's0 -
SameAs Markup for Google Knowledge Graph
I am trying to get my content in the Google Knowledge graph. Everything I've read thus far about Knowledge Graph tells us how to get in for branded terms (e.g. company name or your own name). But I am looking for ways to have my content be indexed and shown in Google graph. For example, if you search for "mayonnaise for hair" you will see Knowledge graph show us a snippet from an article on RealSimple.com. **How do you get your content to show here? ** I've been reading a lot about SameAs markup, but it seems to only help for branded terms, so companies can have a knowledge box for their brand. But does it help for non-branded keywords? I appreciate any advice. Thanks.
Intermediate & Advanced SEO | | TMI.com1 -
How can I make sure Google is crawling a link from an iframe (video)?
Do they crawl backlinks from an iframe example from a Youtube video embedded in a blog post? TIA!
Intermediate & Advanced SEO | | zpm20140 -
Would you rate-control Googlebot? How much crawling is too much crawling?
One of our sites is very large - over 500M pages. Google has indexed 1/8th of the site - and they tend to crawl between 800k and 1M pages per day. A few times a year, Google will significantly increase their crawl rate - overnight hitting 2M pages per day or more. This creates big problems for us, because at 1M pages per day Google is consuming 70% of our API capacity, and the API overall is at 90% capacity. At 2M pages per day, 20% of our page requests are 500 errors. I've lobbied for an investment / overhaul of the API configuration to allow for more Google bandwidth without compromising user experience. My tech team counters that it's a wasted investment - as Google will crawl to our capacity whatever that capacity is. Questions to Enterprise SEOs: *Is there any validity to the tech team's claim? I thought Google's crawl rate was based on a combination of PageRank and the frequency of page updates. This indicates there is some upper limit - which we perhaps haven't reached - but which would stabilize once reached. *We've asked Google to rate-limit our crawl rate in the past. Is that harmful? I've always looked at a robust crawl rate as a good problem to have. Is 1.5M Googlebot API calls a day desirable, or something any reasonable Enterprise SEO would seek to throttle back? *What about setting a longer refresh rate in the sitemaps? Would that reduce the daily crawl demand? We could set increase it to a month, but at 500M pages Google could still have a ball at the 2M pages/day rate. Thanks
Intermediate & Advanced SEO | | lzhao0 -
Buying a domain banned by google
Hi , I came across a super domain for my business but found out that it was a great domain with 100s of link backs but is now banned by Google search engine meaning Google does not index content from that domain. Since the domains linkbacks are from my domin does it make sense to but that domain and redirect those link backs to another (301) and hope that the new domain gets some juice ... I know it is sounding crazy and may not be the best thing to do ethically but still wanted to check if its possible to get some juice.. Rgds Avinash
Intermediate & Advanced SEO | | Avinashmb0 -
How to prevent Google from crawling our product filter?
Hi All, We have a crawler problem on one of our sites www.sneakerskoopjeonline.nl. On this site, visitors can specify criteria to filter available products. These filters are passed as http/get arguments. The number of possible filter urls is virtually limitless. In order to prevent duplicate content, or an insane amount of pages in the search indices, our software automatically adds noindex, nofollow and noarchive directives to these filter result pages. However, we’re unable to explain to crawlers (Google in particular) to ignore these urls. We’ve already changed the on page filter html to javascript, hoping this would cause the crawler to ignore it. However, it seems that Googlebot executes the javascript and crawls the generated urls anyway. What can we do to prevent Google from crawling all the filter options? Thanks in advance for the help. Kind regards, Gerwin
Intermediate & Advanced SEO | | footsteps0