Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Prevent Google from crawling Ajax
-
With Google figuring out how to make Ajax and JS more searchable/indexable, I am curious on thoughts or techniques to prevent this.
Here's my Situation, we have a page that we do not ever want to be indexed/crawled or other. Currently we have the nofollow/noindex command, but due to technical changes for our site the method in which this information is being implemented if it is ever displayed it will not have the ability to block the content from search. It is also the decision of the business to not list the file in robots.txt due to the sensitivity of the content. Basically, this content doesn't exist unless something super important happens, and even if something super important happens, we do not want Google to know of its existence.
Since the Dev team is planning on using Ajax/JS to pull in this content if the business turns it on, the concern is that it will be on the homepage and Google could index it. So the questions that I was asked; if Google can/does index, how long would that piece of content potentially appear in the SERPs? Can we block Google from caring about and indexing this section of content on the homepage?
Sorry for the vagueness of this question, it's very sensitive in nature and I am trying to avoid too many specifics. I am able to discuss this in a more private way if necessary.
Thanks!
-
Toby, thanks for the suggestion! I believe that this will help accomplish what we need. My Dev gave the "oh S" I should've thought of that response.
-
You may find that you have to wrap the code that gets called when Ajax fires in something to catch the user agent. I.e. if your making an Ajax request to a php script in order to return data, you could wrap that php code in something like this (please excuse the Sudo code):
if(in_array($_SERVER['HTTP_USER_AGENT'], $knownagents){
//known webspider, or blocked agent, return nothing.
return "";
} else {
//not a known spider so continue.
}
?>
Thats very generalised but you get the idea. I put a short list together in JSON format a while back, you can find it here if its of any use: https://www.source-control.co.uk/knownspiders/spiders.php
PM me if you need any more specific help than that with development, hopefully someone else will have a slightly easier way of dealing with this though heh
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl Stats Decline After Site Launch (Pages Crawled Per Day, KB Downloaded Per Day)
Hi all, I have been looking into this for about a month and haven't been able to figure out what is going on with this situation. We recently did a website re-design and moved from a separate mobile site to responsive. After the launch, I immediately noticed a decline in pages crawled per day and KB downloaded per day in the crawl stats. I expected the opposite to happen as I figured Google would be crawling more pages for a while to figure out the new site. There was also an increase in time spent downloading a page. This has went back down but the pages crawled has never went back up. Some notes about the re-design: URLs did not change Mobile URLs were redirected Images were moved from a subdomain (images.sitename.com) to Amazon S3 Had an immediate decline in both organic and paid traffic (roughly 20-30% for each channel) I have not been able to find any glaring issues in search console as indexation looks good, no spike in 404s, or mobile usability issues. Just wondering if anyone has an idea or insight into what caused the drop in pages crawled? Here is the robots.txt and attaching a photo of the crawl stats. User-agent: ShopWiki Disallow: / User-agent: deepcrawl Disallow: / User-agent: Speedy Disallow: / User-agent: SLI_Systems_Indexer Disallow: / User-agent: Yandex Disallow: / User-agent: MJ12bot Disallow: / User-agent: BrightEdge Crawler/1.0 (crawler@brightedge.com) Disallow: / User-agent: * Crawl-delay: 5 Disallow: /cart/ Disallow: /compare/ ```[fSAOL0](https://ibb.co/fSAOL0)
Intermediate & Advanced SEO | | BandG0 -
Google not Indexing images on CDN.
My URL is: http://bit.ly/1H2TArH We have set up a CDN on our own domain: http://bit.ly/292GkZC We have an image sitemap: http://bit.ly/29ca5s3 The image sitemap uses the CDN URLs. We verified the CDN subdomain in GWT. The robots.txt does not restrict any of the photos: http://bit.ly/29eNSXv. We used to have a disallow to /thumb/ which had a 301 redirect to our CDN but we removed both the disallow in the robots.txt as well as the 301. Yet, GWT still reports none of our images on the CDN are indexed.
Intermediate & Advanced SEO | | alphonseha
The above screenshot is from the GWT of our main domain.The GWT from the CDN subdomain just shows 0. We did not submit a sitemap to the verified subdomain property because we already have a sitemap submitted to the property on the main domain name. While making a search of images indexed from our CDN, nothing comes up: http://bit.ly/293ZbC1While checking the GWT of the CDN subdomain, I have been getting crawling errors, mainly 500 level errors. Not that many in comparison to the number of images and traffic that we get on our website. Google is crawling, but it seems like it just doesn't index the pictures!?
Can anyone help? I have followed all the information that I was able to find on the web but yet, our images on the CDN still can't seem to get indexed.
0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
How does google recognize original content?
Well, we wrote our own product descriptions for 99% of the products we have. They are all descriptive, has at least 4 bullet points to show best features of the product without reading the all description. So instead using a manufacturer description, we spent $$$$ and worked with a copywriter and still doing the same thing whenever we add a new product to the website. However since we are using a product datafeed and send it to amazon and google, they use our product descriptions too. I always wait couple of days until google crawl our product pages before i send recently added products to amazon or google. I believe if google crawls our product page first, we will be the owner of the content? Am i right? If not i believe amazon is taking advantage of my original content. I am asking it because we are a relatively new ecommerce store (online since feb 1st) while we didn't have a lot of organic traffic in the past, i see that our organic traffic dropped like 50% in April, seems like it was effected latest google update. Since we never bought a link or did black hat link building. Actually we didn't do any link building activity until last month. So google thought that we have a shallow or duplicated content and dropped our rankings? I see that our organic traffic is improving very very slowly since then but basically it is like between 5%-10% of our current daily traffic. What do you guys think? You think all our original content effort is going to trash?
Intermediate & Advanced SEO | | serkie1 -
Number of images on Google?
Hello here, In the past I was able to find out pretty easily how many images from my website are indexed by Google and inside the Google image search index. But as today looks like Google is not giving you any numbers, it just lists the indexed images. I use the advanced image search, by defining my domain name for the "site or domain" field: http://www.google.com/advanced_image_search and then Google returns all the images coming from my website. Is there any way to know the actual number of images indexed? Any ideas are very welcome! Thank you in advance.
Intermediate & Advanced SEO | | fablau1 -
How to find all indexed pages in Google?
Hi, We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools. How can I get a list of all pages indexed of our domain? trying to locate the duplicate content. Doing a "site:www.mydomain.com" only returns up to 676 results... Any ideas? Thanks, Ben
Intermediate & Advanced SEO | | bjs20100 -
Multiple Authors Google + Authorship
Hello, I took a look through past questions but can't seem to find a definitive answer on setting up Google + Authorship credit (for multiple authors) using a Wordpress blog. Has anyone had experience setting this up? Or could you recommend solid reading/research? I took a look at a couple of Wordpress plug in's but just found them very confusing (so did our IT contact who will ultimately be setting up code for this.) Any direction or advice is appreciated.
Intermediate & Advanced SEO | | SEOSponge0 -
Why am I not ranking in Google, but I am in Yahoo and Bing?
The website in question is: www.stbarthexclusives.com Our keywords are currently ranking for both Bing and Yahoo, but we're not appearing anywhere on Google. The website is being crawled successfully, but we still don't have any results. I hoping somebody can point me in the general right direction to fix/correct this problem. Additionally, there's a decent amount of "rel=canonical tags" on the website. If that helps your evaluation. Any advice would be greatly appreciated
Intermediate & Advanced SEO | | Endora0