Blocking AJAX Content from being crawled
-
Our website has some pages with content shared from a third-party provider, and we use AJAX as our implementation. We don't want Google to crawl the third party's content, but we do want them to crawl and index the rest of the web page. However, in light of Google's recent announcement about indexing AJAX content more effectively, I'm concerned that we are at risk of that content being indexed.
I have thought about x-robots but am wary of implementing it on the pages because of the risk that Google would stop indexing the whole page. These pages get significant traffic for the website, and I can't risk that.
Thanks,
Phil
-
Hey Phil. I think I've understood your situation, but just to be clear: I'm presuming you have URLs exposing third-party JSON/XML content that you don't want indexed by Google. Probably the most foolproof method for this case is the "X-Robots-Tag" HTTP header convention (http://code.google.com/web/controlcrawlindex/docs/robots_meta_tag.html). I would recommend going with "X-Robots-Tag: none", which should do the trick (I really don't think "noarchive" or other options are required if they're not indexing it at all). You'll need to modify your server-side scripts to do this. I'm assuming it won't be much pain for you (or the third party?) to do this. Hope this helps! ~bryce
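For anyone wanting to see what the server-side change might look like, here is a minimal sketch using only Python's standard library. It assumes the third-party content is served from its own AJAX URLs under a path prefix; the prefix `/ajax/third-party/`, the helper `robots_headers`, and the placeholder JSON body are all hypothetical and not from Phil's setup. The idea is that the header is added only to the AJAX responses, so the HTML pages that load them are sent without it. Note that "none" is shorthand for "noindex, nofollow".

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

# Hypothetical path prefix for the AJAX endpoints that return third-party content.
THIRD_PARTY_PREFIX = "/ajax/third-party/"


def robots_headers(path: str) -> dict:
    """Return extra response headers for a request path.

    Only the third-party AJAX endpoints get X-Robots-Tag; every other
    URL (including the host pages) gets no extra header.
    """
    if path.startswith(THIRD_PARTY_PREFIX):
        # "none" is equivalent to "noindex, nofollow".
        return {"X-Robots-Tag": "none"}
    return {}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder body; a real endpoint would return the provider's JSON/XML.
        body = json.dumps({"path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        for name, value in robots_headers(self.path).items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Local demo server only; address and port are arbitrary.
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

If the AJAX URLs live on the provider's servers rather than your own, the same header would have to be set on their side.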