Separate Servers for Humans vs. Bots with Same Content Considered Cloaking?
-
Hi,
We are considering using separate servers for when a bot vs. a human lands on our site, to prevent overloading our servers. Just wondering if this is considered cloaking if the content remains exactly the same for both the bot and the human, but is served from different servers.
And if this isn't considered cloaking, will this affect the way our site is crawled? Or hurt rankings?
Thanks
-
The additional massive complexity, expense, upkeep and risk of trying to run a separate server just for bots is nowhere near worth it, in my opinion. (Don't forget, you'd also have to build a system to replicate the content between each server every time content/code is added or edited. That replication process could well use more resources than the bots do!)
I'd say you'd be much better off putting all of those resources toward a more robust primary server and letting it do its job.
In addition, as Lesley says, you can tune Googlebot's crawl rate, and you can actually schedule Bing's crawl times in its Webmaster Tools. Though for me, I'd want the search engine bots to get in and index my site just as soon as they were willing.
Lastly, it's only a few minutes' work to source a ready-made blacklist of "bad bot" user agents that you can quickly insert into your .htaccess file to completely block a significant number of the most wasteful and unnecessary bots. You will want to update such a blacklist every few months, as the worst offenders regularly change user agents to avoid just such blacklisting.
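As a minimal sketch, assuming an Apache server with mod_rewrite enabled (the user agents below are just a few commonly blocked crawlers; a real blacklist is much longer and needs those periodic updates):

    # Return 403 Forbidden to matching "bad bot" user agents
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|MJ12bot|SemrushBot) [NC]
    RewriteRule .* - [F,L]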
Does that make sense as an alternative?
Paul
-
I second what Jonathan says, but I would also like to add a couple of things. One thing I would keep in mind is reserve capacity on your server: if you are running the server close enough to its maximum traffic limit that a bot would matter, I would upgrade the whole server. All it takes is one nice spike from somewhere like Hacker News or Reddit to take your site offline, especially if you are running close to the red.
From my understanding, you can actually adjust how and when Google will crawl your site, too: https://developers.google.com/search-appliance/documentation/50/help_mini/crawl_fullcrawlsched
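(A related note: Google ignores the Crawl-delay directive - its crawl rate is set in Webmaster Tools instead - but Bing and several other crawlers do honor it, so a minimal robots.txt sketch like the following can slow them down. The 10-second delay is just an example value:)

    # Ask Bing to wait 10 seconds between requests; allow everything otherwise
    User-agent: bingbot
    Crawl-delay: 10

    User-agent: *
    Disallow: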
-
I've never known search engine bots to be particularly troublesome or to overload servers. However, there are a few things you could do:
1. Set up caching (see the sketch below).
2. Set up something like Cloudflare, which can also block other threats.
I can't imagine you intend to block Google, Bing, etc., as I would definitely advise against cloaking the site from Google like that.
Of course, it's difficult to make any specific comment as I have no idea of the extent of the problem you are suffering from, but something like caching or Cloudflare's security features will help a lot.
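To illustrate point 1, a minimal caching sketch, assuming an Apache server with mod_expires available (the types and lifetimes are example values to adjust):

    <IfModule mod_expires.c>
      ExpiresActive On
      # Let browsers (and CDNs like Cloudflare) serve static assets from cache
      # so repeat hits don't touch the application at all
      ExpiresByType image/jpeg "access plus 1 month"
      ExpiresByType image/png "access plus 1 month"
      ExpiresByType text/css "access plus 1 week"
      ExpiresByType application/javascript "access plus 1 week"
    </IfModule>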
Related Questions
-
Timeviewer vs teamviewer
Hi everyone, here's the issue I need help with: a website called timeviewer.tv. The problem is, when you search for 'timeviewer', Google will ask whether you want to 'search for teamviewer instead'. I want people to be able to search for timeviewer purely, without the need to click on 'search for timeviewer instead'. How can this be accomplished? Thanks everyone. Luca
White Hat / Black Hat SEO | Luca_Tagliaferro
-
Is there any SEO impact of using "www" vs. non-"www" as the preferred domain name?
My client has been using "www" with his domain, and before I took over he used it in marketing, etc. I typically don't use "www" in my WordPress setups, and I set non-www as the preferred domain in Google Analytics and Google Search Console. Does it make any difference, especially when www resolves to non-www? I'd appreciate some guidance with this.
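(For reference, a minimal sketch of how www typically gets resolved to non-www - a site-wide 301 in .htaccess, assuming Apache with mod_rewrite enabled:)

    RewriteEngine On
    # Redirect any www hostname to the bare domain, preserving the requested path
    RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
    RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]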
White Hat / Black Hat SEO | chill986
-
How to re-rank an established website with new content
I can't help but feel this is a somewhat untapped resource with a distinct lack of information. There is a massive amount of information around on how to rank a new website, or on techniques to increase SEO effectiveness, but ranking a whole new set of pages, or 're-building' a site that may have suffered an algorithmic penalty, is a harder nut to crack in terms of information and resources.
To start, I'll provide my situation: SuperTED is an entertainment directory SEO project. It seems likely we suffered an algorithmic penalty at some point around Penguin 2.0 (May 22nd), as traffic has dropped steadily since then, though not too aggressively. Then, to coincide with the newest Panda update (Panda 27, according to Moz) in late September this year, we decided it was time to re-assess tactics to keep in line with Google's guidelines. We've slowly built a natural link profile over the past two years, but it's likely thin content was also an issue. So from the beginning of September to the end of October we took these steps:
1. Contacted webmasters to remove links (unfortunately there was some 'paid' link-building before I arrived).
2. 'Disavowed' the rest of the unnatural links that we couldn't get removed manually.
3. Worked on pagespeed as per Google guidelines until we received high scores in the majority of speed-testing tools (e.g. WebPageTest).
4. Redesigned the entire site with speed, simplicity and accessibility in mind.
5. Htaccessed 'fancy' URLs to remove file extensions and simplify the link structure.
6. Completely removed two or three pages that were quite clearly just trying to 'trick' Google - think a large page of links that simply said 'Entertainers in London', 'Entertainers in Scotland', etc. We 404'ed them and asked for URL removal via WMT; thinking of 410'ing?
7. Added new content and pages that seem to follow Google's guidelines as far as I can tell, e.g. main category pages and sub-category pages.
8. Started to build new links to our now 'content-driven' pages naturally, by asking our members to link to us via their personal profiles. We offered a reward system internally for this, so we've seen a fairly good turnout.
9. Worked on many other 'possible' ranking factors, such as adding Schema data, optimising for mobile devices as best we can, adding a blog and beginning to blog original content, utilising and expanding our social media reach, custom 404 pages, removing duplicate content, utilising Moz and much more.
It's been a fairly exhaustive process, but we were happy to do it to be within Google's guidelines. Unfortunately, some of those link-wheel pages mentioned previously were the only pages driving organic traffic, so once we were rid of them, traffic dropped to not even 10% of what it was previously. Equally, with the changes (htaccess) to the link structure and the creation of brand-new pages, we've lost many of the pages that previously held Page Authority.
We've 301'ed the pages that were 'replaced' with much better content and a different URL structure - http://www.superted.com/profiles.php/bands-musicians/wedding-bands to simply http://www.superted.com/profiles.php/wedding-bands, for example. Therefore, with the loss of the 'spammy' pages and the creation of brand-new 'content-driven' pages, we've probably lost up to 75% of the old website, including the pages that were driving any traffic at all (even with potential thin-content algorithmic penalties). Because of the loss of entire pages, the changed URLs and the rest discussed above, the site probably looks very new and very heavily updated in a short period of time. What I need to work out is a campaign to drive traffic to the 'new' site.
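(As an aside, a minimal sketch of what one of those one-to-one 301s looks like in .htaccess, assuming Apache with mod_alias and using the example paths above:)

    # Map the old, deeper URL to its flatter replacement
    Redirect 301 /profiles.php/bands-musicians/wedding-bands /profiles.php/wedding-bands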
We're naturally building links through our own customer base, so they will likely be seen as quality, natural link-building.
Perhaps the sudden occurrence of a large number of 404s and 'lost' pages is affecting us?
Perhaps we're yet to be properly re-indexed? It has been almost a month since most of the changes were made, and we'd often be re-indexed 3 or 4 times a week before the changes.
Our events page is the only one left to update with the new design - could this be affecting us? It potentially looks like two sites in one.
Perhaps we need to wait until the next Google 'link' update to feel the benefits of our link audit.
Perhaps simply getting rid of many of the 'spammy' links has done us no favours - I should point out we've never been issued a manual penalty. Was I perhaps too hasty in following the rules?
I would appreciate a professional opinion, or input from anyone who has experience with a similar process. It does seem fairly odd that following guidelines and general white-hat SEO advice could cripple a domain, especially one with age (the domain is 10+ years established) and relatively good domain authority within the industry. Many, many thanks in advance. Ryan.
White Hat / Black Hat SEO | ChimplyWebGroup
-
My website is coming up under a proxy server "HideMyAss.com." How do I stop this from happening?
We've noticed that when we search our web copy in Google, the first result is under a proxy server, "HideMyAss.com," and our actual website is nowhere in sight. We've called Google, and the 2-3 people we spoke with really didn't have an answer for us. Any suggestions or ideas would be greatly appreciated.
White Hat / Black Hat SEO | AAC_Adam
-
DIV Attribute containing full DIV content
Hi all. I recently watched the latest Mozinar, called "Making Your Site Audits More Actionable", presented by the guys at SEOgadget. In the Mozinar, one of the presenters said he loves the website www.sportsbikeshop.co.uk and that they have spent a lot of money on it from an SEO point of view (presumably with SEOgadget), so I decided to look through the source and noticed something I had not seen before, and wondered if anyone can shed any light on it. On this page (http://www.sportsbikeshop.co.uk/motorcycle_parts/content_cat/852/(2;product_rating;DESC;0-0;all;92)/page_1/max_20) there is a paragraph of text that begins with 'The ever reliable UK weather...', and when you view the source of the containing DIV, you will notice a bespoke attribute called "threedots=" and within it is the entire text content of that DIV. Any thoughts as to why they would put that there? I can't see any reason why this would benefit a site in any shape or form. It's invalid markup, for one. Am I missing a trick? Thoughts would be greatly appreciated. Kris. P.S. For those who can't be bothered to visit the site, here is a smaller version of what they have done: <div threedots="This is an introductory paragraph of text for this page.">This is an introductory paragraph of text for this page.</div>
White Hat / Black Hat SEO | yousayjump
-
Syndicated content outperforming our hard work!
Our company (FindMyAccident) is an accident news site. Our goal is to roll our reporting out to all 50 states; currently, we operate full-time in 7 states. To date, the largest expenditure is our writing staff. We hire professional journalists who work with police departments and other sources to develop written content and video for our site. Our visitors also contribute stories and/or tips that add to the content on our domain. In short, our content/media is 100% original.
A site that often appears alongside us in the SERPs in the markets where we work full-time is accidentin.com. They are a site that syndicates accident news and offers little original content. (They also allow users to submit their own accident stories, and the entries index quickly and are sometimes viewed by hundreds of people in the same day. What's perplexing is that these entries are isolated incidents with little to no media value, yet they do extremely well.) (I don't rest my bets on Quantcast figures, but accidentin does use their pixel sourcing, and the figures indicate that they are receiving up to 80k visitors a day in some instances.)
I understand that it's common to see news sites syndicate from the AP, etc., and traffic accident news is not going to have a lot of competition (in most instances), but the real shocker is that accidentin will sometimes appear as the first or second result, above the original sources??? The question: does anyone have a guess as to what is making it perform so well? While looking at their model, I'm wondering if we're not silly to syndicate news in the states where we don't have actual staff - it would seem we could attract more traffic by setting up syndication in our vacant states. OR is our competitor's site bound to fade away?
Thanks, gang, hope all of you have a great 2013! Wayne
White Hat / Black Hat SEO | Wayne76
-
Does this fall under cloaking in pagination?
When I try to implement rel=next and prev tags on my pages, the prefetching feature of the Firefox browser somehow causes extra calls to come to my server for one page, and it's affecting my page performance. The solutions I can think of are: 1. Increase my server capacity to handle it smoothly - not possible to invest in this change. 2. Show these tags only when a bot crawls the pages, and not when a user comes through a browser. My question is: does option 2 fall under cloaking?
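(For context, these are the tags in question - a sketch with placeholder URLs; Firefox may prefetch the rel=next target in the background, which is where the extra requests come from:)

    <!-- on page 2 of a paginated series; example.com is a placeholder -->
    <link rel="prev" href="http://www.example.com/list?page=1">
    <link rel="next" href="http://www.example.com/list?page=3">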
White Hat / Black Hat SEO | Myntra
-
Why doesn't Google find different domains - same content?
I have been slowly working to remove near-duplicate content from my own website for different locales. Google seems to be doing nothing to combat the duplicate content of one of my competitors showing up all over southern California. For example, these two SERP listings:
Your Local #1 Rancho Bernardo Pest Control Experts | 858-352 ...
www.pestcontrolranchobernardo.com/
Pest Control Rancho Bernardo Pros specializes in the eradication of all household pests including ants, roaches, etc. Call Today @ 858-352-7728.
Your Local #1 Oceanside Pest Control Experts | 760-486-2807 ...
www.pestcontrol-oceanside.info/
Pest Control Oceanside Pros specializes in the eradication of all household pests including ants, roaches, etc. Call Today @ 760-486-2807.
The competitor is getting high page 1 listings for massively duplicated content across web domains. Will Google find this black-hat workmanship? Meanwhile, he's sucking up my business. Do the competitor's results also speak to the possibility that Google does in fact rank based on the name of the URL - something that gets debated all the time? Thanks for your insights. Gerry
White Hat / Black Hat SEO | GerryWeitz