Some bots excluded from crawling client's domain
-
Hi all!
My client is in healthcare in the US and, for HIPAA reasons, blocks traffic from most international sources.
a. I don't think this is good for SEO
b. The site won't allow Moz bot or Screaming Frog bot to crawl it. It's so frustrating.
We can't figure out what mechanism they are using to do this. Any help as we start down the rabbit hole to fix it is much appreciated.
Thank you!
-
The main reason it's not good is that Google crawls from different data centers around the world. So one day it may see the site as up, and the next it may see the site as down or gone.
Typically you use a custom user-agent as a lance to pierce these kinds of setups. In Screaming Frog, for example, you can pre-select from a variety of user-agents (including 'googlebot' and Chrome), but you can also write your own user-agent.
Write a long one that looks like an encryption key. Tell your client the user-agent you have defined and let them create an exemption for it within their spam-defense system. Then insert the user-agent (which no one else has or uses) into Screaming Frog so the crawler can pierce the defense grid.
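As a rough sketch of the secret user-agent idea above (Python, standard library only): generate a long random token, share it with whoever manages the client's defense system, and have the exemption match on it exactly. The `ClientAudit/1.0` prefix, the function names, and the allowlist are all made up for illustration. The real exemption would live in whatever blocking product the client uses, which is still unknown here.

```python
import hmac
import secrets
from typing import Iterable


def make_secret_user_agent(prefix: str = "ClientAudit/1.0") -> str:
    """Build a user-agent with a random 64-hex-character token appended.

    The prefix is a hypothetical label, not a real crawler name.
    """
    token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    return f"{prefix} ({token})"


def is_exempt(user_agent: str, allowed: Iterable[str]) -> bool:
    """Exact-match check against the allowlist.

    compare_digest is used so the comparison time does not hint at the secret.
    """
    candidate = user_agent.encode()
    return any(hmac.compare_digest(candidate, a.encode()) for a in allowed)


secret_ua = make_secret_user_agent()
allowed_agents = [secret_ua]  # stand-in for the client's exemption rule

print(secret_ua)  # the value to enter as the custom user-agent in Screaming Frog
print(is_exempt(secret_ua, allowed_agents))      # True
print(is_exempt("Mozilla/5.0", allowed_agents))  # False
```

The exemption is only as safe as the string is secret, so treat it like a password: share it privately and rotate it if it leaks.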
Typically you would want to exempt 'Googlebot' (as a user-agent) from these defense systems, but that comes with a risk. Anyone with basic scripting knowledge, or who knows how to install Chrome extensions, can easily change the user-agent of their script or web browser, since it's under the user's control. It is widely known that many sites make an exception for 'Googlebot', so it becomes a common vulnerability. For example, lots of publishers create URLs which Google can access and index, yet a bog-standard user is asked to turn off ad-blockers or pay a fee.
Download the Chrome User-Agent extension, set your user-agent to "googlebot" and sail right through. Not ideal from a defense perspective.
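To see why a plain 'Googlebot' exemption is weak, here is a minimal Python sketch. `naive_is_googlebot` is a hypothetical stand-in for a user-agent rule in a defense system, not any real product's logic, and the request is only built, never sent. The point is just that the header is whatever the client chooses to put in it.

```python
from urllib.request import Request

# The user-agent string Googlebot is widely known to send.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def naive_is_googlebot(user_agent: str) -> bool:
    """Hypothetical exemption rule: trust anything that claims to be Googlebot."""
    return "googlebot" in user_agent.lower()


# Anyone can attach this header to their own request; nothing is sent here.
spoofed = Request("https://www.example.com/", headers={"User-Agent": GOOGLEBOT_UA})

print(naive_is_googlebot(spoofed.get_header("User-agent")))                # True
print(naive_is_googlebot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))     # False
```

A rule like this cannot tell the real crawler from a script or browser extension that copies the string, which is the vulnerability described above.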
For this reason I have often wished (and I am really hoping someone from Google might be reading) that in Search Console you could give Google a custom user-agent string. You could then exempt that string, safe in the knowledge that no one else knows it, and Google would use it to identify itself when accessing your site and content. Then everyone could be safe, indexable and happy.
We're not there yet.