Some bots excluded from crawling client's domain
-
Hi all!
My client is in healthcare in the US and, for HIPAA reasons, blocks traffic from most international sources.
a. I don't think this is good for SEO
b. The site won't allow Moz bot or Screaming Frog bot to crawl it. It's so frustrating.
We can't figure out what mechanism they are using to do this. Any help as we start down the rabbit hole to remedy it is much appreciated.
Thank you!
-
The main reason it's not good is that Google crawls from different data centers around the world. So one day they may think the site is up, then the next they may think the site is down or gone.
Typically you use a custom user-agent as a lance to pierce these kinds of setups. In Screaming Frog, for example, you can pre-select from a variety of user-agents (including 'googlebot' and Chrome), but you can also write your own user-agent.
Write a long one that looks like an encryption key. Tell your client the user-agent you have defined and let them create an exemption for it within their spam-defense system. Then insert the user-agent (which no one else has or uses) into Screaming Frog so the crawler can pierce the defense grid.
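As a rough sketch of that idea (assuming Python; the function names and the "ClientAuditCrawler" prefix are made up for illustration, and the client's real firewall or WAF will have its own way of expressing the rule), you could generate the string and check it like this:

```python
import hmac
import secrets


def make_crawler_user_agent(prefix: str = "ClientAuditCrawler") -> str:
    """Build a long, hard-to-guess user-agent string.

    The prefix is a hypothetical label; the suffix is 32 random bytes
    rendered as 64 hex characters, so it "looks like an encryption key".
    """
    return f"{prefix}/{secrets.token_hex(32)}"


def is_exempt(request_user_agent: str, allowed_user_agent: str) -> bool:
    """Return True only on an exact match with the one exempted string.

    compare_digest does a constant-time comparison, so the check does not
    leak how many leading characters of a guess were correct.
    """
    return hmac.compare_digest(
        request_user_agent.encode("utf-8"),
        allowed_user_agent.encode("utf-8"),
    )


if __name__ == "__main__":
    secret_ua = make_crawler_user_agent()
    print(secret_ua)                          # paste this into Screaming Frog
    print(is_exempt(secret_ua, secret_ua))    # the crawler gets through
    print(is_exempt("googlebot", secret_ua))  # a generic spoof does not
```

The string only works as a key while it stays private, so treat it like a password and rotate it if it ever ends up somewhere public.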
Typically you would want to exempt 'Googlebot' (as a user-agent) from these defense systems, but that comes with a risk. The user-agent is entirely under the user's control: anyone with basic scripting knowledge, or who knows how to install a Chrome extension, can change the user-agent of their script or web browser with ease. It is also widely known that many sites make an exception for 'Googlebot', so it becomes a common vulnerability. For example, lots of publishers create URLs which Google can access and index, yet if you are a bog-standard user they ask you to turn off ad-blockers or pay a fee.
Download a user-agent switcher extension for Chrome, set your user-agent to "googlebot" and sail right through. Not ideal from a defense perspective.
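To show how little effort the script route takes, here is a minimal sketch, assuming Python's standard library. The URL is a placeholder and the helper name is hypothetical; the request is only built, not sent:

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying an arbitrary user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# The client simply claims to be Googlebot; nothing in the header proves it.
req = build_request("https://example.com/", "googlebot")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

A defense that checks nothing but this header has no way to tell this request from the real crawler.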
For this reason I have often wished (and I am really hoping someone from Google might be reading) that in Search Console you could define a custom user-agent string and give it to Google. You could then exempt that string, safe in the knowledge that no one else knows it, and Google would use it to identify themselves when accessing your site and content. Then everyone could be safe, indexable and happy.
We're not there yet.