Why don't Moz OSE, Ahrefs, Majestic and so on change their user agent while crawling?
-
Some blackhat websites, PBNs and other "cheaters" are using various methods to effectively block third-party backlink checker bots (OSE, Ahrefs, Majestic...): robots.txt, IP blocking and such.
A simple solution for those bots would be to mimic Google, for example by using its user agent string.
Or, if that is not legally permitted (which I doubt), use some kind of randomness in user agent strings, URLs, and IPs in order to prevent blocking. This should not be a big deal IMHO. Am I missing something obvious?
-
The ethics of the Internet dictate that you
- crawl politely,
- obey robots.txt and
- properly identify yourself
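As a rough illustration of the three rules above, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The bot name `ExampleLinkBot`, its info URL, the `crawl_policy` helper and the one-second default delay are all made up for the example; this is not how OSE, Ahrefs or Majestic actually implement their crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identity: a polite bot names itself and links to a page about itself.
USER_AGENT = "ExampleLinkBot/1.0 (+https://example.com/bot)"
# Assumed fallback pause between requests when robots.txt gives no Crawl-delay.
DEFAULT_DELAY = 1.0


def crawl_policy(robots_txt, url, user_agent=USER_AGENT):
    """Return (allowed, delay_in_seconds) for `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)   # obey robots.txt
    delay = parser.crawl_delay(user_agent)        # crawl politely
    return allowed, float(delay) if delay is not None else DEFAULT_DELAY


# Example robots.txt that singles out the (hypothetical) bot by name.
ROBOTS = """\
User-agent: ExampleLinkBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    print(crawl_policy(ROBOTS, "https://example.com/private/page"))
    print(crawl_policy(ROBOTS, "https://example.com/public"))
```

Note that the site owner can only write a rule for `ExampleLinkBot` because the bot announces that name; a crawler that hides or rotates its identity takes that choice away from the owner.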
This isn't a new issue. Link networks and sites have blocked crawlers and manipulated Google for years. Fortunately, it's only a small fraction of the web. Also, it's unlikely that links from those networks have much value, so crawl priority would be super low anyway.
Actually, it could be viewed as beneficial when blackhat sites block OSE and Ahrefs. Those sites often get penalized by Google, but 3rd party crawlers have no way to know this, so blocking effectively keeps them out of the indexes.
-
Well, I think bot blocking is an obvious problem even now, and it will be more important tomorrow with all the private networks out there, as you can imagine.
Moz (and others) should find and implement the best possible solution. I see no problem with TAGFEE as long as you are transparent about the fact that your bots are undetectable.
I understand that what I'm proposing may be neither the best nor the desired solution, but the problem must be addressed or OSE will soon have no value at all.
What do you propose?
-
I agree with George here -- we'd hear a huge outcry if we pretended to be Googlebot or a different bot. We'd also likely get blocked, as sometimes people only let in a certain few known bots/IPs to crawl their site. If we changed user agents and IPs regularly, it would not be cool or TAGFEE.
-
What about using different user agents and IPs regularly in order to avoid detection?
Is there any other acceptable solution?
-
The reputation and integrity of the major players would be at stake here. If they changed their user agent identification (to spoof Googlebot or Bing or whatever), that could be detected, and they would be castigated. The crawler's IP address and its user agent ID would be out of sync...
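To make the "out of sync" point concrete, here is a minimal Python sketch of the kind of check a site owner could run: a reverse DNS lookup on the visiting IP, a check that the hostname sits under a domain Google uses for Googlebot, then a forward lookup to confirm the hostname maps back to the same IP. The function name `is_genuine_googlebot`, the injectable lookup parameters and the sample addresses are assumptions made for this example, not anyone's production code.

```python
import socket

# Domains assumed here to host genuine Googlebot reverse-DNS names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`.

    The two lookup callables can be replaced (e.g. in tests); by default they
    use the system resolver through the standard `socket` module.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # claims to be Googlebot, but the IP belongs to someone else
        return ip in forward_lookup(host)  # forward-confirm the hostname
    except OSError:
        return False  # no DNS record: treat the claim as unverified


if __name__ == "__main__":
    # Made-up data: a spoofing crawler whose IP reverse-resolves to its own network.
    spoofed = is_genuine_googlebot(
        "192.0.2.10",
        reverse_lookup=lambda addr: "crawler7.example.net",
        forward_lookup=lambda host: ["192.0.2.10"],
    )
    print(spoofed)
```

A crawler that sends Googlebot's user agent string from its own IP range fails at the hostname check, which is why this kind of spoofing tends to be noticed quickly.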