How to Safely Scrape Google Results?
-
I've built a couple of small tools that I use personally, maybe 2 or 3 times per day.
Both tools scrape the top 10 results from Google and provide more details about each domain (like the SEOMoz Keyword Difficulty Tool).
Google seems to have banned my IP address for automated searches... can anyone tell me a safe way of scraping the Google results? Is there a suitable API for this?
How does SEOmoz do this on such a huge scale?
-
I doubt that the APIs have improved considerably since this blog post: http://www.seomoz.org/blog/the-nasty-problem-with-scraping-results-from-the-engines, so scraping Google is still a big issue, and it remains necessary for our daily SEO work.
Scraping safely can only work if you succeed in convincing Google that you're a "natural" user and not a scraping robot. How can you do that?
- Search with alternating IPs from different locations, using proxies in the countries you'd like to scrape from.
- Don't send too many requests at once from the same source.
Consider that, when requesting a URL, the browser sends various pieces of information to the server, for example your operating system, browser version, referer, etc. Every one of these can and should be changed to give yourself a virtually new identity for each new search:
- Change browsers, browser versions, operating system information, etc.
- Take care when changing browser localization values (en-GB and en-US probably don't return the same results).
- Have a good network of proxy servers ready, so you can send the different requests out under your different identities.
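To make the points above a bit more concrete, here is a minimal Python sketch of how one "identity" per search could be put together and how requests could be spaced out. Everything specific in it is my own assumption for illustration: the proxy addresses are placeholders, the User-Agent strings are just examples, the helper names (`build_identity`, `build_request`, `polite_delay`) are made up, and the 20 to 60 second pause is a guess rather than a known safe limit. It only builds the request; it does not send anything.

```python
import itertools
import random
import urllib.parse
import urllib.request

# Hypothetical values for illustration only; substitute your own.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
]
PROXIES = ["http://proxy-us.example:8080", "http://proxy-gb.example:8080"]


def build_identity(proxy_cycle, language="en-US", rng=random):
    """Pick one proxy and one set of browser headers for a single search."""
    return {
        "proxy": next(proxy_cycle),
        "headers": {
            "User-Agent": rng.choice(USER_AGENTS),
            # Keep the localization fixed per project: en-GB and en-US
            # probably do not return the same results.
            "Accept-Language": language,
        },
    }


def build_request(query, identity):
    """Create (but do not send) a search request carrying the given identity."""
    url = "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})
    request = urllib.request.Request(url, headers=identity["headers"])
    proxy_handler = urllib.request.ProxyHandler(
        {"http": identity["proxy"], "https": identity["proxy"]}
    )
    opener = urllib.request.build_opener(proxy_handler)
    return opener, request


def polite_delay(min_seconds=20.0, max_seconds=60.0, rng=random):
    """Return a randomized pause so requests are not sent in a burst."""
    return rng.uniform(min_seconds, max_seconds)
```

In use, you would create `itertools.cycle(PROXIES)` once, call `build_identity` before every search, send the request with `opener.open(request, timeout=10)`, and then `time.sleep(polite_delay())` before the next one. Cycling the proxies keeps consecutive searches on different sources, while the random pause covers the "not too many requests at once" point.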