Download all GSC crawl errors: Possible today?
-
Hey guys:
I tried to download all the crawl data from Google Search Console using the API and solutions like this one: https://github.com/eyecatchup/php-webmaster-tools-downloads, but it seems it's no longer working (or I did something wrong; I just get a blank page, after some load time, when running the PHP file)... I needed to download more than 1,000 URLs a long time ago, so I haven't tried this method since then.
Is there any other way to grab all the crawl errors through the API, or is this no longer possible?
Thanks!
-
Hi Antonio,
Not sure which language you prefer, but you can find some code samples here: https://developers.google.com/webmaster-tools/v3/samples - I tried the Python example, which is quite well documented inside the code; I guess it's the same for the other languages. If I have some time I could give it a try, but it won't be before the end of next week (and it would be based on Python).
Dirk
-
Thanks Dirk. At the moment I couldn't find any alternative, so maybe it would be a good idea to get some hands on this.
If anyone else has solved this, it would be great if you could share the solution with us.
-
The script worked for the previous version of the API - it won't work on the current version.
You could try searching to check whether somebody else has created the same thing for the new API, or build something yourself - the API is quite well documented, so it shouldn't be too difficult to do. I built a Python script for the Search Analytics part in less than a day (without previous knowledge of Python), so it's certainly feasible.
Rgds
Dirk
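For reference, here is a minimal sketch of what such a Python script could look like against the v3 API's crawl-errors endpoints. This is an assumption-laden outline, not a tested implementation: the response field names (`urlCrawlErrorSample`, `pageUrl`, `last_crawled`) and the available `category`/`platform` values should be checked against Google's API reference, and `authorized_http` stands in for an OAuth-authorized `httplib2.Http` object obtained as in Google's own samples.

```python
import csv


def samples_to_rows(samples):
    """Flatten crawl-error sample dicts into (url, last_crawled) tuples."""
    return [(s.get('pageUrl', ''), s.get('last_crawled', '')) for s in samples]


def download_crawl_errors(authorized_http, site_url, category='notFound'):
    """Fetch sample URLs for one error category/platform pair (API-capped at 1,000)."""
    # Imported here so samples_to_rows stays usable without the client library.
    from googleapiclient.discovery import build  # pip install google-api-python-client
    service = build('webmasters', 'v3', http=authorized_http)
    response = service.urlcrawlerrorssamples().list(
        siteUrl=site_url, category=category, platform='web').execute()
    return samples_to_rows(response.get('urlCrawlErrorSample', []))


def write_csv(rows, path='crawl_errors.csv'):
    """Dump the flattened samples to a CSV file."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(('url', 'last_crawled'))
        writer.writerows(rows)
```

To cover everything you would loop `download_crawl_errors` over each error category (e.g. `notFound`, `serverError`) and write one combined CSV.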
Related Questions
-
Having possible problems with rankings due to development website
Hi all, I've got an interesting issue and a bit of a technical challenge for you. It's a bit complicated to explain, but please bear with me. We have a client website (http://clientwebsite.com) which we have been having a hard time ranking for the past few months. The main keywords simply don't show up in the top 100 search results, even though we are constantly building backlinks through guest posts, citations, media mentions, profile links, etc. Normally we use Ahrefs to look at the client's website backlinks, but just today we used Majestic to look at the backlink profile, and one backlink stood out. This is a backlink from a development server (http://developmentwebsite.com) which redirects to http://clientwebsite.com
Intermediate & Advanced SEO | zakkyg
The developers who were working on the redesign of the client website, put it up on their server and forgot to delete it.
Also, the content inside the development website is almost identical with the client website. We then checked to see if http://developmentwebsite.com is indexed.
It's not. Although, inside the robots file http://developmentwebsite.com/robots.txt there's:
User-agent: *
Allow: /
The funny (and weird) thing is that http://developmentwebsite.com/ and all the development website's inner pages are not indexed in Google. But if we go to http://developmentwebsite.com/inner-page, it doesn't redirect to the corresponding http://clientwebsite.com/inner-page; it stays on the same development website page URL, and the pages even have links to the client website. But like I said, none of the pages of the development website are indexed, even though crawlers are allowed in the development website's robots.txt. In your opinion, could this be the reason why we are having a hard time ranking the client website? Second question is:
How do we approach in solving this issue?
Do we simply delete the whole http://developmentwebsite.com with all the inner pages?
Or should we do 301 redirects on a per-page basis?
-
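On the 301 question above: a catch-all rewrite on the development host preserves the per-page mapping without listing every URL. A hypothetical sketch, assuming the development site runs Apache with mod_rewrite enabled (the host and destination names are the question's placeholders):

```apache
# Hypothetical .htaccess on developmentwebsite.com
RewriteEngine On
# 301 every path on the dev host to the same path on the live site
RewriteRule ^(.*)$ http://clientwebsite.com/$1 [R=301,L]
```

Simply deleting the development copy would instead return 404s for any crawled dev URLs, which also resolves the duplicate-content risk, just without passing the per-page signals.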
WordPress and Rich Snippets plugin creating 501 error
Good morning Mozgurus, Right, so I've been trying to install the Google schema.org rich snippets plugin through WordPress for a website, and after I activate it, the website either does not load (a blank page appears) or loads very, very slowly. Also, in the MozBar's HTTP status section, after the plugin is activated it shows a 501 error. I had this issue with another website I was working on, hosted by GoDaddy, and fixed it by installing plugins through the GoDaddy control panel rather than through WordPress. However, this website is not hosted on the same platform. Does anyone know what I should do to make the plugin work without affecting the website? Many thanks, Monica
Intermediate & Advanced SEO | monicapopa
-
Internal page links and possible penalties
If one looks at a page on our client's website (http://truthbook.com/urantia-book/paper-98-the-melchizedek-teachings-in-the-occident for example), there is a huge number of links in the body of the page. All internal links are normal links. All external links are rel="nofollow" class="externallink". We have two questions: 1. Could we be penalized by Google for having too many links on these pages? Would this show in our webmaster reports? 2. If we are being penalized, can we keep the links (and avoid the penalty) if we made the internal links rel="nofollow" class="externallink" as well? We need these internal links to help people use these pages as an educational tool, which is why these pages also have audio and imagery. Thank you
Intermediate & Advanced SEO | jimmyzig
-
Should I let Google crawl my production server if the site is still under development?
I am building out a brand new site. It's built on Wordpress so I've been tinkering with the themes and plug-ins on the production server. To my surprise, less than a week after installing Wordpress, I have pages in the index. I've seen advice in this forum about blocking search bots from dev servers to prevent duplicate content, but this is my production server so it seems like a bad idea. Any advice on the best way to proceed? Block or no block? Or something else? (I know how to block, so I'm not looking for instructions). We're around 3 months from officially launching (possibly less). We'll start to have real content on the site some time in June, even though we aren't planning to launch. We should have a development environment ready in the next couple of weeks. Thanks!
Intermediate & Advanced SEO | DoItHappy
-
SEOMOZ crawler is still crawling a subdomain despite disallow
This is for our client with a subdomain. We only want to analyze their main website, as this is the one we want to SEO. The subdomain is not optimized, so we know it's bound to have lots of errors. We added the disallow code when we started and it was working fine. We only saw the errors for the main domain, and we were able to fix them. However, just a month ago the errors and warnings spiked up, and the errors we saw were for the subdomain. As far as our web guys are concerned, the disallow code is still there and was not touched:
User-agent: rogerbot
Disallow: /
We would like to know if there's anything we might have unintentionally changed, or something we need to do, so that the SEOmoz crawler will stop going through the subdomain. Any help is greatly appreciated!
Intermediate & Advanced SEO | TheNorthernOffice79
-
Something weird in my Google Webmaster Tools crawl errors...
Hey, I recently (this past May) redesigned my e-commerce site from .asp to .php. I am trying to fix all the old pages with 301 redirects that didn't make it in the switch, but I keep getting weird pages coming up in GWT. I have about 400 pages under crawl errors that look like this: "emailus.php?id=MD908070". I delete them and they come back. My site is http://www.moondoggieinc.com. The ID #'s are product #'s for products that are no longer on the site, but the site is .php now. They also don't show a sitemap they are linked in, or any other page they are linked from. Are these hurting me, and how do I get rid of them? Thanks! KristyO
Intermediate & Advanced SEO | KristyO
-
Why is my competitor Torontoseogroup.com ranked 31 in Chrome, but position 2 in Firefox? How is this possible?
There is a website I am analyzing that ranks highly in Firefox - position #2 on the top page. But in Chrome it is ranked only at the top of the 4th page. How is this possible? It looks like some of the codes in the links are different. Why? Link from Firefox: http://www.google.ca/#q=hamilton+web+design&hl=en&prmd=imvnsfd&ei=nIB3TtG_BYTe0QHm05HnCA&start=0&sa=N&bav=on.2,or.r_gc.r_pw.&fp=6448f668fd4b6f72&biw=1024&bih=627 Link from Google Chrome: http://www.google.ca/#q=hamilton+Web+Design&hl=en&prmd=imvnsfd&ei=4353TvWmAaPW0QHhq9zYBg&start=30&sa=N&bav=on.2,or.r_gc.r_pw.&fp=c8e758962267edc1&biw=1024&bih=673
Intermediate & Advanced SEO | websiteready
-
How to prevent Google from crawling our product filter?
Hi All, We have a crawler problem on one of our sites, www.sneakerskoopjeonline.nl. On this site, visitors can specify criteria to filter the available products. These filters are passed as HTTP GET arguments, and the number of possible filter URLs is virtually limitless. In order to prevent duplicate content, or an insane number of pages in the search indices, our software automatically adds noindex, nofollow and noarchive directives to these filter result pages. However, we're unable to get crawlers (Google in particular) to ignore these URLs. We've already changed the on-page filter HTML to JavaScript, hoping this would cause the crawler to ignore it. However, it seems that Googlebot executes the JavaScript and crawls the generated URLs anyway. What can we do to prevent Google from crawling all the filter options? Thanks in advance for the help. Kind regards, Gerwin
Intermediate & Advanced SEO | footsteps
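One option worth testing for the filter question, assuming every filter URL carries a query string and the plain category pages do not, is to block parameterized URLs in robots.txt; Googlebot honors the `*` wildcard in Disallow patterns:

```
User-agent: *
Disallow: /*?
```

One caveat: a URL blocked in robots.txt is never fetched, so Google cannot see the noindex directive on it. Blocking and on-page noindex shouldn't be relied on together for the same URLs.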