Google: How to See URLs Blocked by Robots?
-
Google Webmaster Tools says we have 17K out of 34K URLs that are blocked by our Robots.txt file.
How can I see the URLs that are being blocked?
Here's our Robots.txt file.
User-agent: *
Disallow: /swish.cgi
Disallow: /demo
Disallow: /reviews/review.php/new/
Disallow: /cgi-audiobooksonline/sb/order.cgi
Disallow: /cgi-audiobooksonline/sb/productsearch.cgi
Disallow: /cgi-audiobooksonline/sb/billing.cgi
Disallow: /cgi-audiobooksonline/sb/inv.cgi
Disallow: /cgi-audiobooksonline/sb/new_options.cgi
Disallow: /cgi-audiobooksonline/sb/registration.cgi
Disallow: /cgi-audiobooksonline/sb/tellfriend.cgi
Disallow: /*?gdftrk
-
It seems you might be asking two different questions here, Larry.
You ask which URLs are blocked by your robots file, but you then answered your own question by listing the entries in that file: those entries are the URL patterns it is blocking.
If in fact what you want to know is which pages exist on your website but are not currently indexed, that's a much bigger question and requires a lot more work to answer.
There is no way Webmaster Tools can give you that answer, because if it was aware of the URL it would already be indexing it.
HOWEVER! It is possible to do it if you are willing to do some of the work on your own to collect and manipulate data using several tools. Essentially, you have to do it in three steps:
- Create a list of all the URLs that Google says are indexed. (This info comes from Google's SERPs.)
- Then create a separate list of all of the URLs that actually exist on your website. (This must come from a 3rd-party tool you run against your site yourself.)
- From there, use Excel to subtract the indexed URLs from the known URLs, leaving a list of non-indexed URLs, which is what you asked for.
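If Excel gets unwieldy, the subtraction step above can also be scripted. Here is a minimal Python sketch, assuming you have already exported both lists. The example.com URLs and the function names `normalize` and `non_indexed` are placeholders I made up, and the normalization rules (ignore case in the host, ignore trailing slashes and fragments) are my own assumption about what counts as "the same URL".

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def non_indexed(crawled, indexed):
    """Return the crawled URLs that do not appear in the indexed list."""
    indexed_set = {normalize(u) for u in indexed}
    return sorted({normalize(u) for u in crawled} - indexed_set)


# Hypothetical data: the crawl export and the list collected from the SERPs.
crawled_urls = [
    "http://www.example.com/",
    "http://www.example.com/books/",
    "http://www.example.com/demo/page1",
]
indexed_urls = [
    "http://www.example.com",
    "http://WWW.example.com/books",
]

missing = non_indexed(crawled_urls, indexed_urls)
for url in missing:
    print(url)
```

The set difference does the same job as the Excel subtraction; normalizing first just avoids false "missing" entries caused by cosmetic differences between the two exports.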
I actually laid out this process step by step in response to an earlier question, so you can read it there: http://www.seomoz.org/q/how-to-determine-which-pages-are-not-indexed
Is that what you were looking for?
Paul
-
Okay, well, the robots.txt will only exclude robots from the folders and URLs specified in it and, as I say, there's no way to download a list of all the URLs that Google is not indexing from Webmaster Tools.
If you have exact URLs in mind which you think might be getting excluded, you can test individual URLs in Google Webmaster Tools in:
Health > Blocked URLs > URLs: specify the URLs and user-agents to test against.
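If you'd rather spot-check a few URLs offline, Python's standard `urllib.robotparser` can do a rough version of the same test. This is only a sketch under assumptions: it is not how Webmaster Tools works internally, the `is_blocked` helper and the `index.html` URL are made up for illustration, and I have left out the `/*?gdftrk` rule because the standard-library parser does plain prefix matching and does not interpret `*` wildcards.

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules from the robots.txt quoted in the question.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /swish.cgi",
    "Disallow: /demo",
    "Disallow: /reviews/review.php/new/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """True if the parsed rules disallow this URL for the given user-agent."""
    return not parser.can_fetch(user_agent, url)


for candidate in [
    "http://www.audiobooksonline.com/swish.cgi",
    "http://www.audiobooksonline.com/reviews/review.php/new/",
    "http://www.audiobooksonline.com/index.html",  # hypothetical page
]:
    print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```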
Beyond this, if you want to know whether there are URLs that shouldn't be excluded in the folders you have specified, I would run a crawl of your website using SEOmoz's crawl test or Screaming Frog. Then sort the URLs alphabetically and make sure that all of the URLs in the folders you have excluded via robots.txt are ones that you want to exclude.
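Once you have the crawl export, the "sort and check" step can be sketched in Python as below. Treat it as an illustration rather than a replica of Google's matcher: it assumes Google-style pattern rules (`*` matches any run of characters, a trailing `$` anchors the end), it only handles `Disallow` lines because that is all this robots.txt contains, and the function names and sample crawl URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Disallow patterns copied from the robots.txt quoted in the question.
DISALLOW_RULES = [
    "/swish.cgi",
    "/demo",
    "/reviews/review.php/new/",
    "/cgi-audiobooksonline/sb/order.cgi",
    "/cgi-audiobooksonline/sb/productsearch.cgi",
    "/cgi-audiobooksonline/sb/billing.cgi",
    "/cgi-audiobooksonline/sb/inv.cgi",
    "/cgi-audiobooksonline/sb/new_options.cgi",
    "/cgi-audiobooksonline/sb/registration.cgi",
    "/cgi-audiobooksonline/sb/tellfriend.cgi",
    "/*?gdftrk",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow pattern into a regex anchored at the path start."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def blocking_rule(url: str, rules):
    """Return the first rule that matches the URL's path and query, or None."""
    parts = urlsplit(url)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    for rule in rules:
        if rule_to_regex(rule).match(target):
            return rule
    return None


def blocked_urls(urls, rules):
    """Group the blocked URLs, sorted alphabetically, under the matching rule."""
    report = {}
    for url in sorted(urls):
        rule = blocking_rule(url, rules)
        if rule is not None:
            report.setdefault(rule, []).append(url)
    return report


# Hypothetical crawl export; replace with the URLs from your own crawl.
crawl = [
    "http://www.audiobooksonline.com/index.html",
    "http://www.audiobooksonline.com/demo/sample.html",
    "http://www.audiobooksonline.com/book.html?gdftrk=abc",
    "http://www.audiobooksonline.com/swish.cgi?query=x",
]

report = blocked_urls(crawl, DISALLOW_RULES)
for rule, urls in report.items():
    print(rule)
    for url in urls:
        print("   ", url)
```

Grouping by rule makes it easier to see which line of the robots.txt accounts for most of the blocked URLs, for example whether the `/*?gdftrk` pattern is catching far more pages than intended.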
-
I want to make sure that Google is indexing all of the pages we want it to, i.e. that every URL that is NOT indexed is one we actually meant to exclude.
-
Hi Larry
For my understanding, why do you want to find those URLs? Are you concerned that the robots.txt is blocking URLs it shouldn't be?
As for downloading a list of URLs which aren't indexed from Google Webmaster Tools, which is what I think you would really like, this isn't possible at the moment.
-
Liz, perhaps my post was unclear or I am misunderstanding your answer.
I want to find out the specific URLs that Google says it isn't indexing because of our Robots.txt file.
-
If you want to see if Google has indexed individual pages which are supposed to be excluded, you can check the URLs in your robots.txt using the site: command.
E.g. type the following into Google:
site:http://www.audiobooksonline.com/swish.cgi
site:http://www.audiobooksonline.com/reviews/review.php/new/
...continue for all the URLs in your robots.txt.
Just from searching on the last example above (site:http://www.audiobooksonline.com/reviews/review.php/new/) I can see that you have results indexed. This is probably because you added the robots.txt rule after those pages were already indexed.
To get rid of these results, take the culprit line out of the robots.txt, add the robots meta tag set to noindex to all pages you want removed, submit a URL removal request via Webmaster Tools, check that the pages have been removed from the index, and then add the line back into the robots.txt.
This is the tag: <meta name="robots" content="noindex">
I hope that makes sense and is useful!
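Before submitting the removal requests, it may be worth confirming that each page really serves the noindex tag. A minimal Python sketch using the standard `html.parser` module follows; the class and function names and the sample HTML strings are made up for illustration, and it assumes you have already fetched each page's HTML by whatever means you prefer.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page's robots meta tag includes the noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page sources, for illustration only.
tagged_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
untagged_page = '<html><head><meta name="description" content="Audiobooks"></head></html>'

print(has_noindex(tagged_page))
print(has_noindex(untagged_page))
```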