Disallowed Pages Still Showing Up in Google Index. What do we do?

udemy

We recently disallowed a wide variety of pages on www.udemy.com that we do not want Google indexing (e.g., /tags or /lectures). Basically, we don't want to spread our link juice around to all these pages that are never going to rank. We want to keep it focused on our core pages, which are our course pages.

We've added them as disallows in robots.txt, but after 2-3 weeks Google is still showing them in its index. When we look up "site:udemy.com", for example, Google currently shows ~650,000 pages indexed, when really it should only be showing ~5,000.

As another example, if you search for "site:udemy.com/tag", Google shows 129,000 results. We've definitely added "/tag" to our robots.txt properly, so this should not be happening. Google should be showing 0 results.
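For anyone who wants to double-check the rules themselves, here is a minimal Python sketch using the standard library's robots.txt parser. The robots.txt content below is an assumption (the real file isn't shown in this thread), and the helper name `is_crawl_allowed` is made up for illustration. It only confirms that the rules block crawling of the intended paths; it says nothing about what is already in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real udemy.com file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag
Disallow: /lectures
"""


def is_crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Example paths are illustrative only.
    for path in ("/tag/python", "/lectures/1", "/courses/intro"):
        url = "http://www.udemy.com" + path
        print(path, "allowed" if is_crawl_allowed(ROBOTS_TXT, url) else "blocked")
```

If a path you meant to block comes back as allowed, the rule is the problem; if it comes back as blocked, the rule is fine and the issue lies with pages that were indexed before the rule went in.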

Any ideas re: how we get Google to pay attention and re-index our site properly?

KeriMorgret

The last time I used the tool, excluding via robots.txt was also sufficient for URL removal.

Recently, Google has updated their documentation to strongly encourage you to use URL removal only for things like exposing confidential information, and not to clean up old pages or errors in your GWT account (see http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1269119). I know many people still use the tool for that type of stuff, but wanted to point out that change.

loopyal

Thank you Keri.

Yes, good idea, but whatever you request, that page or directory must respond with a 404; otherwise, the request will be ignored.

That is why I couldn't do that with the send-to-a-friend URLs (it would have been a nice thing to do).

I guess I could have cheated and made them return a 404 when the visitor was Google, just to dump them all out of the index.

The 15,000 I did request to be removed were individual pages that returned a 404 response code, so that's why I did them one at a time. I could have waited, but if you wait, Google keeps trying to fetch those missing pages and keeps reporting them in your GWT.

That is a good reason to request the removals.

I actually gave up when the number of deletions got to 1.5 million. I figured it was just too hard to do.
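Going by the 404 requirement described above, it can help to check status codes in bulk before submitting anything. This is a rough sketch, not a description of how the removal tool itself decides; the function names and example URLs are hypothetical, and `http_status` needs network access and does not handle connection failures.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return err.code


def removal_candidates(statuses: dict) -> list:
    """Keep only the URLs whose recorded status code is 404."""
    return [url for url, code in statuses.items() if code == 404]


# Usage (requires network access; URLs are placeholders):
# statuses = {u: http_status(u) for u in ["http://example.com/old-page"]}
# print(removal_candidates(statuses))
```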

KeriMorgret

The last time I looked, you can request removal of an entire directory as well, which should work for the OP.

loopyal

I would have said the same thing, except that a few weeks ago I removed a rule from the robots file and changed the affected pages to have noindex,nofollow. The next day, tens of thousands of those pages appeared in the index and overpowered the content pages.

So my advice is: don't trust noindex,nofollow. Just stop the robot going down that tree (as you are doing) and find another way to get those pages out of the index.

You can use the URL removal request tool.

It only seems to allow you to remove 1000 per day.

I have done this before by automating the removal using a macro program.

I think I removed about 15,000 over the space of a month, doing that.

They are fairly fast at removing URLs these days, 24 hours or less.
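With a cap of roughly 1000 per day (the figure mentioned above, which may not be exact), the scheduling side is just a matter of splitting the URL list into daily batches. A small sketch, with a made-up helper name and placeholder URLs:

```python
def daily_batches(urls: list, per_day: int = 1000) -> list:
    """Split `urls` into consecutive batches of at most `per_day` items."""
    return [urls[i:i + per_day] for i in range(0, len(urls), per_day)]


if __name__ == "__main__":
    # Placeholder URLs standing in for the pages to be removed.
    urls = [f"http://example.com/old/{n}" for n in range(15000)]
    batches = daily_batches(urls)
    print(len(batches), "days of submissions at", len(batches[0]), "per day")
```

At that rate, 15,000 URLs take at least 15 days of submissions.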

john4math

Disallowing in your robots.txt keeps the bots from crawling your pages going forward, but Google may keep returning already-indexed URLs in search results. This post has great explanations of ways to remove pages from indices: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts

The surefire way to get them out of the index is to remove the disallow from your robots.txt (so Google can crawl the pages and see the tag) and add a meta noindex tag to every page you want removed. Once Google recrawls them, they'll no longer appear in SERPs.
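Before lifting the disallow, it's worth confirming the tag actually made it into the rendered HTML. Here is a minimal sketch using Python's built-in HTML parser; the class and function names and the sample markup are assumptions for illustration, and it only looks at meta tags, not at an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tags include noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical markup for a tag page that should drop out of the index.
SAMPLE_PAGE = '<html><head><meta name="robots" content="noindex"></head><body>Tag page</body></html>'

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```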
