Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Disallowed Pages Still Showing Up in Google Index. What do we do?
-
We recently disallowed a wide variety of pages on www.udemy.com that we do not want Google to index (e.g., /tags or /lectures). Basically, we don't want to spread our link juice around to all these pages that are never going to rank; we want to keep it focused on our core pages, which are our course pages.
We've added them as disallows in robots.txt, but after 2-3 weeks Google is still showing them in its index. When we look up "site:udemy.com", for example, Google currently shows ~650,000 pages indexed, when really it should only be showing ~5,000.
As another example, if you search for "site:udemy.com/tag", Google shows 129,000 results. We've definitely added "/tag" to our robots.txt properly, so this should not be happening... Google should be showing 0 results.
Any ideas re: how we get Google to pay attention and re-index our site properly?
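One way to confirm that the disallow rules match the intended URLs is Python's standard-library `urllib.robotparser`. The sketch below assumes a hypothetical robots.txt containing only the `/tag` and `/lectures` rules mentioned in the question; it is not the site's real file. Keep in mind that a disallow rule only governs crawling: it tells you whether Googlebot may fetch a URL, not whether an already-indexed URL will be dropped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths mirror the question's examples,
# but this is not the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag
Disallow: /lectures
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawl_blocked(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules forbid `agent` from crawling `url`."""
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in (
        "https://www.udemy.com/tag/python/",
        "https://www.udemy.com/lectures/12345",
        "https://www.udemy.com/course/some-course/",
    ):
        print(url, "blocked" if is_crawl_blocked(rp, url) else "crawlable")
```

Because matching is by prefix, `Disallow: /tag` also covers `/tags/...`. Python's parser does not implement Google's `*` and `$` wildcard extensions, and it applies the first matching rule rather than Google's most-specific-rule precedence, so treat it as a rough check for simple prefix rules only.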
-
The last time I used the URL removal tool, excluding pages via robots.txt was also sufficient for URL removal.
Recently, Google updated its documentation to strongly encourage using URL removal only for urgent cases such as exposed confidential information, and not for cleaning up old pages or errors in your Google Webmaster Tools (GWT) account (see http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1269119). I know many people still use the tool for that kind of thing, but I wanted to point out the change.
-
Thank you, Keri.
Yes, good idea, but whatever you request, that page or directory must respond with a 404; otherwise, the request will be ignored. That is why I couldn't do that with the send-to-a-friend URLs (it would have been a nice thing to do).
I guess I could have cheated and made them return a 404 only when the visitor was Google, just to dump them all out of the index.
The 15,000 I did request to be removed were individual pages that returned a 404 response code, which is why I did them one at a time. I could have waited, but if you wait, Google keeps trying to fetch those missing pages and keeps reporting them in your GWT account.
That is a good reason to request the removals.
I actually gave up when the number of deletions got to 1.5 million. I figured it was just too hard to do.
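Meeting that 404 requirement happens at the application or server layer. Here is a minimal WSGI sketch using only the Python standard library; `RETIRED_PREFIXES` and the `/send-to-friend/` path are hypothetical stand-ins for the send-to-a-friend URLs mentioned above. It returns the same 404 to every client rather than only to Googlebot, because serving search engines a different response than users is what Google treats as cloaking.

```python
# Hypothetical URL prefixes for pages that should be gone for good.
RETIRED_PREFIXES = ("/send-to-friend/",)


def app(environ, start_response):
    """Minimal WSGI app: 404 for retired paths, 200 for everything else."""
    path = environ.get("PATH_INFO", "/")
    if path.startswith(RETIRED_PREFIXES):
        status, body = "404 Not Found", b"Not Found"
    else:
        status, body = "200 OK", b"OK"
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(path):
    """Call the app directly (no server needed) and return its status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    app({"PATH_INFO": path}, start_response)
    return captured["status"]


if __name__ == "__main__":
    for p in ("/send-to-friend/course-123", "/course/python-basics/"):
        print(p, "->", status_for(p))
```

In production the same rule would usually live in the web server or framework routing; for a quick local check, `app` can be served with `wsgiref.simple_server.make_server`.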
-
The last time I looked, you could request removal of an entire directory as well, which should work for the OP.
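Before requesting directory removals, it can help to see how the unwanted URLs are distributed across top-level directories, since a handful of directory requests may cover what would otherwise be thousands of single-URL requests. A minimal sketch; the sample URLs are hypothetical, and in practice the list might come from server logs or a crawl export.

```python
from collections import Counter
from urllib.parse import urlsplit


def directory_counts(urls):
    """Count URLs per top-level directory, e.g. '/tag/' or '/lectures/'."""
    counts = Counter()
    for url in urls:
        path = urlsplit(url).path
        first = path.strip("/").split("/", 1)[0]
        # The site root gets its own bucket.
        counts[f"/{first}/" if first else "/"] += 1
    return counts


if __name__ == "__main__":
    sample = [  # hypothetical URLs for illustration
        "https://www.udemy.com/tag/python/",
        "https://www.udemy.com/tag/excel/",
        "https://www.udemy.com/lectures/12345",
        "https://www.udemy.com/course/some-course/",
    ]
    for directory, n in directory_counts(sample).most_common():
        print(directory, n)
```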
-
I would have said the same thing, except that a few weeks ago I removed a rule from the robots file and changed the affected pages to have noindex,nofollow, and the next day tens of thousands of those pages appeared in the index and overpowered the content pages.
So my advice is: don't trust noindex,nofollow. Just stop the robot going down that tree (as you are doing) and find another way to get those pages out of the index.
You can use the URL removal request tool. It only seems to allow you to remove 1,000 URLs per day, but I have done this before by automating the removals with a macro program; I removed about 15,000 over the space of a month that way.
They are fairly fast at removing URLs these days: 24 hours or less.
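The arithmetic here is plain batching: at roughly 1,000 URLs per day, 15,000 URLs need at least 15,000 / 1,000 = 15 daily batches. A minimal sketch of planning those batches; `DAILY_LIMIT` is the per-day cap this poster observed, not a documented quota, and the code only splits a list. It does not submit anything to Google.

```python
from typing import Iterable, Iterator, List

DAILY_LIMIT = 1000  # observed by the poster, not an official quota


def daily_batches(urls: Iterable[str], limit: int = DAILY_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `limit` URLs, one list per day."""
    if limit < 1:
        raise ValueError("limit must be positive")
    batch: List[str] = []
    for url in urls:
        batch.append(url)
        if len(batch) == limit:
            yield batch
            batch = []
    if batch:  # final, partially filled day
        yield batch


if __name__ == "__main__":
    urls = [f"https://www.example.com/page/{i}" for i in range(15_000)]
    plan = list(daily_batches(urls))
    print(len(plan), "days")
```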
-
Disallowing pages in your robots.txt keeps bots from crawling them going forward, but Google may keep returning them in search results. This post has great explanations of ways to remove pages from indices: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
The surefire way to get them out of the index is to remove the disallow from your robots.txt and add a meta noindex tag to all the pages you want removed. Once they're recrawled by Google, they'll no longer appear in SERPs.
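The key point in this answer is ordering: a noindex tag only takes effect if the crawler is allowed to fetch the page and see it. A minimal check of that condition using only the Python standard library; the robots.txt strings, URL, and HTML below are hypothetical, and the sketch looks only at `<meta name="robots">` (or `googlebot`) tags, not at an `X-Robots-Tag` HTTP header, which Google also honors.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}  # valueless attrs become ""
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


def noindex_visible(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> bool:
    """True only if the page has noindex AND robots.txt lets `agent` fetch it.

    If robots.txt blocks the URL, the crawler never downloads the HTML,
    so a noindex tag on the page cannot be seen.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url) and has_noindex(html)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    url = "https://www.udemy.com/tag/python/"
    print(noindex_visible("User-agent: *\nDisallow: /tag\n", url, page))
    print(noindex_visible("User-agent: *\nDisallow:\n", url, page))
```

In the question's setup, `noindex_visible` would be false for every `/tag` URL while the disallow is in place, which is why this answer recommends lifting the disallow first.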
Related Questions
-
Are In-Page Tabs still detrimental to SEO?
Hi Mozers, are in-page tabs still detrimental for SEO? In-page tabs allow you to alternate between views within the same context, not to navigate to different areas: one long HTML page that just looks like it's divided into different pages via tabs you can click between. Each tab has its own URL, which I guess is for analytics tracking purposes? https://XXX https://XXX?qt-staff_profile_tabs=1 https://XXX?qt-staff_profile_tabs=2 https://XXX?qt-staff_profile_tabs=3
Intermediate & Advanced SEO | yaelslater
-
How do internal search results get indexed by Google?
Hi all, Most of the URLs that are created by using the internal search function of a website/web shop shouldn't be indexed since they create duplicate content or waste crawl budget. The standard way to go is to 'noindex, follow' these pages or sometimes to use robots.txt to disallow crawling of these pages. The first question I have is how these pages actually would get indexed in the first place if you wouldn't use one of the options above. Crawlers follow links to index a website's pages. If a random visitor comes to your site and uses the search function, this creates a URL. There are no links leading to this URL, it is not in a sitemap, it can't be found through navigating on the website,... so how can search engines index these URLs that were generated by using an internal search function? Second question: let's say somebody embeds a link on his website pointing to a URL from your website that was created by an internal search. Now let's assume you used robots.txt to make sure these URLs weren't indexed. This means Google won't even crawl those pages. Is it possible then that the link that was used on another website will show an empty page after a while, since Google doesn't even crawl this page? Thanks for your thoughts guys.
Intermediate & Advanced SEO | Mat_C
-
Google Indexing Request - Typical Time to Complete?
In Google Search Console, when you request the (re)indexing of a fetched page, what's the average amount of time it takes to re-index? Does it vary much from site to site, or are manual re-index requests put in a queue and served on a first-come, first-served basis regardless of site characteristics like domain/page authority?
Intermediate & Advanced SEO | SEO1805
-
No Index thousands of thin content pages?
Hello all! I'm working on a site that features a service marketed to community leaders that allows the citizens of that community to log 311-type issues such as potholes, broken streetlights, etc. The "marketing" front of the site is 10-12 pages of content to be optimized for community-leader searchers; however, as you can imagine, there are thousands and thousands of pages of one- or two-line complaints such as "There is a pothole on Main St. and 3rd." These complaint pages are not about the service, and I'm thinking they are not helpful to my end goal of gaining awareness of the service through search among community leaders. Community leaders are searching for "311 request service", not "potholes on main street". Should all of these "complaint" pages be NOINDEX'd? What if there are a number of quality links pointing to the complaint pages? Do I have to worry about losing Domain Authority if I do NOINDEX them? Thanks for any input. Ken
Intermediate & Advanced SEO | KenSchaefer
-
Should I use noindex or robots to remove pages from the Google index?
I have a Magento site and just realized we have about 800 review pages indexed. The /review directory is disallowed in robots.txt, but the pages are still indexed. From my understanding, robots.txt means Google will not crawl the pages, BUT the pages can still be indexed if they are linked from somewhere else. I can add the noindex tag to the review pages, but they won't be crawled. https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html Should I remove the robots.txt rule and add the noindex? Or just add the noindex to what I already have?
Intermediate & Advanced SEO | Tylerj
-
Page position dropped on Google
Hey Guys, My web designer recommended this forum because my Google position has dropped from page 1 to page 10 in the last week. The site is weloveschoolsigns.co.uk, but our main business site is textstyles.co.uk; the school signs are a product of Text Styles. I have been told by my SEO company that because I changed the school logo to the Text Styles logo, Google has penalised me for it and dropped us from page 1 to page 10 or lower for numerous keywords. They have also said that duplicate content within the school site http://www.weloveschoolsigns.co.uk/school-signs-made-easy/ has contributed to the drop in positions (this content is not on the textstyles site). Lastly, they said that having the same telephone number is a definite no-no. They said that I have been penalised because Google sees the above as trying to monopolise the market. I don’t know if all this is true, as SEO is way above my head, but they have quoted me £1250 to repair all the errors, when the site only cost £750. They have also mentioned that because of the above changes, the main Text Styles site will also be punished. Any thoughts on this matter would be much appreciated, as I don't know whether to pay them to crack on or accept the new positions. Either way, I'm very confused. Thanks Thomas
Intermediate & Advanced SEO | TextStylesUK
-
Does Google index url with hashtags?
We are setting up some jQuery tabs on a page that will produce the same URL with hashtags, for example: index.php#aboutus, index.php#ourguarantee, etc. We don't want that content to be crawled, as we'd like to prevent duplicate content. Does Google normally crawl such URLs, or does it just ignore them? Thanks in advance.
Intermediate & Advanced SEO | seoppc2012
-
Tool to calculate the number of pages in Google's index?
When working with a very large site, are there any tools that will help you calculate the number of pages in the Google index? I know you can use site:www.domain.com to see all the pages indexed for a particular URL. But what if you want to see the number of pages indexed for 100 different subdirectories (e.g., www.domain.com/a, www.domain.com/b)? Is there a tool to help automate finding the number of pages from each subdirectory in Google's index?
Intermediate & Advanced SEO | nicole.healthline