Stuck trying to deindex pages from Google
-
Hi there,
We had developers put a lot of spammy markup on one of our websites. We tried many ways to deindex those pages by fixing the markup and requesting recrawls. However, some of the URLs that carried the spammy markup were incorrect URLs that redirect to the right version (e.g. the same URL with or without a / at the end).
So now all the regular URLs are updated and clean. The redirected URLs, however, can't be found in crawls, so they weren't updated and the spam couldn't be removed. They still show up in the SERP.
I tried deindexing those spammed pages by making them no-index in the robots.txt file. This seemed to work for about a week, but now they are showing up in the SERP again.
Can you help us get rid of these spammy URLs?
-
Ruchy,
A- Yep, it might have helped for a few weeks. But internal links from your site are not the only way for crawlers to reach your pages. Remember that other sites may be linking to those pages.
B- Absolutely, adding noindex will help. There is no way to know for sure how long it will take; give it a few weeks. It could also help to remove all those pages manually with Google Search Console, as Logan said.
Hope it helps!
GR
-
Hi Gaston,
Thanks so much for taking the time to answer my question.
Here are two points. A- My mistake: in the robots.txt we disallowed the pages, and it was done right. Our devs did it for us and I double-checked it in the Search Console tester. Also, this did work for us for the first few weeks.
B- There is no place the crawlers can find these pages to recrawl, as they are no longer linked from anywhere on my site. Will adding the noindex help? If yes, how long can it take?
-
I second what Gaston said. This use of robots.txt is one of the most common misconceptions in SEO, so don't feel bad. Google explicitly says not to use robots.txt for index prevention in its webmaster guide.
To add to Gaston's point, make sure you remove the robots.txt disallow when you add the meta noindex tag he provided. If you don't let them crawl the page, they won't see the tag.
You can also remove these URLs temporarily in Search Console by going to the Google Index menu and selecting "Remove URLs". It'll remove them from search results; then, when Google comes back to crawl that page again (as long as you're letting it), it'll see your noindex tag and keep the page out.
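The point about the disallow hiding the noindex tag can be checked before waiting on a recrawl. Here is a rough Python sketch, using the standard library's urllib.robotparser, that lists which URLs a robots.txt still blocks for a crawler, meaning any noindex tag on them would go unread. The robots.txt content, the example.com URLs, and the helper name still_blocked are illustrative assumptions, not taken from Ruchy's site.

```python
from urllib.robotparser import RobotFileParser


def still_blocked(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt forbids user_agent from crawling.

    A crawler that may not fetch a page cannot read a noindex meta tag on it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /spam/
"""

URLS = [
    "https://www.example.com/spam/page/",
    "https://www.example.com/clean-page/",
]

for url in still_blocked(ROBOTS_TXT, URLS):
    print("Crawling blocked, a noindex tag here would not be seen:", url)
```

Python's parser does simple prefix matching, so it may not agree with Googlebot on wildcard rules; treat the result as a first check and confirm in Search Console.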
-
Hello Ruchy,
If by "making no-index" in the robots.txt you mean _disallowing_ them, you are doing it wrong.
Robots.txt rules are just signs for the robots and only tell them NOT TO CRAWL those pages; they don't prevent those pages from being indexed. (It can happen that a link points to that page and the crawler simply comes across it.) The most common way to remove certain indexed pages is to add the robots noindex meta tag in the page's head section. It should look like this:
<meta name="robots" content="noindex">
Also, some useful links:
Robots meta directives - Moz
Robots meta tag - Google developers
Robots tag generator
Hope it helps.
GR
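To confirm that a page really carries the robots noindex meta tag described above, something like the following Python sketch could be run against the page's HTML. It uses only the standard library's html.parser; the class name RobotsMetaParser, the helper has_noindex, and the sample HTML are hypothetical, and the check covers only meta tags named "robots" or "googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / "googlebot" tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """True if the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})


# Sample page, for illustration only.
SAMPLE = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex(SAMPLE))
```

This only inspects the HTML it is given. It does not tell you whether a crawler is allowed to fetch the page in the first place.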