Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best way to de-index content from Google and not Bing?
-
We have a large quantity of URLs that we would like to de-index from Google (we are affected b Panda), but not Bing. What is the best way to go about doing this?
-
Hi michelleh
The solution given by Dan above is the most reliable method as robots.txt will not block pages that googlebot finds via an external link to the page. Given the reasoning behind your desire to noindex, reliability is extremely important.
Also, you want "noindex, follow" rather than "noindex, nofollow" as the nofollow will trap any link value coming into the pages (from both internal and external links) and stop it from flowing through the site.
Hope that helps,
Sha
-
Is there any advantage to using "noindex, nofollow" over robots.txt? I've read that noindex, nofollow still accumulates pagerank for the page with the tag, but if we don't care about accumulating pagerank, is there any other advantage to using noindex, nofollow over robots.txt?
-
-
Robots.txt, you can remove from google to hurry it up, using WMT, but first you need to block them in robots.txt
User-agent: *
Disallow:User-agent: Google
Disallow: /bad
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How do internal search results get indexed by Google?
Hi all, Most of the URLs that are created by using the internal search function of a website/web shop shouldn't be indexed since they create duplicate content or waste crawl budget. The standard way to go is to 'noindex, follow' these pages or sometimes to use robots.txt to disallow crawling of these pages. The first question I have is how these pages actually would get indexed in the first place if you wouldn't use one of the options above. Crawlers follow links to index a website's pages. If a random visitor comes to your site and uses the search function, this creates a URL. There are no links leading to this URL, it is not in a sitemap, it can't be found through navigating on the website,... so how can search engines index these URLs that were generated by using an internal search function? Second question: let's say somebody embeds a link on his website pointing to a URL from your website that was created by an internal search. Now let's assume you used robots.txt to make sure these URLs weren't indexed. This means Google won't even crawl those pages. Is it possible then that the link that was used on another website will show an empty page after a while, since Google doesn't even crawl this page? Thanks for your thoughts guys.
Intermediate & Advanced SEO | | Mat_C0 -
What is best practice for "Sorting" URLs to prevent indexing and for best link juice ?
We are now introducing 5 links in all our category pages for different sorting options of category listings.
Intermediate & Advanced SEO | | lcourse
The site has about 100.000 pages and with this change the number of URLs may go up to over 350.000 pages.
Until now google is indexing well our site but I would like to prevent the "sorting URLS" leading to less complete crawling of our core pages, especially since we are planning further huge expansion of pages soon. Apart from blocking the paramter in the search console (which did not really work well for me in the past to prevent indexing) what do you suggest to minimize indexing of these URLs also taking into consideration link juice optimization? On a technical level the sorting is implemented in a way that the whole page is reloaded, for which may be better options as well.0 -
Mass Removal Request from Google Index
Hi, I am trying to cleanse a news website. When this website was first made, the people that set it up copied all kinds of articles they had as a newspaper, including tests, internal communication, and drafts. This site has lots of junk, but this kind of junk was on the initial backup, aka before 1st-June-2012. So, removing all mixed content prior to that date, we can have pure articles starting June 1st, 2012! Therefore My dynamic sitemap now contains only articles with release date between 1st-June-2012 and now Any article that has release date prior to 1st-June-2012 returns a custom 404 page with "noindex" metatag, instead of the actual content of the article. The question is how I can remove from the google index all this junk as fast as possible that is not on the site anymore, but still appears in google results? I know that for individual URLs I need to request removal from this link
Intermediate & Advanced SEO | | ioannisa
https://www.google.com/webmasters/tools/removals The problem is doing this in bulk, as there are tens of thousands of URLs I want to remove. Should I put the articles back to the sitemap so the search engines crawl the sitemap and see all the 404? I believe this is very wrong. As far as I know this will cause problems because search engines will try to access non existent content that is declared as existent by the sitemap, and return errors on the webmasters tools. Should I submit a DELETED ITEMS SITEMAP using the <expires>tag? I think this is for custom search engines only, and not for the generic google search engine.
https://developers.google.com/custom-search/docs/indexing#on-demand-indexing</expires> The site unfortunatelly doesn't use any kind of "folder" hierarchy in its URLs, but instead the ugly GET params, and a kind of folder based pattern is impossible since all articles (removed junk and actual articles) are of the form:
http://www.example.com/docid=123456 So, how can I bulk remove from the google index all the junk... relatively fast?0 -
What is the best way to find related forums in your industry?
Hi Guys, Just wondering what is the best way to find forums in your industry?
Intermediate & Advanced SEO | | edward-may2 -
"Null" appearing as top keyword in "Content Keywords" under Google index in Google Search Console
Hi, "Null" is appearing as top keyword in Google search console > Google Index > Content Keywords for our site http://goo.gl/cKaQ4K . We do not use "null" as keyword on site. We are not able to find why Google is treating "null" as a keyword for our site. Is anyone facing such issue. Thanks & Regards
Intermediate & Advanced SEO | | vivekrathore0 -
How does google recognize original content?
Well, we wrote our own product descriptions for 99% of the products we have. They are all descriptive, has at least 4 bullet points to show best features of the product without reading the all description. So instead using a manufacturer description, we spent $$$$ and worked with a copywriter and still doing the same thing whenever we add a new product to the website. However since we are using a product datafeed and send it to amazon and google, they use our product descriptions too. I always wait couple of days until google crawl our product pages before i send recently added products to amazon or google. I believe if google crawls our product page first, we will be the owner of the content? Am i right? If not i believe amazon is taking advantage of my original content. I am asking it because we are a relatively new ecommerce store (online since feb 1st) while we didn't have a lot of organic traffic in the past, i see that our organic traffic dropped like 50% in April, seems like it was effected latest google update. Since we never bought a link or did black hat link building. Actually we didn't do any link building activity until last month. So google thought that we have a shallow or duplicated content and dropped our rankings? I see that our organic traffic is improving very very slowly since then but basically it is like between 5%-10% of our current daily traffic. What do you guys think? You think all our original content effort is going to trash?
Intermediate & Advanced SEO | | serkie1 -
Our login pages are being indexed by Google - How do you remove them?
Each of our login pages show up under different subdomains of our website. Currently these are accessible by Google which is a huge competitive advantage for our competitors looking for our client list. We've done a few things to try to rectify the problem: - No index/archive to each login page Robot.txt to all subdomains to block search engines gone into webmaster tools and added the subdomain of one of our bigger clients then requested to remove it from Google (This would be great to do for every subdomain but we have a LOT of clients and it would require tons of backend work to make this happen.) Other than the last option, is there something we can do that will remove subdomains from being viewed from search engines? We know the robots.txt are working since the message on search results say: "A description for this result is not available because of this site's robots.txt – learn more." But we'd like the whole link to disappear.. Any suggestions?
Intermediate & Advanced SEO | | desmond.liang1 -
Best practice for removing indexed internal search pages from Google?
Hi Mozzers I know that it’s best practice to block Google from indexing internal search pages, but what’s best practice when “the damage is done”? I have a project where a substantial part of our visitors and income lands on an internal search page, because Google has indexed them (about 3 %). I would like to block Google from indexing the search pages via the meta noindex,follow tag because: Google Guidelines: “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.” http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35769 Bad user experience The search pages are (probably) stealing rankings from our real landing pages Webmaster Notification: “Googlebot found an extremely high number of URLs on your site” with links to our internal search results I want to use the meta tag to keep the link juice flowing. Do you recommend using the robots.txt instead? If yes, why? Should we just go dark on the internal search pages, or how shall we proceed with blocking them? I’m looking forward to your answer! Edit: Google have currently indexed several million of our internal search pages.
Intermediate & Advanced SEO | | HrThomsen0