Google is indexing blocked content in robots.txt
-
Hi,Google is indexing some URLs that i don't want to be indexed and also is indexing the same URLs with https. This URLs are blocked in the file robots.txt.I've tried to block this URLs through Google WebmasterTools but Google doesn't let me do it because this URL are httpsThe file robots.txt is correct so, what can i do to avoid this content to be indexed?
-
I think you will find that the URL´s in Google´s index are either:
- indexed prior to putting in the robots.txt disallow in place - check in the google serp and click on "in cache" to see the date.
- Heavily linked to by other external domains.
- Both of the above.
@cleverphd has a great solution. Follow that.
-
This will sound backwards but it works.
-
Add the meta noindex tag to all pages you want out of the index.
-
Take those same pages out of the robots.txt and allow them to be crawled.
The meta noindex tells Google to remove the page from the index. It is preferred over using robots.txt
http://moz.com/learn/seo/robotstxt
The robot.txt - blocks Google from crawling the page, but things can still show up if there are other pages linking to the page you are trying to remove.
http://www.youtube.com/watch?v=KBdEwpRQRD0
You have to allow Google to crawl the pages (by taking them out of the robots.txt) so it can read the noindex meta tags that then tell Google to take them out of the index.
-
-
Thank you, but that is not the problem. The file robots.txt is done since a long time ago.
-
It seems you have added/modified Robot.txt file later. Wait for some time, Say 15 days.
Also ensure syntax for robot.txtRegards,
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google search console image indexing issue
Google search console tells that only '58 out of the 3553' images in the images sitemap are indexed. But if I search "site:example.com" in Google images there seem to be lots of images. There are no errors in the sitemap and I am still getting reasonable number of image search hits daily. Are the webmaster tools stats for images indexed accurate? When I click on the Sitemap Errors & Index Errors this is what i get - Error details: No errors found. https://www.screencast.com/t/pqL62pIc
Technical SEO | | 21centuryweb0 -
Google not indexing /showing my site in search results...
Hi there, I know there are answers all over the web to this type of question (and in Webmaster tools) however, I think I have a specific problem that I can't really find an answer to online. site is: www.lizlinkleter.com Firstly, the site has been live for over 2 weeks... I have done everything from adding analytics, to submitting a sitemap, to adding to webmaster tools, to fetching each individual page as googlebot and then submitting to index via webmaster tools. I've checked my robot files and code elsewhere on the site and the site is not blocking search engines (as far as I can see) There are no security issues in webmaster tools or MOZ. Google says it has indexed 31 pages in the 'Index Status' section, but on the site dashboard it says only 2 URLS are indexed. When I do a site:www.lizlinketer.com search the only results I get are pages that are excluded in the robots file: /xmlrpc.php & /admin-ajax.php. Now, here's where I think the issue stems from - I developed the site myself for my wife and I am new to doing this, so I developed it on the live URL (I now know this was silly) - I did block the content from search engines and have the site passworded, but I think Google must have crawled the site before I did this - the issue with this was that I had pulled in the Wordpress theme's dummy content to make the site easier to build - so lots of nasty dupe content. The site took me a couple of months to construct (working on it on and off) and I eventually pushed it live and submitted to Analytics and webmaster tools (obviously it was all original content at this stage)... But this is where I made another mistake - I submitted an old site map that had quite a few old dummy content URLs in there... I corrected this almost immediately, but it probably did not look good to Google... My guess is that Google is punishing me for having the dummy content on the site when it first went live - fair enough - I was stupid - but how can I get it to index the real site?! My question is, with no tech issues to clear up (I can't resubmit site through webmaster tools) how can I get Google to take notice of the site and have it show up in search results? Your help would be massively appreciated! Regards, Fraser
Technical SEO | | valdarama0 -
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and stuff available in javascript and jquery. My webmastertools is showing that some links are blocked in robots.txt of jquery. Sorry I'm not a developer or designer. I want to know is there any impact of this on my SEO? and also how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | | hammadrafique0 -
Do pages that are in Googles supplemental index pass link juice?
I was just wondering if a page has been booted into the supplemental index for being a duplicate for example (or for any other reason), does this page pass link juice or not?
Technical SEO | | FishEyeSEO0 -
Google indexing thousands crazy search results with %25253
In GWT I started seeing very strange pages indexed a few weeks, and Google is no reporting over 21,000 of pages (blocked by robots.txt) with weird URLs like this: http://www.francesphotography.com/?s=no-results:no-results%25252525252525253Ano-results%2525252525252525253Ano-results%252525252525252525253Ano-results%252525252525252525253Ano-results%252525252525252525253Ano-results%252525252525252525253Ano-results%25252525252525252525253Ano-results%25252525252525252525253Ano-results%2525252525252525252525253Adanna&cat=no-results http://www.francesphotography.com/?s=no-results:no-results%2525253Ano-results%25252525253Ano-results%25252525253Ano-results%25252525253Ano-results%2525252525253Ano-results%25252525252525253Ano-results%25252525252525253Ano-results%25252525252525253Adanna&cat=no-results The current robots.txt looks like this: User-agent: *
Technical SEO | | BoulderJoe
Disallow: /wp-content Disallow: /wp-admin Disallow: /wp-includes
Disallow: /data
Disallow: /slideshows
Disallow: /page/*/?s=
Disallow: /?s=
Disallow: /search This website is running an up to date WP install with Yoast's Google Analytics and SEO plug-in. I can't point to anything specific that happened with the site when these URLs started appearing even after I modified the robots.txt. What can be done to try and stop Google from creating and indexing these goofy URLs? I see lots of sites having this issue when I search in Google, but no one seems to have a solution.0 -
Will Google index a 301 redirect for a new site?
So here is the problem... We have setup a 301redirect for our clients website. When you search the clients name it comes up with the old .co.uk website. We have made this redirect to the new .com website. However on the SERPs when it shows the .co.uk it shows the old title pages which currently say 'Holding Page'. When you click on that link it takes you to the fully functioning .com website. My question is, will the title tags in the SERPs which show the .co.uk update to the new ones from the .com? I'm thinking it will be just a case of Google catching up on things and it will sort itself out eventually. If anyone could help I would REALLY appreciate it. Thanks Chris
Technical SEO | | Weerdboil0 -
Why Google did not index our domain?
Hi, We launched tmart 60 days ago and submitted to google, bing, yahoo 20 days later. But google had never indexed our website still when yahoo indexed it in one week. What we have checked or tried: 1. We got 20~50 inlinks in one month and now 81 inlinks via yahoo site explorer. 2. This domain has registered for 13 years and we purchased it from sedo last year. We
Technical SEO | | zt673
did not find any problems from domain archive pages. 3. Page similar: the homepage is 50% similar to one of our competitors when we just launched.
So we adjusted the page structure and modified the content one month later and decreased the similarity to 30% (by tools from webconfs.com) 4. Google Robots: googlebot crawled our website every day after we submitted for indexing.
We opened GWT account for it and added the xml sitemap last week. GWT said nothing
was wrong except the time of page loading. Our questions: Why google did not indexed our website? What should we do? Thanks, wu0