Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Google indexing despite robots.txt block
-
Hi
This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch
This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt
Any clues why this is or what I could do to resolve it?
Thanks!
-
It sounds like Martijn solved your problem, but I still wanted to add that robots.txt exclusions keep search bots from reading pages that are disallowed, but it does not stop those pages from being returned in search results. When those pages do appear, a lot of times they'll have a page description along the lines of "A description of this page is not available due to this sites robots.txt".
If you want to ensure that pages are kept out of search engines results, you have to use the noindex meta tag on each page.
-
Yes, I think the crucial point is that addressing googlebot wouldn't resolve the specific problem I have here.
I would have tried adressing googlebot otherwise. But to be honest, I wouldn't have expected a much different result than specifying all user agents. Googlebot should be part of that exclusion in any case.
-
I thought that value was a bit outdated, turns out to be still accepted. Although it probably only address this issue for him in Google and I assume it will still remain one in other search engines.
Besides that the problem offered a way better solution in allowing Google not on the HTTPS site.
-
Specifically for Googlebot. I'm pretty surprised people would disagree - Stephan Spencer recommended this in a personal conversation with me.
-
Did you mean a noindex tags for robots or a specific one for googlebot? With the second one I probably get the downvotes.
-
People who are disagreeing with this, explain your reasoning.
-
A noindex tag specific to Googlebot would also be a good idea.
-
You're welcome, it was mostly due to noticing that the first snippet, the homepage, had no snippet and the rest of the pages did have one. That led me to looking at their URL structure. Good luck fixing it!
-
100 points for you Martijn, thanks! I'm pretty sure you've found the problem and I'll go about fixing it. Gotta get used to having https used more frequently now...
-
Hi Phillipp,
You almost got me with this one, but it's fairly simple. In your question you're pointing at the robots.txt of your HTTP page. But it's mostly your HTTP**S **pages that are indexed and if you look at that robots.txt file it's pretty clear why these pages are indexed: https://www1.swisscom.ch/robots.txt all the pages that are indexed match with one of your Allow statements are the complete Disallow. Hopefully that provides you with the insight on how to fix your issue.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Removing a site from Google index with no index met tags
Hi there! I wanted to remove a duplicated site from the google index. I've read that you can do this by removing the URL from Google Search console and, although I can't find it in Google Search console, Google keeps on showing the site on SERPs. So I wanted to add a "no index" meta tag to the code of the site however I've only found out how to do this for individual pages, can you do the same for a entire site? How can I do it? Thank you for your help in advance! L
Technical SEO | | Chris_Wright1 -
Robots txt. in page with 301 redirect
We currently have a a series of help pages that we would like to disallow from our robots txt. The thing is that these help pages are located in our old website, which now has a 301 redirect to current site. Which is the proper way to go around? 1- Add the pages we want to disallow to the robots.txt of the new website? 2- Break the redirect momentarily and add the pages to the robots.txt of the old one? Thanks
Technical SEO | | Kilgray0 -
Good robots txt for magento
Dear Communtiy, I am trying to improve the SEO ratings for my website www.rijwielcashencarry.nl (magento). My next step will be implementing robots txt to exclude some crawling pages.
Technical SEO | | rijwielcashencarry040
Does anybody have a good magento robots txt for me? And what need i copy exactly? Thanks everybody! Greetings, Bob0 -
Is sitemap required on my robots.txt?
Hi, I know that linking your sitemap from your robots.txt file is a good practice. Ok, but... may I just send my sitemap to search console and forget about adding ti to my robots.txt? That's my situation: 1 multilang platform which means... ... 2 set of pages. One for each lang, of course But my CMS (magento) only allows me to have 1 robots.txt file So, again: may I have a robots.txt file woth no sitemap AND not suffering any potential SEO loss? Thanks in advance, Juan Vicente Mañanas Abad
Technical SEO | | Webicultors0 -
My video sitemap is not being index by Google
Dear friends, I have a videos portal. I created a video sitemap.xml and submit in to GWT but after 20 days it has not been indexed. I have verified in bing webmaster as well. All videos are dynamically being fetched from server. My all static pages have been indexed but not videos. Please help me where am I doing the mistake. There are no separate pages for single videos. All the content is dynamically coming from server. Please help me. your answers will be more appreciated................. Thanks
Technical SEO | | docbeans0 -
Can I Disallow Faceted Nav URLs - Robots.txt
I have been disallowing /*? So I know that works without affecting crawling. I am wondering if I can disallow the faceted nav urls. So disallow: /category.html/? /category2.html/? /category3.html/*? To prevent the price faceted url from being cached: /category.html?price=1%2C1000
Technical SEO | | tylerfraser
and
/category.html?price=1%2C1000&product_material=88 Thanks!0 -
Robots.txt file getting a 500 error - is this a problem?
Hello all! While doing some routine health checks on a few of our client sites, I spotted that a new client of ours - who's website was not designed built by us - is returning a 500 internal server error when I try to look at the robots.txt file. As we don't host / maintain their site, I would have to go through their head office to get this changed, which isn't a problem but I just wanted to check whether this error will actually be having a negative effect on their site / whether there's a benefit to getting this changed? Thanks in advance!
Technical SEO | | themegroup0