Does Bing ignore robots txt files?
-
Bonjour from "Its a miracle is not raining" Wetherby Uk
Ok here goes... Why despite a robots text file excluding indexing to site
http://lewispr.netconstruct-preview.co.uk/ is the site url being indexed in Bing bit not Google?
Does bing ignore robots text files or is there something missing from http://lewispr.netconstruct-preview.co.uk/robots.txt I need to add to stop bing indexing a preview site as illustrated below.
http://i216.photobucket.com/albums/cc53/zymurgy_bucket/preview-bing-indexed.jpg
Any insights welcome
-
Thanks Clever PHD - we are now adding your recommendations to our preview sites
-
I know this does not sound related, but Matt Cutts explains this same situation on Google. It is probably the same reasoning for Bing.
http://www.mattcutts.com/blog/robots-txt-remove-url/
Looking at your screen shot, it looks as if all that is being shown in Bing is just the URL, no title tag, description, no other information.
What Matt says is that they did not technically crawl the url, but they are aware that it exists. Example, there is another page linking to it with related content or the anchor tag on the link relates to the keyword search you are performing.
You are searching for the URL specifically and so it makes sense that they would show the URL as it relates to that search, but they are not showing any information from the page as they do not have it as they did not spider it, again, they are just aware of the URL. Kind of like talking to a lawyer eh?
If you search for any other keywords does this excluded site show up? Probably not. If the do, then they are probably only showing the URL like in the example above.
The video has more details. Here are the solutions he gives, I will outline them as well
-
Use the Bing URL removal tool - bing bang boom. Done.
-
(my new favorite) Let the page / site be indexed but then show an noindex nofollow meta tag on the page / site. There is a subtle but important difference in the meta tag vs the robot.txt file. The spiders have to be able to crawl the page to be able to see what they are supposed to do with it.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it."
The thing is, if you have a robots.txt file that says don't crawl the site, then the spider never gets to the noindex meta tag to know to delete the page from the index. It sounds a little backwards, but when the page is already in the search index, you have to let the spider crawl it to then see the noindex tag so that the search engine will know to remove it from the index.
Here is what you can do as this seems to only be an issue with Bing and just with the home page. Open up the robots.txt to allow Bing to crawl the site. Restrict the crawling to the home page only and exclude all the other pages from the crawl.
On the home page that you allow Bing to crawl, add the noindex no follow meta tag and you should be set.
All of that said. If you have a single URL listed in bing with no meta data, it may not be worth all the above effort as you are not ranking for any valuable key words, but that is your call
It is always interesting to see how the spiders and engines think so I wanted to pass this along.
Cheers!
PS - If you have a ton of pages like this - then you just would allow Bing to crawl them all and add the noindex nofollow tag to all of them.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Bing search results - Site links
My site links in Bing search results are pulling through the footer text instead of the meta description (see image). Is there any way of controlling this? 2L2VusT
Technical SEO | | RWesley0 -
Removing CSS & JS Files from Index
Hi, Google has indexed a few .CSS and .JS files that belong to our WordPress plugins and themes. I had them blocked via robots, but realized this doesn't prevent indexation (and can likely hurt us since Google wants to access these files). I've since removed the robots instructions, submitted a removal request via Search Console, but want to make sure they don't come back. Is there a way to put a noindex tag within .CSS and .JS files? Or should I do something with .htaccess instead?
Technical SEO | | kirmeliux1 -
#1 on Bing, nowhere on Google. Should be at least top 3\. Any ideas?
I have a page that "should" be top 3 on Google - it's optimised (A on the Moz Pro page grader), it's the most relevant result (it's for an e-book, and the page is the publisher's page for the e-book). Other pages on the site for other books are top of the Google SERPs, and this page itself is top in Bing for the search phrase. The page is https://camphorpress.com/books/formosan-odyssey/ and the keyphrase I want to rank for is "formosan odyssey" (with or without the quotes). Does anyone have any insight as to why it's not ranking in Google? Over-optimised? Duplicate content? Many thanks.
Technical SEO | | C-Tech0 -
Ranking in Bing
My frustration with Google currently has led me to try bing out more. I don't rank as good in bing as Google, so I was wondering if anyone had any great tips of what bing looks for to rank website well? So far I can see they seem to favour keyword domains, take 'cheap holidays' for example. Always reading about what Google want but what do bing want? Any advice much appreciated.
Technical SEO | | pauledwards0 -
Kill your htaccess file, take the risk to learn a little
Last week I was browsing Google's index with "site:www.mydomain.com and wanted to scan over to see what Google had indexed with my site. I came across a URL that was mistakenly indexed. It went something like this www.mydomain.com/link1/link2/link1/link4/link3 I didn't understand why Google had indexed a page like that of mine when the "link" pages were links that were on my main bar which were site wide links. It seemed to be looping infinitely over and over. So I started trying to see how many of these Google had indexed and I came across about 20 pages. I went through the process of removing the URL's in Webmaster Tools, but then I wanted to know why it was happening. I had discovered that I had mistakenly placed some links on my site in my header in such a manner link1 link2 link3 If you know HTML you will realize that by not placing the "/" in the front of the link I was telling that page to add that link in addition to the URL that is was currently on. What this did was create an infinite loop of links which is not good 🙂 Basically when Google went to www.mydomain.com/link1/ it found the other links which then told Google to add that url to the existing URL and then go to that link. Something like: www.mydomain.com/links1/link2/... When you do not add the "/" in front of the directory you are linking too it will do this. The "/" refers to the root so if you place that in front of your directory you are linking too it will always assume that first "/" as the root then the url will follow. So what did I do? Even though I was able to find about 20 URL's using the "site:" search method there had to be more out there. Even though I tried to search I was not able to find anymore, but I was not convinced. The light bulb went on at this point My .htaccess file contained many 301 redirects in my attempt to try and redirect those pages to a real page, they were not really relevant pages to redirect too. So how could I really find out what Google had indexed out there for me since Webmaster Tools only reports the top 1000 links. I decided to kill my htaccess file. Knowing that Google is "forgiving" when major changes to your site happen I knew Google would not simply just kill my site for removing my htaccess file immediately. I waited 3 days then BOOM! Webmaster Tools was reporting to me that it found a ton of 401's on my site. I looked at the Crawl Errors and there they were. All those infinite loop links that I knew had to be more out there, I was able to see. How many were there? Google found in the first crawl over 5,000 of them. OMG! Yeah could you imagine the "Low quality" score I was getting on those pages? By seeing all those links I was able to determine about 4 patterns in the links. For example: www.mydomain.com/link1/link2/ www.mydomain.com/link1/link3/ www.mydomain.com/link1/link4/ www.mydomain.com/link1/link5/ Now my issue was I wanted to keep all the URL's that were pointing to www.mydomain.com/link1 but anything after that I needed gone. I went into my Robots.txt file and added this Disallow: www.mydomain.com/link1/link2/ Disallow: www.mydomain.com/link1/link3/ Disallow: www.mydomain.com/link1/link4/ Disallow: www.mydomain.com/link1/link5/ Now there were many more pages indexed that went deeper into those links but I knew I wanted anything after the 2nd URL gone since it was the start of the loop that I detected. With that I was able to have from what I know at least 5k links if not more. What did I learn from this? Kill your htaccess file for a few days and see what comes back in your reports. You might learn something 🙂 After doing this I simply replaced my htaccess file and I am on my way to removing a ton of "low quality" links I didn't even know I had.
Technical SEO | | cbielich0 -
Help needed please with 301 redirects in htaccess file.
In summary, we're currently having issues with our htaccess file. 301 redirects are going through to the new described URL but in addition the new URL is followed by a ? and the old URL. How can we get rid of the ? and previous URL so they don't appear as an ending. None of the examples we've found re this issue online appear to work. Can anyone please offer some advice? Can we use a RewriteRule to stop this happening? Here's a summary of the htaccess file REDIRECT CODE BEGINS HERE LONG LIST OF REDIRECTS, which appear to be set up perfectly fine. REDIRECT CODE ENDS DirectoryIndex index.php <ifmodule mod_rewrite.c="">RewriteEngine On Options +FollowSymLinks
Technical SEO | | petersommertravels
DirectoryIndex index.php
RewriteEngine On
RewriteCond $1 !^(images|system|themes|pdf|favicon.ico|robots.txt|index.php) [NC]
RewriteRule ^.htaccess$ - [F]
RewriteRule ^favicon.ico - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?/$1 [L]</ifmodule> DirectoryIndex index.php0 -
What does it mean by 'blocked by Meta Robot'? How do I fix this?
When i get my crawl diagnostics, I am getting a blocked by Meta Robot, which means that my page is not being indexed in the search engines... obviously this is a major issue for organic traffic!!! What does it actually mean, and how can i fix it?
Technical SEO | | rolls1230 -
Robots.txt
My campaign hse24 (www.hse24.de) is not being crawled any more ... Do you think this can be a problem of the robots.txt? I always thought that Google and friends are interpretating the file correct, seen that he site was crawled since last week. Thanks a lot Bernd NB: Here is the robots.txt: User-Agent: * Disallow: / User-agent: Googlebot User-agent: Googlebot-Image User-agent: Googlebot-Mobile User-agent: MSNBot User-agent: Slurp User-agent: yahoo-mmcrawler User-agent: psbot Disallow: /is-bin/ Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-DE-Site/de_DE/-/EUR/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-AT-Site/de_DE/-/EUR/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-CH-Site/de_DE/-/CHF/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-DE-Site/de_DE/-/EUR/hse24_DisplayProductInformation-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-AT-Site/de_DE/-/EUR/hse24_DisplayProductInformation-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-CH-Site/de_DE/-/CHF/hse24_DisplayProductInformation-Start Allow: /is-bin/intershop.static/WFS/HSE24-Site/-/Editions/ Allow: /is-bin/intershop.static/WFS/HSE24-Site/-/Editions/Root%20Edition/units/HSE24/Beratung/
Technical SEO | | remino630