Disallow statement - is this tiny anomaly enough to render Disallow invalid?
-
Google site search (site:'hbn.hoovers.com') indicates 171,000 results for this subdomain. That is not a desired result - this site has 100% duplicate content. We don't want SEs spending any time here.
Robots.txt is set up mostly right to disallow all search engines from indexing this site. That asterisk at the end of the disallow statement looks pretty harmless - but could that be why the site has been indexed?
User-agent: * Disallow: /*
-
Interesting. I'd never heard that before.
We've never had GA or GWT on these mirror sites before, so it's hard to say what Google is doing these days.
But the goal is definitely to make them and their contents invisible to SEs. We'll get GWT on there and start removing URLs.
Thanks!
-
The additional asterisk shouldn't do you any harm, although standard practice seems to be just putting the "/".
Does it seem like Google is still crawling this subdomain when you look at webmasters crawl stats? While the disallow function in robots.txt will usually stop bots from crawling, it doesn't prevent them from indexing or keeping pages indexed that were before the disallow was put in place. If you want these pages removed from the index, you can request it through webmasters and also use meta robots noindex as opposed to the robots.txt file. Moz has a good article about it here: http://moz.com/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
If you're just worried about bots crawling the subdomain, it's possible they've already stopped crawling it, but continue to index it due to history or additional indicators suggesting they should index it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Disallowing URL Parameters vs. Canonicalizing
Hi all, I have a client that has a unique search setup. So they have Region pages (/state/city). We want these indexed and are using self-referential canonicals. They also have a search function that emulates the look of the Region pages. When you search for, say, Los Angeles, the URL changes to _/search/los+angeles _and looks exactly like /ca/los-angeles. These search URLs can also have parameters (/search/los+angeles?age=over-2&time[]=part-time), which we obviously don't want indexed. Right now my concern is how best to ensure the /search pages don't get indexed and we don't get hit with duplicate content penalties. The options are this: Self-referential canonicals for the Region pages, and disallow everything after the second slash in /search/ (so the main search page is indexed) Self-referential canonicals for the Region pages, and write a rule that automatically canonicalizes all other search pages to /search. Potential Concern: /search/ URLs are created even with misspellings. Thanks!
Technical SEO | | Alces1 -
URL is invalid: Why?
Hello everyone, I am currently listing my company on business directories. For some websites however when I add my website URL, it comes up as URL is invalid. What could be the reason for this? I have tried different variations like www., http:// and https://. Kind Regards,
Technical SEO | | SMCCoachHire
Aqib0 -
Fetch as Google Desktop Render Width?
What is Google's minimum desktop responsive webpage width? Fetch as Google for desktop is showing a skinnier version of our responsive page.
Technical SEO | | Desiree-CP0 -
Google's ability to crawl AJAX rendered content
I would like to make a change to the way our main navigation is currently rendered on our e-commerce site. Currently, all of the content that appears when you click a navigation category is rendering on page load. This is currently a large portion of every page visit’s bandwidth and even the images are downloaded even if a user doesn’t choose to use the navigation. I’d like to change it so the content appears and is downloaded only IF the user clicks on it, I'm planning on using AJAX. As that is the case it wouldn’t not be automatically on the site(which may or may not mean Google would crawl it). As we already provide a sitemap.xml for Google I want to make sure this change would not adversely affect our SEO. As of October this year the Webmaster AJAX crawling doc. suggestions has been depreciated. While the new version does say that its crawlers are smart enough to render AJAX content, something I've tested, I'm not sure if that only applies to content injected on page load as opposed to in click like I'm planning to do.
Technical SEO | | znotes0 -
Trying to ensure AJAX rendered categories remain SEO friendly.
I'm considering implementing the following: http://lnavigation.demo.aheadworks.com/index.php/electronics-computers/electronics.html?___SID=U&mode=grid. My concern is that it doesn't seem to align with Making AJAX Application Crawlable. However, the pagination, Grid/List toggle, etc.. appear to have their full HREF links intact and when accessed at that URL the appropriate, matching data is displayed. So, it seems that if Google can see the full URL path then it should be crawlable, correct? I'm not very concerned about the filters being SEO friendly. Hoping the Moz community will offer some helpful insight. Thank you!
Technical SEO | | bearpaw0 -
Bing indexing at a tiny fraction of Google
I've read through other posts about this but I can't find a solution that works for us. My site is porch.com, 1M+ pages indexed on Google, ~10k on Bing. I've submitted the same sitemaps, and there's nothing different for each bot in our robots file. It looks like Bing is more concerned with our 500 errors than Google, but not sure if that might be causing the issue. Can anyone point me to the right things to be researching/investigating? Fixing errors, sitemap crawling issues, etc. I'm not sure what to spend my time looking into...
Technical SEO | | Porch0 -
Disallow: /404/ - Best Practice?
Hello Moz Community, My developer has added this to my robots.txt file: Disallow: /404/ Is this considered good practice in the world of SEO? Would you do it with your clients? I feel he has great development knowledge but isn't too well versed in SEO. Thank you in advanced, Nico.
Technical SEO | | niconico1011 -
Have I done enough seo on this page to make a difference
Hi, my home page has been a thorn in my side for as long as i remember. On normal sites i am ok with seo but when it comes to my magazine site it is a whole new ball game as everything is different. I have been working with a developer who has told me to remove the intro to the site on the home page and to move the bottom of the site which was about the magazine but i am not sure if this is right. I want the site to rank well for the following Lifestyle Magazine now before our upgrade, we ranked well for this and other words, we were number one for a very long time and then stayed on the first page but now since the upgrade, i am jumping from page 9, 10, to six and not sure why that is happening. I would like to know if the advice i have been given is correct, have i done enough on the page to rank well for lifestyle magazine, or should i be doing what i have been taught previously where i should be having an intro to the site so google can pick up the words lifestyle magazine and other words. the site is www.in2town.co.uk many thanks for your input
Technical SEO | | ClaireH-1848860