How to prevent a directory from being accessed by search engines?
-
Pretty much as the question says: is there any way to stop search engines from crawling a directory? I am working on a WordPress installation for my site but don't want it listed in search engines until it's ready to be shown to the world. I know the simplest way is to password-protect the directory, but I had some issues when I tried to implement that, so I'd like to see if there's a way to do it without passwords. Thanks in advance.
-
But don't forget to remove that Disallow from robots.txt when you go live if you want those pages to be indexed (and remove the meta robots noindex, nofollow tag as well).
Otherwise you might be pulling your hair out trying to figure out why none of your pages are getting indexed in the SERPs.
-
You're absolutely right! I left that part out. Thanks
-
The robots.txt file does not guarantee that your pages will not show up in search results! Your best bet after password protection is adding a noindex robots meta tag to the head section of your pages.
Google has openly said that it obeys this tag (Matt Cutts).
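If it helps, here is a rough way to check that a page really carries the noindex tag before relying on it. This is only a sketch using Python's standard html.parser; the RobotsMetaParser class, the has_noindex helper and the sample page are names made up for illustration. One thing to keep in mind: a crawler has to be able to fetch the page to see the tag at all, so a robots.txt Disallow on the same URL can keep the noindex from being read.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page has a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical staging page, for illustration only.
STAGING_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Work in progress</title>
</head><body>Not ready yet.</body></html>"""

print(has_noindex(STAGING_PAGE))
```

The same helper can be run again at launch to confirm the tag is gone from the pages you do want indexed.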
-
Xee,
It always helps, and it is very easy to implement. The Sitemap line that shows the path to the sitemap is very good.
-
It's not required to have the ending slash. At least, it works for us without it.
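For what it's worth, both forms work, but the slash does change what gets matched. Below is a minimal sketch using Python's standard urllib.robotparser, which treats a Disallow value as a plain path prefix. The is_allowed helper, www.example.com and the /directoryname-old path are made-up placeholders, and real crawlers may differ in the details, so treat this as an illustration and not as a statement of how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(disallow_path: str, url_path: str) -> bool:
    """Check url_path against a single 'Disallow' rule that applies to all bots."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    # www.example.com is a placeholder domain.
    return parser.can_fetch("*", f"http://www.example.com{url_path}")


RULES = ("/directoryname", "/directoryname/")
PATHS = (
    "/directoryname/page.html",
    "/directoryname",
    "/directoryname-old/page.html",  # hypothetical sibling directory
)

for rule in RULES:
    for path in PATHS:
        verdict = "allowed" if is_allowed(rule, path) else "blocked"
        print(f"Disallow: {rule:<16} {path:<30} {verdict}")
```

Without the slash, the rule also catches anything else whose path starts with the same characters, such as /directoryname-old/. With the slash, only URLs inside the directory are matched.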
-
As it is, my site is just phpBB3 forums (www.bearsfansonline.com); would a sitemap really help that much?
-
If you don't have a robots.txt file, you need to include some important stuff first.
First, do you have a sitemap.xml for your website? If not, it's very important, and you can create one at http://www.xml-sitemaps.com/
Create a robots.txt file and include the following:
User-agent: *
Allow: /
Disallow: /directoryname
Sitemap: http://www.yousite.com/sitemap.xml
With this you will tell all robots where your sitemap is. You should read more about robots.txt in this great post: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
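To double-check that the Sitemap line is written in a form a parser can pick up, something like the following might do. It is a sketch only: it uses Python's standard urllib.robotparser (the site_maps() method needs Python 3.8 or later), the sitemap_urls helper is a made-up name, and www.yousite.com is the placeholder domain from the answer above.

```python
from urllib.robotparser import RobotFileParser

# Rules from the answer above; www.yousite.com is a placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /directoryname

Sitemap: http://www.yousite.com/sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return every sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))
```

The Sitemap line does not belong to any User-agent group, so it can sit anywhere in the file.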
-
Shouldn't you put a slash at the end of the directory in the robots file?
You can also create the robots file through Google Webmaster Tools.
-
I don't have a robots.txt file in my root. Do I just create a text file, put the above lines into it, and upload it to my root after changing the name?
-
I'm assuming you want all search engines blocked from this directory. If so, edit your robots.txt file to state the following. This tells all well-behaved bots to stay out of a folder/directory on your site:
User-agent: *
Disallow: /directoryname
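
Before uploading, the two lines above can be sanity-checked locally. This is a minimal sketch with Python's standard urllib.robotparser, assuming the rules are pasted in as text; the blocked_urls helper, the www.example.com domain and the sample URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set from the answer above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directoryname
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given rules ask user_agent not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs on a placeholder domain.
URLS = [
    "http://www.example.com/",
    "http://www.example.com/directoryname/wp-login.php",
    "http://www.example.com/blog/hello-world",
]

for agent in ("Googlebot", "bingbot"):
    print(agent, blocked_urls(ROBOTS_TXT, URLS, agent))
```

Once the file is live, the same parser can load it with set_url() and read() instead of parse(). Note that this only shows what the rules ask crawlers to do; it does not show whether a page is already in a search index.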