Robots.txt questions...
-
All,
My site is rather complicated, but I will try to break down my question as simply as possible.
I have a robots.txt document in the root level of my site to disallow robot access to /_system/, my CMS. This looks like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism**User-agent: ***
Disallow: /_system/I have another robots.txt file in another level down, which is my holiday database - www.mysite.com/holiday-database/ - this is to disallow access to /holiday-database/ControlPanel/, my database CMS. This looks like this:
**User-agent: ***
Disallow: /ControlPanel/Am I correct in thinking that this file must also be in the root level, and not in the /holiday-database/ level? If so, should my new robots.txt file look like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism**User-agent: ***
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/Or, like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism**User-agent: ***
Disallow: /_system/
Disallow: /ControlPanel/Thanks in advance.
Matt
-
Good answer Yannick.
here are some resources:
http://www.free-seo-news.com/all-about-robots-txt.htm
http://www.robotstxt.org/robotstxt.html
Good luck
-
Cheers gents.
-
Like:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism**User-agent: ***
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/Search engines typically only look in the root of your domain to find robots.txt and sitemap.xml files.
-
Hey Matt
The first of your options looks right and google and other engines look for the robots.txt file in the site root rather than for each directory.
If you had a reason for not wanting that info in the root robots.txt file you can always use the robots meta tag on the pages in a given directory.
Few useful links:
Robots.txt
http://www.google.com/support/webmasters/bin/answer.py?answer=156449&&hl=enRobots Meta Tag
http://www.google.com/support/webmasters/bin/answer.py?answer=93710Marcus
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
2 sitemaps on my robots.txt?
Hi, I thought that I just could link one sitemap from my site's robots.txt but... I may be wrong. So, I need to confirm if this kind of implementation is right or wrong: robots.txt for Magento Community and Enterprise ...
Technical SEO | | Webicultors
Sitemap: http://www.mysite.es/media/sitemap/es.xml
Sitemap: http://www.mysite.pt/media/sitemap/pt.xml Thanks in advance,0 -
Robots.txt and Multiple Sitemaps
Hello, I have a hopefully simple question but I wanted to ask to get a "second opinion" on what to do in this situation. I am working on a clients robots.txt and we have multiple sitemaps. Using yoast I have my sitemap_index.xml and I also have a sitemap-image.xml I do put them in google and bing by hand but wanted to have it added into the robots.txt for insurance. So my question is, when having multiple sitemaps called out on a robots.txt file does it matter if one is before the other? From my reading it looks like you can have multiple sitemaps called out, but I wasn't sure the best practice when writing it up in the file. Example: User-agent: * Disallow: Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /wp-content/plugins/ Sitemap: http://sitename.com/sitemap_index.xml Sitemap: http://sitename.com/sitemap-image.xml Thanks a ton for the feedback, I really appreciate it! :) J
Technical SEO | | allstatetransmission0 -
Pageing page and seo meta tag questions
Hi if i am using paging in my website there is lots of product in my website now in paging total paging is 1000 pages now what title tag i need to add for every paging page or is there any good way we can tell search engine all page or same ?
Technical SEO | | constructionhelpline0 -
Questionable SEO
Chess Telecom appears first when you search for 'business phone lines' in the UK so I used a campaign to check them out. It seems they've got tons of unrelated links and using comment spamming to increase their ranking. Along with fake twitter accounts and other things. Search for 'jewel jubic chess' and you'll see what i mean. I assumed this wasnt a good idea and been trying to get my link on relevant websites only. Any comments or suggestions? Should I simply trust that google will hopefully punish them eventually? Or should I be fighting fire with fire? Thanks Dan
Technical SEO | | DanFromUK0 -
Question about construction of our sitemap URL in robots.txt file
Hi all, This is a Webmaster/SEO question. This is the sitemap URL currently in our robots.txt file: http://www.ccisolutions.com/sitemap.xml As you can see it leads to a page with two URLs on it. Is this a problem? Wouldn't it be better to list both of those XML files as separate line items in the robots.txt file? Thanks! Dana
Technical SEO | | danatanseo0 -
Googlebot does not obey robots.txt disallow
Hi Mozzers! We are trying to get Googlebot to steer away from our internal search results pages by adding a parameter "nocrawl=1" to facet/filter links and then robots.txt disallow all URLs containing that parameter. We implemented this late august and since that, the GWMT message "Googlebot found an extremely high number of URLs on your site", stopped coming. But today we received yet another. The weird thing is that Google gives many of our nowadays robots.txt disallowed URLs as examples of URLs that may cause us problems. What could be the reason? Best regards, Martin
Technical SEO | | TalkInThePark0 -
Robots.txt
Hello Everyone, The problem I'm having is not knowing where to have the robots.txt file on our server. We have our main domain (company.com) with a robots.txt file in the root of the site, but we also have our blog (company.com/blog) where were trying to disallow certain directories from being crawled for SEO purposes... Would having the blog in the sub-directory still need its own robots.txt? or can I reference the directories i don't want crawled within the blog using the root robots.txt file? Thanks for your insight on this matter.
Technical SEO | | BailHotline0 -
Severe rank drop due to overwritten robots.txt
Hi, Last week we made a change to drupal core for an update to our website. We accidentally overwrote our good robots.txt that blocked hundreds of pages with the default drupal robots.txt. Several hours after that happened (and we didn't catch the mistake) our rankings dropped from mostly first, second place in Google organic to bottom and mid first page. Basically I believe we flooded the index with very low quality pages at once and threw a red flag and we got de-ranked. We have since fixed the robots.txt and have been re-crawled but have not seen a return in rank. Would this be a safe assumption of what happened? I haven't seen any other sites getting hit in the retail vertical yet in regards to any Panda 2.3 type of update. Will we see a return in our results anytime soon? Thanks, Justin
Technical SEO | | BrettKrasnove0