Confused about robots.txt

Netpace

There is a lot of conflicting and/or unclear information about robots.txt out there. Somehow, I can't work out the best way to use robots.txt, even after visiting the official robots website. For example, this is the current format of my robots.txt:

User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode

Allow: /

Sitemap: http://www.example.tv/sitemapindex.xml
Sitemap: http://www.example.tv/sitemapindex-videos.xml
Sitemap: http://www.example.tv/news-sitemap.xml

Is this correct and/or recommended? If so, how come I see a list of 200 or so links blocked by robots.txt when I check Google Webmaster Tools?

Help someone, anyone! Can't seem to understand this robotic business!

Regards,

crvw

Google may still index pages excluded by robots.txt if the pages are backlinked either internally or externally.

For best results, use meta noindex to tell search engines they're not allowed to show the link in results, and meta nofollow to tell robots not to follow any links on the page.

Webmaster Tools Help: Using meta tags to block access to your site

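You can also explicitly address googlebot in the meta tag, as opposed to just robots. If you use both a robots.txt and meta robots tags and there are conflicting directives, googlebot will follow the most restrictive one. Keep in mind that a crawler can only read a meta tag on a page it is allowed to fetch, so a noindex tag on a page that robots.txt blocks will not be seen.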
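To make the meta tag approach concrete, here is a minimal Python sketch that reads the robots and googlebot meta tags out of a page and lists their directives. The sample page, the class name RobotsMetaParser and the helper robots_directives are made up for illustration; only the standard library html.parser module is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noindex", "nofollow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            parts = {p.strip().lower() for p in content.split(",") if p.strip()}
            self.directives.setdefault(name, set()).update(parts)


def robots_directives(html):
    """Return the robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page: noindex/nofollow for all robots, noindex for googlebot.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<meta name="googlebot" content="noindex">
</head><body></body></html>"""

if __name__ == "__main__":
    for name, parts in robots_directives(SAMPLE_PAGE).items():
        print(name, sorted(parts))
```

Running this against your own page source is a quick way to confirm the tags you think you added are really in the HTML that gets served.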

irvingw

I would also recommend going to the Site configuration > Crawler access page in Google Webmaster Tools and testing many of your site's URLs to make sure robots can access them. Test every unique URL format on your site, such as the search results page, product pages, category pages, etc. I always use this tool whenever I make a change to robots.txt.
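If you want a rough local check as well, the same kind of test can be sketched with Python's standard urllib.robotparser module. This is only an approximation: Python's parser is not Googlebot and may treat edge cases differently. The rules are copied from the question (blank lines dropped), the sample URLs are invented for illustration, and check_urls is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Rules from the question, kept in one User-agent group.
RULES = """\
User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode
Allow: /
""".splitlines()

# Hypothetical URLs, one per URL format you want to verify.
SAMPLE_URLS = [
    "http://www.example.tv/images/logo.png",
    "http://www.example.tv/playEpisode/42",
    "http://www.example.tv/news/latest",
    "http://www.example.tv/javascript.js",
]


def check_urls(rules, urls, agent="Googlebot"):
    """Return {url: True if the agent may fetch it under the given rules}."""
    parser = RobotFileParser()
    parser.parse(rules)
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    for url, allowed in check_urls(RULES, SAMPLE_URLS).items():
        print("allowed" if allowed else "blocked", url)
```

The last sample URL is worth a look: the rule Disallow: javascript.js has no leading slash, so it may not match the path /javascript.js at all.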

Entrusteddev

Hi,

Allow: / isn't needed in a robots.txt file. Allow was not part of the original robots.txt standard (although Googlebot does understand it), and anything that isn't disallowed is allowed by default.

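Other than that, it mostly looks good, although Disallow: javascript.js should start with a slash (Disallow: /javascript.js) so that it matches a path. Perhaps the 200 or so links to blocked pages were indexed before the robots.txt was last updated with the disallows?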

Regards

Aran
