Question about Syntax in Robots.txt
-
So if I want to block any URL that contains a particular parameter from being indexed, what is the best way to put this in the robots.txt file?
Currently I have:
Disallow: /attachment_id
where "attachment_id" is the parameter. The problem is I still see these URLs indexed, and this has been in the robots.txt for over a month now. I am wondering if I should just do
Disallow: attachment_id
or
Disallow: attachment_id=
but figured I would ask you guys first.
Thanks!
-
That's excellent Chris.
Use the Remove Page function as well - it might help speed things up for you.
-Andy
-
I don't know how, but I completely forgot I could just pop those URLs into GWT and see if they were blocked or not, and sure enough, Google says they are. I guess this is just a matter of waiting... Thanks much!
-
I have previously looked into both of those documents, and the issue remains that they don't exactly address how best to block parameters. I could do this through GWT, but I am just curious about the correct and preferred syntax for the robots.txt as well. I guess I could just look at sites like Amazon or other big sites to see what the common practices are. Thanks though!
-
"Problem is I still see these URL's indexed and this has been in the robots now for over a month. I am wondering if I should just do..."
It can take Google some time to remove pages from the index.
The best way to test if this has worked is to hop into Webmaster Tools and use the Test Robots.txt function. If it has blocked the required pages, then you know it's just a case of waiting. You can also remove pages from within Webmaster Tools, although this isn't immediate.
-Andy
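For a rough local version of that check, here is a minimal sketch using Python's standard urllib.robotparser. The domain and the two URLs are hypothetical placeholders, since the thread does not show the real ones. Note that this module only does plain prefix matching and is not Google's matcher, so treat the result as a sanity check and not as a substitute for the Webmaster Tools tester.

```python
from urllib.robotparser import RobotFileParser

# Rule from the question, wrapped in a catch-all user-agent group.
rules = """\
User-agent: *
Disallow: /attachment_id
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical URLs on a placeholder domain (not taken from the thread).
test_urls = [
    "https://www.example.com/attachment_id/5",
    "https://www.example.com/?attachment_id=5",
]

for url in test_urls:
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{verdict}: {url}")
```

Under prefix matching, only a URL whose path starts with /attachment_id is caught by this rule, so it is worth testing the exact URL shapes that show up in the index.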
-
Hi there
Take a look at Google's resource on robots.txt, as well as Moz's. You can get all the information you need there. You can also let Google know which URLs to exclude from its crawls via Search Console.
Hope this helps! Good luck!
-
I'm not a robots.txt expert by a long shot, but I found this article, which is a little dated but explained it to me in terms I could understand.
https://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
There is also a feature in Google Webmaster Tools called URL Parameters that lets you block URLs with set parameters for all sorts of reasons, such as avoiding duplicate content. I haven't used it myself, but it may be worth looking into.
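On the syntax question itself, here is a small sketch of how wildcard rules behave for query-string parameters. Google documents two wildcards for robots.txt paths: * matches any run of characters and a trailing $ anchors the end of the URL. The rule_matches helper below is a hypothetical illustration of those semantics, not Google's actual parser, and the URLs are made up for the example.

```python
import re


def rule_matches(pattern: str, target: str) -> bool:
    """Check a robots.txt path pattern against a URL path plus query string.

    Sketch of the documented wildcard rules: '*' matches any run of
    characters, a trailing '$' anchors the end of the URL, and anything
    else is a prefix match from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' wherever '*' appeared.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, target) is not None
    return re.match(regex, target) is not None


# Hypothetical URLs; the thread does not show the real ones.
urls = ["/?attachment_id=42", "/blog/post/?attachment_id=42", "/blog/post/"]

for rule in ["/attachment_id", "/*attachment_id="]:
    blocked = [u for u in urls if rule_matches(rule, u)]
    print(f"Disallow: {rule} -> blocks {blocked}")
```

In this sketch a rule such as Disallow: /*attachment_id= catches the parameter wherever it appears in the URL, while Disallow: /attachment_id only catches paths that begin with that exact string.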