Question about Syntax in Robots.txt
-
So if I want to block any URL from being indexed that contains a particular parameter what is the best way to put this in the robots.txt file?
Currently I have-
Disallow: /attachment_idWhere "attachment_id" is the parameter. Problem is I still see these URL's indexed and this has been in the robots now for over a month. I am wondering if I should just do
Disallow: attachment_id or Disallow: attachment_id= but figured I would ask you guys first.
Thanks!
-
That's excellent Chris.
Use the Remove Page function as well - it might help speed things up for you.
-Andy
-
I don't know how but I completely forgot I could just pop those URL's in GWT and see if they were blocked or not and sure enough, Google says they are. I guess this is just a matter of waiting.... Thanks much!
-
I have previously looked into both of those documents and the issue remains that they don't exactly address how best to block parameters. I could do this through GWT but just am curious about the correct and preferred syntax for the robots.txt as well. I guess I could just look at sites like Amazon or other big sites to see what the common practices are. Thanks though!
-
Problem is I still see these URL's indexed and this has been in the robots now for over a month. I am wondering if I should just do
It can take Google some time to remove pages from the index.
The best way to test if this has worked is hop into Webmaster Tools and use the Test Robots.txt function. If it has blocked the required pages, then you know it's just a case of waiting - you can also remove pages from within Webmaster Tools as well, although this isn't immediate.
-Andy
-
Hi there
Take a look at Google's resource on robots.txt, as well as Moz's. You can get all the information you need there. You can also let Google know about what URLs to exclude from it's crawls via Search Console.
Hope this helps! Good luck!
-
Im not a robots.txt expert by a long shot, but I found this, which is a little dated, which explained it to me in terms i could understand.
https://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
there is also a feature in Google Webmaster tools called URL parameters that lets you block URLs with set parameters for all sorts of reason to avoid duplicate content etc. I havn't used it myself but may be work looking into
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt & Disallow: /*? Question!
Hi, I have a site where they have: Disallow: /*? Problem is we need the following indexed: ?utm_source=google_shopping What would the best solution be? I have read: User-agent: *
Intermediate & Advanced SEO | | vetofunk
Allow: ?utm_source=google_shopping
Disallow: /*? Any ideas?0 -
Redirect question | new blog install on subdomain
Hi, I am running a wordpress site and our blog has grown to have a bit of a life of its own. I would like to use a more blog-oriented wordpress theme to take advantage of features that help with content discoverability, which is what the current theme I'm using doesn't really provide. I've seen sites like Canva, Mint and Hubspot put their blog on a subdomain, so the blog is sort of a separate site within a site. Advantages I see to this approach: Use a separate wordpress theme Help the blog feel like its own site and increase user engagement Give the blog its own name and identity My questions are: Are there any other SEO ramifications for taking this approach? For example, is a subdomain (blog.mysite.com) disadvantageous somehow, or inferior to to mysite.com/article-title? Regarding redirects, I read a recent Moz article about how 301s now do not lose page rank. I would also be able to implement https when I redirect, which is a plus. Is this an ok approach? Assuming I have to create redirect rules manually for each post though Thanks!
Intermediate & Advanced SEO | | mikequery0 -
Need help with Robots.txt
An eCommerce site built with Modx CMS. I found lots of auto generated duplicate page issue on that site. Now I need to disallow some pages from that category. Here is the actual product page url looks like
Intermediate & Advanced SEO | | Nahid
product_listing.php?cat=6857 And here is the auto generated url structure
product_listing.php?cat=6857&cPath=dropship&size=19 Can any one suggest how to disallow this specific category through robots.txt. I am not so familiar with Modx and this kind of link structure. Your help will be appreciated. Thanks1 -
Links Questions and advice?
I have a website which has a fair few link assets that are doing very well (a lot of really powerful sites have link to them with follow links) but my commercial pages are not doing as well as a lot of sites without any other investment than (mediocre) links direct to there commercial pages with at least 10% of them carrying the money anchor text. Even pages we have had a few links for with generalized real anchor text and reasonable links do not do as well as the above due to none of them carrying the money keyword? Is it me or does google still rely on links to the commercial page and keywords with anchor text to match the money term?
Intermediate & Advanced SEO | | BobAnderson0 -
Canonical Related question
I have a site where we have search and result pages, google webmaster tool was giving me duplicate content error for page 1 / 2 / 3 etc etc so i have added canonical on these pages like http://www.business2sell.com/businesses/california/ Is this is correct way of using canonical ?
Intermediate & Advanced SEO | | manish_khanna0 -
This is kind of a trick question, but if you could only do ONE thing to improve your search ranking what would you do?
This is kind of a trick question, but if you could only do ONE thing to improve your search ranking what would you do?
Intermediate & Advanced SEO | | JHSpecialty0 -
SEOMOZ Diagram question
Hi, On this SEOMOZ help page (http://www.seomoz.org/learn-seo/internal-link) the diagram explaining the optimal link structure (image also attached) has me a little confused. From the homepage, if the bot crawls down the right-hand link first, will it not just hit a dead end where it cant crawl any further and disappear? OR... will it hit the end of the structure and then crawl backwards to the homepage again and follow down another link and then just repeat the process until all pages are indexed? Cheers pyramid.jpg
Intermediate & Advanced SEO | | activitysuper0 -
Canonical URL Question
Hi Everyone I like to run this question by the community and get a second opinion on best practices for an issue that I ran into. I got two pages, Page A is the original page and Page B is the page with duplicate content. We already added** ="Page A**" />** to the duplicate content (Page B).** **Here is my question, since Page B is duplicate content and there is a link rel="canonical" added to it, would you put in the time to add meta tags and optimize the title of the page? Thanks in advance for all your help.**
Intermediate & Advanced SEO | | DRTBA0