Question about Syntax in Robots.txt
-
So, if I want to block any URL that contains a particular parameter from being indexed, what is the best way to put this in the robots.txt file?
Currently I have:
Disallow: /attachment_id
where "attachment_id" is the parameter. The problem is that I still see these URLs indexed, and this rule has been in robots.txt for over a month now. I am wondering if I should just use
Disallow: attachment_id
or
Disallow: attachment_id=
but figured I would ask you guys first.
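On the syntax itself: robots.txt rules are matched against the URL path (plus any query string) starting from its first character. So "Disallow: /attachment_id" only covers paths that begin with "/attachment_id", not something like "/some-post/?attachment_id=123". Google also understands "*" as "any run of characters" and a trailing "$" as "end of URL". The Python below is a simplified, hypothetical matcher written only to illustrate those rules. It ignores Allow lines, rule precedence, user-agent groups, percent-encoding and empty Disallow lines, and example.com is a placeholder.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into a regex (simplified, Google-style):
    '*' matches any run of characters, a trailing '$' anchors the end of the URL,
    and everything else is matched literally from the start of the path."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url: str, rule: str) -> bool:
    """Return True if this single Disallow rule would match the URL's path + query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors only at the start, mirroring robots.txt prefix matching.
    return rule_to_regex(rule).match(target) is not None


url = "https://example.com/some-post/?attachment_id=123"
for rule in ["/attachment_id", "attachment_id=", "/*attachment_id="]:
    print(rule, is_blocked(url, rule))
# prints:
# /attachment_id False
# attachment_id= False
# /*attachment_id= True
```

Under those assumptions, "Disallow: /*attachment_id=" is the candidate that actually matches the parameter. A rule without a leading "/" or "*" never matches in this sketch because every path starts with "/", though real crawlers may treat such malformed lines differently.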
Thanks!
-
That's excellent, Chris.
Use the Remove Page function as well - it might help speed things up for you.
-Andy
-
I don't know how, but I completely forgot that I could just pop those URLs into GWT and check whether they were blocked, and sure enough, Google says they are. I guess this is just a matter of waiting... Thanks much!
-
I have previously looked into both of those documents, and the issue remains that they don't exactly address how best to block parameters. I could do this through GWT, but I am also curious about the correct and preferred syntax for robots.txt. I guess I could just look at sites like Amazon or other big sites to see what the common practices are. Thanks, though!
-
"Problem is I still see these URL's indexed and this has been in the robots now for over a month..."
It can take Google some time to remove pages from the index.
The best way to test whether this has worked is to hop into Webmaster Tools and use the Test Robots.txt function. If it shows the required pages as blocked, then you know it's just a case of waiting. You can also remove pages from within Webmaster Tools, although this isn't immediate either.
-Andy
-
Hi there
Take a look at Google's resource on robots.txt, as well as Moz's; you can get all the information you need there. You can also tell Google which URLs to exclude from its crawls via Search Console.
Hope this helps! Good luck!
-
I'm not a robots.txt expert by a long shot, but I found this article, which is a little dated, and it explained things to me in terms I could understand.
https://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
There is also a feature in Google Webmaster Tools called URL Parameters that lets you block URLs with set parameters, for all sorts of reasons such as avoiding duplicate content. I haven't used it myself, but it may be worth looking into.
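Building on the query-string article above, here is a minimal sketch of how rules for a single parameter are often written when the crawler supports Google-style "*" wildcards. It uses one rule for the parameter appearing first in the query string ("?name=") and one for it appearing after another parameter ("&name="). The function name is hypothetical, and the generated lines are worth checking in Google's robots.txt tester before relying on them.

```python
def parameter_disallow_rules(params: list, user_agent: str = "*") -> str:
    """Build a robots.txt group that disallows URLs carrying any of the given
    query parameters. Assumes the crawler honours the '*' wildcard."""
    lines = [f"User-agent: {user_agent}"]
    for name in params:
        # Parameter is the first one in the query string: /path?name=...
        lines.append(f"Disallow: /*?{name}=")
        # Parameter follows another one: /path?other=1&name=...
        lines.append(f"Disallow: /*&{name}=")
    return "\n".join(lines)


print(parameter_disallow_rules(["attachment_id"]))
# prints:
# User-agent: *
# Disallow: /*?attachment_id=
# Disallow: /*&attachment_id=
```

Compared with a single "Disallow: /*attachment_id=", the pair avoids also catching a differently named parameter that merely ends in "attachment_id" (for example, a hypothetical "old_attachment_id=").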
Related Questions
-
What does Disallow: /french-wines/?* actually do - robots.txt
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?* Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark? Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL? I think this has been done to block URLs containing query strings. Thanks, Luke
Intermediate & Advanced SEO | McTaggart
-
DA vs Relevancy - Trade Off Question
Hey guys, we all know that relevancy largely trumps DA nowadays. What I am wondering is whether there is a DA 'level' at which relevancy doesn't really matter, i.e. you probably still want a backlink from that site. For example, we probably want backlinks from sites with a DA of 100. So where do you draw the line? In other words, for a high-DA 'non-relevant' site, at what DA is it 'acceptable' to start disregarding relevancy? I'm thinking something like 70 and above, but would like some other thoughts... Obviously you would still be building relevant links too, developing content to do so and all that good stuff. I am just wondering what DA I should focus on for building non-relevant links ALONGSIDE relevant links 🙂 Thanks
Intermediate & Advanced SEO | GTAMP
-
SEO for a UGC Question and Answers Platform
We are trying out SEO for a UGC Q&A platform and have been able to generate 15,000+ questions in the last 4 months. The overall traffic is 50K, while SEO traffic is only 4K, even after putting all the basic SEO elements in place and ensuring that we have a Google page speed score of 73/100. What can be done to push up traffic through SEO? Any thoughts?
Intermediate & Advanced SEO | ozil
-
Need help with Robots.txt
An eCommerce site built with the Modx CMS. I found lots of auto-generated duplicate page issues on that site. Now I need to disallow some pages from that category. Here is what the actual product page URL looks like:
product_listing.php?cat=6857
And here is the auto-generated URL structure:
product_listing.php?cat=6857&cPath=dropship&size=19
Can anyone suggest how to disallow this specific category through robots.txt? I am not very familiar with Modx or this kind of link structure. Your help will be appreciated. Thanks
Intermediate & Advanced SEO | Nahid
-
A few questions on Google's Structured Data Markup Helper...
I'm trying to go through my site and add microdata with the help of Google's Structured Data Markup Helper. I have a few questions that I have not been able to find an answer for. Here is the URL I am referring to: http://www.howlatthemoon.com/locations/location-chicago My company is a bar/club, with only 4 out of 13 locations serving food. Would you mark this up as a local business or a restaurant? It asks for "URL" above the ratings. Is this supposed to be the URL that ratings are on like Yelp or something? Or is it the URL for the page? Either way, neither of those URLs are on the page so I can't select them. If it is for Yelp should I link to it? How do I add reviews? Do they have to be on the page? If I make a group of days for Day of the Week for Opening hours, such as Mon-Thu, will that work out? I have events on this page. However, when I tried to do the markup for just the event it told me to use itemscope itemtype="http://schema.org/Event" on the body tag of the page. That is just a small part of the page, I'm not sure why I would put the event tag on the whole body? Any other tips would be much appreciated. Thanks!
Intermediate & Advanced SEO | howlusa
-
Backlinks question: High Domain Authority, Lower Page Authority
We have a possibility of contributing guest blogs (with followed backlinks) to a site with very high domain authority (and highly trafficked), but when we've looked at the blog entries they already have, most of them have a much lower page authority. How do relevant links from a page with a lower PA, but on a domain with a really high DA, end up impacting our overall backlink profile? Can an expert or two give me some advice on what this may mean for us if we choose to go for it? In your opinion, is having lots of relevant links from a site with a much higher domain authority than ours (to give you an idea, our domain authority is in the low 60s, and this site has a domain authority of almost 90) worth the time/effort/resources in itself? Thanks!
Intermediate & Advanced SEO | GrowOrganic
-
Sitemap.xml Question
I am pretty new to SEO and I have been creating new pages for our website for niche terms. Should I include ALL pages on our website in the sitemap.xml or should I only have our "main" pages listed on the sitemap.xml file? Thanks
Intermediate & Advanced SEO | threebiz
-
Apache Mod Rewrite question
Hi everybody, I need to rewrite this url using mod rewrite, but I've got stuck. http://www.diamondgeezer.com/theultimate/search/index.php?sortprice=asc&followSearch=9673&q=eternity+rings I'd like it to show this one instead: http://www.diamondgeezer.com/eternity-rings I'm no expert on this stuff, so any help would be great! Thanks
Intermediate & Advanced SEO | neooptic