Need help with Robots.txt
-
I have an eCommerce site built with the MODX CMS, and I found a lot of auto-generated duplicate pages on it. Now I need to disallow some pages in that category. Here is what the actual product page URL looks like:
product_listing.php?cat=6857
And here is the auto-generated URL structure:
product_listing.php?cat=6857&cPath=dropship&size=19
Can anyone suggest how to disallow this specific category through robots.txt? I am not very familiar with MODX or this kind of link structure.
Your help will be appreciated.
Thanks
-
I would actually add a canonical tag and then handle these using the Parameters section of Search Console. That's why it's there, for exactly this type of site with exactly this issue.
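To make the canonical-tag idea concrete: every auto-generated variant should declare the clean listing URL as its canonical. The sketch below shows that mapping in Python purely as an illustration (the site itself runs on MODX/PHP, so this is not drop-in code). It assumes that `cat` is the only query parameter that identifies a distinct page and that `cPath`, `size`, etc. only create duplicate views; the function names and the `www.example.com` host are hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these parameters identify a distinct listing page;
# anything else (cPath, size, ...) just produces a duplicate view of it.
CANONICAL_PARAMS = ("cat",)


def canonical_url(url: str) -> str:
    """Drop non-identifying query parameters (and any fragment) from a URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CANONICAL_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    variant = "https://www.example.com/product_listing.php?cat=6857&cPath=dropship&size=19"
    print(canonical_link_tag(variant))
```

Running it prints `<link rel="canonical" href="https://www.example.com/product_listing.php?cat=6857">`. Unlike a robots.txt block, this lets crawlers still fetch the variants and consolidate their signals onto the one listing URL.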
-
Nahid, before you disallow those URLs in robots.txt, you may want to reconsider and use the canonical tag instead. Where pages differ only by size, color, etc., we typically recommend the canonical tag rather than a robots.txt disallow.
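Anyhow, if you'd like to use a disallow, keep in mind that robots.txt rules are matched from the start of the URL path, so a rule such as /? or /?cat= will not match /product_listing.php?cat=6857. Use a wildcard instead, for example one of these:
Disallow: /*?
(blocks every URL with a query string, including product_listing.php?cat=6857 itself)
or
Disallow: /*&cPath=
(blocks only the auto-generated variants that carry the extra cPath parameter)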
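Before deploying a pattern it helps to check which URLs it actually covers. The sketch below implements Google-style matching as an assumption (other crawlers may differ): the rule is compared from the beginning of the path, `*` matches any run of characters, and a trailing `$` anchors the end of the URL. `rule_matches` is a hypothetical helper, not part of any library; the two URLs are the ones from the question.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt Allow/Disallow pattern matches url_path.

    Assumed Google-style semantics: matching starts at the beginning of the
    path (path plus query string), '*' matches any run of characters, and a
    trailing '$' means the URL must end there.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then rejoin the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors only at the start, which gives robots.txt prefix matching.
    return re.match(regex, url_path) is not None


if __name__ == "__main__":
    urls = [
        "/product_listing.php?cat=6857",
        "/product_listing.php?cat=6857&cPath=dropship&size=19",
    ]
    for rule in ["/?", "/?cat=", "/*?", "/*&cPath="]:
        for url in urls:
            print(f"Disallow: {rule:<10} {url:<55} blocked={rule_matches(rule, url)}")
```

With these inputs, /? and /?cat= match neither URL (both paths start with /product_listing.php), /*? matches both, and /*&cPath= matches only the auto-generated variant.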
Related Questions
-
Breadcrumbs markup needs to be fixed
Hi there, Google is up to mass spamming; the latest one refers to an Enhancements > Breadcrumbs report. The message is: "...Google systems show that your site is affected by 24 instances of Breadcrumbs markup issues. This means that your Breadcrumbs pages might not appear as rich results in Google Search. Search Console has created a new report just for this rich result type..." I've used their Structured Data Testing Tool, and no errors were highlighted. Can anyone fathom out what they're referring to, please?
Intermediate & Advanced SEO | jasongmcmahon
-
Moving to a new domain for second time - critical, help needed fast!
Hello,
Important: please do not ask why we need to change the domain; that is not the point here. Thank you for understanding.
Over a month ago we successfully changed our domain name: we set up 301 redirects, did the GWT 'Change of Address', and so on. The old domain was 2 years old and ranked very well; the change of address to the new domain was a success, and traffic was back on the new domain after a week.
Today, unfortunately, we have to change the domain name again, and we are not sure what to do in GWT. When I went to 'Change of Address' for the first new domain, I saw the following message (screenshot attached): "This site is undergoing a move. Old URL | New URL. If any URL on the left should not be moved, you can withdraw its move request. To do this, click the URL and then Withdraw."
Our questions:
1. For this second move, should we 301 redirect from the original old domain or from the first new domain?
2. If from the original old domain, should we withdraw the first change of address in GWT and then redirect the original old domain to the second new domain (the one we want to move to now)? If yes, what should we do with the first new domain (the one we moved to a month ago)?
3. If we should move from the first new domain, what should we do then?
The situation is clear, but what to do is confusing. In short, we need to move to a new domain for the second time: should we redirect from the original old domain or from the first new domain?
I purchased Moz just to get help from you guys here; it was the only place I thought I could get help. Of course I'm going to use the Moz service too now that I have purchased it 🙂 Awaiting your quick help. Thank you!
Intermediate & Advanced SEO | mdmoz
-
Robots.txt advice
Hey guys, have you ever seen code like this in a robots.txt? I have never seen a noindex rule in a robots.txt file before. Have you?
user-agent: AhrefsBot
User-agent: trovitBot
User-agent: Nutch
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /WebServices/
Disallow: /*?notfound=
Disallow: /?list=
Noindex: /?*list=
Noindex: /local/
Disallow: /local/
Noindex: /handle/
Disallow: /handle/
Noindex: /Handle/
Disallow: /Handle/
Noindex: /localsites/
Disallow: /localsites/
Noindex: /search/
Disallow: /search/
Noindex: /Search/
Disallow: /Search/
Disallow: ?
Any pointers?
Intermediate & Advanced SEO | eLab_London
-
Robots.txt question
I noticed something weird in Google's robots.txt Tester. I have the line Disallow: display= in my robots.txt, but whatever URL I test, it says blocked and points to this line. For example, the line is meant to block pages like http://www.abc.com/lamps/floorlamps?display=table, but if I test http://www.abc.com/lamps/floorlamps or any other page, it shows as blocked due to Disallow: display=. Am I doing something wrong, or is Google just acting strangely? I don't think pages without display= are actually blocked.
Intermediate & Advanced SEO | rbai
-
Need to find scientific studies
I need to find scientific studies that back up a claim. I am searching the internet and see a mix of studies with blog posts with other stuff. Is there a way to isolate scientific studies in a search result?
Intermediate & Advanced SEO | StreetwiseReports
-
My site is duplicated on the internet, please help.
I've been told the following about my site: "- your site is duplicated on the internet. Both www.joeyvalyphotography.com and joeyvalyphotography.com are valid internet addresses. This is a problem for SEO." I am wondering what causes this and how it can be fixed. Thanks in advance, Joey
Intermediate & Advanced SEO | gaji
-
Should I robots block this directory?
There are about 43k pages indexed in this directory, and while they are helpful to end users, I don't see them being a great source of unique content for search engines. Would you block the pages in the /blissindex/ directory with robots.txt, or use a meta noindex, nofollow? For example: http://www.careerbliss.com/blissindex/petsmart-index-980481/ http://www.careerbliss.com/blissindex/att-index-1043730/ http://www.careerbliss.com/blissindex/facebook-index-996632/
Intermediate & Advanced SEO | CareerBliss
-
Subdomains - duplicate content - robots.txt
Our corporate site provides MLS data to users, with the end goal of generating leads. Each registered lead is assigned to an agent, essentially in round-robin fashion. However, we also give each agent a domain of their choosing that points to our corporate website. The domain can be whatever they want, but on load it is immediately redirected to a subdomain. For example, www.agentsmith.com would be redirected to agentsmith.corporatedomain.com. Finally, any leads generated from agentsmith.easystreetrealty-indy.com are always assigned to Agent Smith instead of the agent pool (by parsing the current host name). To avoid being penalized for duplicate content, any page viewed on one of the agent subdomains always has a canonical link pointing to the corporate host name (www.corporatedomain.com). The only content difference between our corporate site and an agent subdomain is the phone number and contact email address, where applicable.
Two questions:
1. Can/should we use robots.txt or robots meta tags to tell crawlers to ignore these subdomains, but obviously not the corporate domain?
2. If the answer to question 1 is yes, would doing that be better for SEO, or should we leave it as it is?
Intermediate & Advanced SEO | EasyStreet