Best use of robots.txt for "garbage" links from Joomla!
-
I recently started out on SEOmoz and am trying to do some cleanup based on the campaign report I received.
One of my biggest gripes is the "Duplicate Page Content" warning.
Right now I have over 200 pages with duplicate page content.
This is triggered because the SEOmoz crawler has picked up auto-generated links from my site.
My site has a "send to friend" feature, and every time someone wants to send an article or a product to a friend via email, a pop-up appears.
It seems these pop-up pages have been picked up by the SEOmoz spider; however, these pages are something I would never want indexed in Google.
So I just want to get rid of them.
Now to my question:
I guess the best solution is to make a general rule in robots.txt, so that these pages are not indexed or considered by Google at all.
But how do I do this? What should my syntax be?
A lot of the links look like this, but have different ID numbers depending on the product being sent:
http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
I guess I need a rule that matches the following and makes Google ignore links that contain it:
view=send_friend
-
Hi Henrik,
It can take up to a week for the SEOmoz crawlers to process your site, which may be an issue if you only recently added the rule. Did you remember to include all user agents in your first line?
User-agent: *
Be sure to test your robots.txt file in Google Webmaster Tools to ensure everything is correct.
A couple of other things you can do:
1. Add rel="nofollow" to your send-to-friend links.
2. Add a meta robots "noindex" tag to the head of the popup HTML.
3. And/or add a canonical tag to the popup. Since I don't have a working example, I don't know what to point the canonical at (whatever content it is duplicating), but this is also an option.
-
I just tried to add
Disallow: /view=send_friend
(I removed the trailing /.)
However, a new crawl gave me the duplicate content problem again.
Is my syntax wrong?
-
The second one, "Disallow: /*view=send_friend", will prevent Googlebot from crawling any URL with that string in it. So that should take care of your problem.
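For anyone wondering why the two rules behave differently: a Disallow value is matched from the start of the URL path (query string included), so "/view=send_friend" only covers URLs that begin with that text, while the "*" lets the match start anywhere. Here is a rough Python sketch of Google-style pattern matching to illustrate this. The rule_matches helper is my own hypothetical function, not part of any library, and it only approximates what a real crawler does.

```python
import re

def rule_matches(pattern: str, url_path: str) -> bool:
    """Approximate Google-style robots.txt path matching.

    '*' matches any run of characters, a trailing '$' anchors the end
    of the URL, and everything else must match literally from the
    start of the path (the query string counts as part of the path).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the beginning of the string (prefix match).
    return re.match(regex, url_path) is not None

# Path and query string of the example link from the question.
path = "/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167"

print(rule_matches("/view=send_friend", path))   # False: the path starts with /index.php
print(rule_matches("/*view=send_friend", path))  # True: '*' absorbs everything before view=
```

This is only a way to reason about the syntax. The robots.txt tester in Google Webmaster Tools is still the place to confirm the actual behavior.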
-
So my link example would look like this in robots.txt?
Disallow: /index.php?option=com_redshop&view=send_friend&pid=&tmpl=component&Itemid=
Or
Disallow: /view=send_friend/
-
You're right. I would disallow via robots.txt and use a wildcard (*) wherever a unique item ID number could be generated.
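To make the wildcard idea concrete, here is a small Python sketch that checks a few paths against two candidate rules. Everything in it is an assumption for illustration: the rule strings, the blocked_by helper, and every path except the first one (which is the link from the question). The matching logic only approximates Google-style handling of "*".

```python
import re

# Hypothetical rules: the long one puts '*' where the IDs vary,
# the short one only looks for the view parameter.
LONG_RULE = "/index.php?option=com_redshop&view=send_friend&pid=*&tmpl=component&Itemid=*"
SHORT_RULE = "/*view=send_friend"

def blocked_by(rule: str, url_path: str) -> bool:
    """Approximate matching: prefix match from the start, '*' matches anything."""
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(regex, url_path) is not None

paths = {
    "from the question": "/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167",
    "other product (made up)": "/index.php?option=com_redshop&view=send_friend&pid=52&tmpl=component&Itemid=201",
    "reordered parameters (made up)": "/index.php?view=send_friend&option=com_redshop&pid=39&tmpl=component&Itemid=167",
    "normal product page (made up)": "/index.php?option=com_redshop&view=product&pid=39&Itemid=167",
}

# Print, for each path, whether the long rule and the short rule would block it.
for label, path in paths.items():
    print(label, blocked_by(LONG_RULE, path), blocked_by(SHORT_RULE, path))
```

Both rules cover the send-to-friend links with different IDs and leave the normal product page alone, but the long rule stops matching as soon as the parameters appear in a different order. That is one reason the shorter "/*view=send_friend" form is the more forgiving choice.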