Can I use a "noindex, follow" directive in a robots.txt file for a certain parameter on a domain?
-
I have a site that produces thousands of pages via file uploads. These pages are then linked to by users for others to download what they have uploaded.
Naturally, the client has blocked the parameter that precedes these pages in an attempt to keep them from being indexed. What they did not consider was that these pages are attracting hundreds of thousands of links that are not passing any authority to the main domain because they're being blocked in robots.txt.
Can I allow Google to follow, but NOT index, these pages via the robots.txt file, or would this have to be done on a page-by-page basis?
-
Since you have those pages blocked via robots.txt, the bots would, in theory, never even crawl them, which means a noindex,follow tag on those pages is not helping: the bots never get to read it.
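To make the point about blocked pages concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rule, the `/download` path, and the example.com URLs are all assumptions standing in for the client's real setup, which the thread does not show.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/download" is an assumed stand-in for the
# blocked parameter described in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /download
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A compliant bot never requests the blocked URL, so it never reads any
# noindex,follow tag placed on that page.
blocked = can_crawl(ROBOTS_TXT, "https://example.com/download?file=123")
allowed = can_crawl(ROBOTS_TXT, "https://example.com/about")
print(blocked, allowed)
```

If the check comes back as not crawlable for the upload URLs, a noindex,follow tag on those pages would only start to matter once the Disallow rule is lifted, since the tag lives on the page itself rather than in robots.txt.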
Also, if you run a report on the domain in Open Site Explorer and dig in, you should be able to find tons of those links already showing up. So if my site is linking to a page on that site, that page may not be cached or indexed because of the robots.txt exclusion, but as long as my link is followed, your domain is still getting the credit for the link.
Does that make sense?
-
Answered my own question.