Can I use a "noindex, follow" directive in a robots.txt file for a certain parameter on a domain?
-
I have a site that produces thousands of pages via file uploads. These pages are then linked to by users for others to download what they have uploaded.
Naturally, the client has blocked the parameter that precedes these pages in robots.txt, in an attempt to keep them from being indexed. What they did not consider is that these pages are attracting hundreds of thousands of links, and those links are not passing any authority to the main domain because the pages are blocked in robots.txt.
Can I allow Google to follow but NOT index these pages via the robots.txt file, or would this have to be done on a page-by-page basis?
-
Since those pages are blocked via robots.txt, the bots should, in theory, never crawl them at all, which means a "noindex, follow" tag on the pages is not helping: the crawler never gets to read it.
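To make that concrete, here is a minimal sketch using Python's standard urllib.robotparser. The "/download/" prefix and the example.com URLs are made up for illustration; they are not the client's real rules. It only shows that a compliant crawler checks robots.txt before requesting a page, so anything placed on a disallowed page (including a meta robots tag) goes unread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/download/" stands in for the blocked upload path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /download/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_may_fetch(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the parsed robots.txt lets this user agent request the URL."""
    return parser.can_fetch(user_agent, url)


# A disallowed page is never requested, so its meta robots tag is never seen.
upload_page_allowed = crawler_may_fetch("https://example.com/download/abc123")
about_page_allowed = crawler_may_fetch("https://example.com/about")
print(upload_page_allowed, about_page_allowed)
```

With these assumed rules, the upload page is refused and the ordinary page is allowed.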
Also, if you run a report on the domain in Open Site Explorer and dig in, you should find plenty of those links already showing up. So if my site links to a page on that site, the page may not be cached or indexed because of the robots.txt exclusion, but as long as my link is followed, your domain still gets the credit for it.
Does that make sense?
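On the "page by page" part of the question: one option that avoids editing each page is to send the directive as an X-Robots-Tag HTTP response header for every URL under the upload path. Google documents that header as an equivalent of the meta robots tag, but it can only be read if the robots.txt block is lifted for those URLs. The sketch below is an assumption about how that might be wired up, not a description of the client's stack; the function name and the "/download/" prefix are hypothetical.

```python
from typing import Dict

# Hypothetical path prefix for the user-upload pages.
UPLOAD_PREFIX = "/download/"


def robots_headers(path: str, upload_prefix: str = UPLOAD_PREFIX) -> Dict[str, str]:
    """Return extra response headers for a request path.

    Upload pages get "noindex, follow": keep them out of the index while
    still letting crawlers follow the links on them. Other pages get nothing.
    """
    if path.startswith(upload_prefix):
        return {"X-Robots-Tag": "noindex, follow"}
    return {}


print(robots_headers("/download/abc123"))
print(robots_headers("/about"))
```

The same rule could be expressed in the web server configuration instead of application code; the point is only that it is applied by URL pattern rather than page by page.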
-
Answered my own question.