Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we're not completely removing the content (many posts will still be viewable), we have locked both new posts and new replies.
Could you use a robots.txt file to disallow a duplicate content page from being crawled?
-
A website has duplicate content pages so that users can reach the same information from a couple of spots in the site navigation. The site owner would like to keep it this way without hurting SEO.
I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Do you think this is a workable/acceptable solution?
-
Yeah, sorry for the confusion. I put the tag on all the pages (Original and Duplicate). I sent you a PM with another good article on Rel canonical tag
-
Peter, Thanks for the clarification.
-
Generally agree, although I'd just add that Robots.txt also isn't so great at removing content that's already been indexed (it's better at prevention). So, I find that it's not just not ideal - it sometimes doesn't even work in these cases.
Rel-canonical is generally a good bet, and it should go on the duplicate (you can actually put it on both, although it's not necessary).
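To make that concrete, here is a minimal Python sketch that checks which URL a page declares as canonical. It only uses the standard library's `html.parser`. The `CanonicalFinder` class, the `find_canonical` helper and the example.com URLs are made up for illustration, so treat it as a rough check rather than a full audit tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical pages: the duplicate points at the original, and the
# original carries an optional self-referencing tag.
ORIGINAL_URL = "https://www.example.com/services/"
CANONICAL_TAG = '<link rel="canonical" href="https://www.example.com/services/">'

original_html = "<html><head>" + CANONICAL_TAG + "</head><body>Services</body></html>"
duplicate_html = "<html><head>" + CANONICAL_TAG + "</head><body>Services</body></html>"
untagged_html = "<html><head><title>Services</title></head><body>Services</body></html>"

for name, html in [("original", original_html),
                   ("duplicate", duplicate_html),
                   ("untagged", untagged_html)]:
    print(name, "->", find_canonical(html))
```

In this setup both the original and the duplicate report the same canonical URL, which matches the "on the duplicate, optionally on both" advice. A page with no tag returns `None`.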
-
Next time I'll read the reference links better
Thank you!
-
Per Google Webmaster Tools:
If Google knows that these pages have the same content, we may index only one version for our search results. Our algorithms select the page we think best answers the user's query. Now, however, users can specify a canonical page to search engines by adding a <link> element with the attribute rel="canonical" to the <head> section of the non-canonical version of the page. Adding this link and attribute lets site owners identify sets of identical content and suggest to Google: "Of all these pages with identical content, this page is the most useful. Please prioritize it in search results."
-
Thanks Kyle. Anthony had a similar view on using the rel canonical tag. I'm just curious whether it should be added to the original page, the duplicate page, or both?
Thanks,
Greg
-
Anthony, thanks for your response. See Kyle's reply: he also felt using the rel canonical tag was the best thing to do. However, he seemed to think you'd put it on the original page, the one you want to rank for, and you're suggesting putting it on the duplicate page. Should it be added to both while specifying which page is the 'original'?
Thanks!
Greg
-
I'm not sure I understand why the site owner seems to think that the duplicate content is necessary?
If I was in your situation I would be trying to convince the client to remove the duplicate content from their site, rather than trying to find a way around it.
If the information is difficult to find then this may be due to a problem with the site architecture. If the site does not flow well enough for visitors to find the information they need, then perhaps a site redesign is necessary.
-
Well, the answer would be yes and no. A robots.txt file would stop the bots from crawling the page, but links from other pages on the site to that blocked page could therefore still get its URL indexed. As posted in Google Webmaster Tools:
"You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one).
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results."
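To illustrate the crawl-versus-index distinction in that quote, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the example.com URLs and the `is_crawlable` helper are hypothetical. The check only answers "may a bot fetch this URL?" and says nothing about whether the URL can still appear in search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one of the two duplicate pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /duplicate-page/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


original = "https://www.example.com/original-page/"
duplicate = "https://www.example.com/duplicate-page/"

print("original crawlable:", is_crawlable(ROBOTS_TXT, original))
print("duplicate crawlable:", is_crawlable(ROBOTS_TXT, duplicate))
```

The blocked page is never fetched, so a crawler also never sees any rel="canonical" tag placed on it. That is one reason the disallow rule and the canonical tag do not combine well on the same page.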
I think the best way to avoid any conflict is applying the rel="canonical" tag to each duplicate page that you don't want indexed.
You can find more info on rel canonical here
Hope this helps out some.
-
The best way would be to use the rel canonical tag.
Put the rel canonical tag on the page you would like to rank.
This lets Google know that this is the original page.
Check out this link posted by Rand about the Rel canonical tag [http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps](http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps)