Letting Others Use Our Content: Risk-Free Attribution Methods
-
Hello Moz!
A massive site that you've all heard of is looking to syndicate some of our original editorial content. This content is our bread and butter, and is one of the primary reasons why people use our site.
Note that this site is not a competitor of ours - we're in different verticals.
If this massive site were to use the content straight up, I'm fairly confident that they'd begin to outrank us for related terms pretty quickly due to their monstrous domain authority.
This is complex because they'd like to use bits and pieces of the content interspersed with their own content, so they can't just implement a cross-domain canonical. It'd also be difficult to load the content in an iframe with noindex,nofollow header tags since their own content (which they want indexed) will be mixed up with ours.
They're also not open to including a link back to the product pages where the corresponding reviews live on our site.
Are there other courses of action that could be proposed that would protect our valuable content?
Is there any evidence that using schema.org (Review and Organization schemas) pointing back to our review page URLs would provide attribution and prevent them from outranking us for associated terms?
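A minimal sketch of the markup the question describes, assuming JSON-LD is the format used: a schema.org Review whose author and publisher are the original site's Organization and whose url points back to the original review page. The function name, product name, and example.com URLs are hypothetical placeholders. As the replies below note, nobody has seen evidence that this kind of markup stops a stronger site from outranking the original, so treat it as an attribution signal at best.

```python
import json


def review_jsonld(item_name, review_url, publisher_name, publisher_url,
                  rating, best_rating=5):
    """Return schema.org Review markup (JSON-LD) that credits the original
    publisher and points back to the original review URL.
    All values passed in below are hypothetical placeholders."""
    publisher = {"@type": "Organization", "name": publisher_name, "url": publisher_url}
    data = {
        "@context": "https://schema.org",
        "@type": "Review",
        "itemReviewed": {"@type": "Product", "name": item_name},
        "url": review_url,       # the original review page on our site
        "author": publisher,     # credit the original site as author...
        "publisher": publisher,  # ...and as publisher
        "reviewRating": {"@type": "Rating", "ratingValue": rating,
                         "bestRating": best_rating},
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    markup = review_jsonld("Example Widget",
                           "https://www.example.com/reviews/example-widget",
                           "Example Reviews", "https://www.example.com/", 4.5)
    # The partner would embed this in the page that reuses the review text.
    print('<script type="application/ld+json">\n' + markup + "\n</script>")
```

Whether the partner would agree to embed this is a separate question; it is no substitute for the link or canonical they have already declined.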
-
Logan, I found your replies very helpful. We have allowed a site to replicate some of our pages/content on their site, with a rel canonical tag in place pointing back to us. However, Google has indexed the pages on the partner's site as well. Is this common, or has something gone wrong? The partner temporarily had an original-source tag pointing to their page as well as the canonical pointing to us. We caught this issue a few weeks ago and had the original-source tag removed. GSC sees the rel canonical tag for our site. But I am concerned that our site could be getting hurt by duplicate-content issues and that the partner site may outrank us, as their site is much stronger. Any insight would be greatly appreciated.
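Google treats rel=canonical as a hint rather than a directive, so a partner page can remain indexed even with the tag in place. A minimal sketch for auditing a partner page, assuming you have its HTML as a string: it flags a missing or duplicated canonical tag, a canonical pointing to another host, and a leftover original-source tag pointing anywhere but your site. The original-source tag is assumed here to be a meta tag with name="original-source"; the function names and hostnames are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HeadTagCollector(HTMLParser):
    """Collect <link rel="canonical"> hrefs and <meta name="original-source"> values."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.original_sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonicals.append(a.get("href", ""))
        elif tag == "meta" and a.get("name", "").lower() == "original-source":
            self.original_sources.append(a.get("content", ""))


def audit_syndicated_page(html, our_host):
    """Return a list of human-readable problems found in a partner page's HTML."""
    collector = HeadTagCollector()
    collector.feed(html)
    problems = []
    if len(collector.canonicals) != 1:
        problems.append(f"expected 1 canonical tag, found {len(collector.canonicals)}")
    for href in collector.canonicals:
        if urlparse(href).hostname != our_host:
            problems.append(f"canonical points off-site: {href}")
    for src in collector.original_sources:
        if urlparse(src).hostname != our_host:
            problems.append(f"original-source points elsewhere: {src}")
    return problems


if __name__ == "__main__":
    partner_html = ('<head><link rel="canonical" href="https://www.example.com/guide">'
                    '<meta name="original-source" content="https://partner.example.net/guide">'
                    '</head>')
    for problem in audit_syndicated_page(partner_html, "www.example.com"):
        print(problem)
```

A clean result only confirms the tags are consistent; it cannot force Google to drop the partner's copy from the index.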
-
"Why did this offer come my way?"
When someone asks to use your content, that is what you should be asking yourself.
When someone asks to use my content, my answer is always a fast NO! Even if the Pope is asking, the answer will be NO.
-
This is exactly my concern. Our site is massive in its own industry, but this other site is a top player across many industries. Surely we'd be impacted by such an implementation unless we take some steps to confirm attribution.
Thank you for confirming my suspicions.
-
Google claims that they are good at identifying the originator of the content. I know for a fact that they are overrating their ability on this.
Publish an article first on a weak site, allow it to be crawled and remain for six months. Then, put that same article on a powerful site. The powerful site will generally outrank the other site for the primary keywords of the article or the weak site will go into the supplemental results. Others have given me articles with the request that I publish them. After I published them they regretted that they were on my site.
Take pieces of an article from a strong site and republish them verbatim on a large number of weak sites. The traffic to the article on the strong site will often drop because the weak sites outrank it for long-tail keywords. I have multiple articles that were ranking well for valuable keywords. Then hundreds of mashup sites grabbed pieces of the article and published them verbatim. My article tanked in the SERPs. A couple years later the mashups fell from the SERPs and my article moved back up to the first page.
-
"But, I would not agree with their site being the one to take the damage. YOU will lose a lot of long-tail keyword traffic because now your words are on their site and their site is powerful."
Typically, the first version that's crawled will be considered the originator of the content; if another site then uses that content, that site is the one that would be damaged (if any damage occurs). I was under the impression that your content was indexed first and that the other site would be using your content. At least that's the way I understood it.
So, if your content hasn't already been indexed then you may lose in this.
-
"This is complex because they'd like to use bits and pieces of the content interspersed with their own content, so they can't just implement a cross-domain canonical. It'd also be difficult to load the content in an iframe with noindex,nofollow header tags since their own content (which they want indexed) will be mixed up with ours."
Be careful. This is walking past the alligator ambush. I agree with Eric about the rel=canonical. But, I would not agree with their site being the one to take the damage. YOU will lose a lot of long-tail keyword traffic because now your words are on their site and their site is powerful.
"They're also not open to linking back to our content."
If these guys walked into my office with their proposal, they might not make it to the exit alive.
My only offer would be for them to buy me out completely. That deal would require massive severances for my employees and a great price for me.
-
You're in the driver's seat here. _You_ have the content _they_ want. If you lay down your requirements and they don't want to play, then don't give them permission to use your content. It's really that simple. You're gaining nothing here with their rules, and they gain a lot. You should both be winning in this situation.
-
Thank you for chiming in, Eric!
Their pages already rank extraordinarily well: #1 for almost every related term that they have products for, across the board.
They're also not open to linking back to our content.
-
In an ideal situation, the canonical tag is preferred. Since you mentioned that it's not the full content and you can't implement it, there may be limited options. We haven't seen any evidence that schema pointing back to your review page URLs would prevent them from outranking you, and it's not likely to. If there are links there, though, you'd get some link juice passed on.
Most likely, though, if that content is already indexed on your site, then it's going to be seen as duplicate content on their site, and that would only really hurt their site, in that those pages may not rank.
Related Questions
-
Shall we add engaging and useful FAQ content in all our pages or rather not because of duplication and reduction of unique content?
We are considering adding, at the end of all our 1500 product pages, answers to the 9 most frequently asked questions. These questions and answers will be 90% identical for all our products; personalizing them more is not an option, and not really necessary, since most questions relate to the process of reserving the product. We are convinced this will increase user engagement with the page and time on page, and it will be genuinely useful for visitors, as most of them will not visit the separate FAQ page. It will also add more related keywords/topics to the page.
On the downside, it will reduce the percentage of unique content per page and add duplication. Any thoughts on whether, in terms of Google rankings, we should go ahead, and whether the engagement benefits may outweigh the downside of duplicate content?
-
Buying a disused website and using their content - penalty risk?
Hi all, I'm in the process of setting up a new website. I have found various old websites covering a similar topic and I'm interested in purchasing two of these websites for their content as it is very good, despite those sites struggling to make ends meet. One of these websites is still live, the other one hasn't been live for 2 years. Let's say I bought these websites for their content, then used that content on my new domain and made sure the two websites where this content came from were offline, would I run a risk of getting penalised? Does Google hold onto content from a website even if it is now offline?
-
What is considered duplicate content?
Hi, we are working on a product page for bespoke camper vans (http://www.broadlane.co.uk/campervans/vw-campers/bespoke-campers). At the moment there is only one page, but we are planning to add similar pages for other brands of camper vans. Each page will receive its own specifically targeted content; however, the 'Model choice' cart at the bottom (giving you the choice to select the internal structure of the van) will remain the same across all pages. Will this be considered duplicate content? And if so, what would be the ideal solution to limit penalty risk? A rel canonical tag seems wrong for this, as there is no original item as such. Would an iFrame around the 'Model choice' section keep that content from being indexed along with the page? Thanks, Celine
-
Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)
Hi Guys, we have developed a plugin that allows us to display used vehicle listings from a centralized, third-party database. The functionality works similarly to autotrader.com or cargurus.com, and there are two primary components:
1. Vehicle Listings Pages: this is the page where the user can use various filters to narrow the vehicle listings to find the vehicle they want.
2. Vehicle Details Pages: this is the page where the user actually views the details about said vehicle. It is served up via Ajax, in a dialog box on the Vehicle Listings Pages. Example functionality: http://screencast.com/t/kArKm4tBo
We do want the Vehicle Listings pages (#1) indexed and ranking. These pages have additional content besides the vehicle listings themselves, and those results are randomized or sliced/diced in different and unique ways. They're also updated twice per day.
We do not want to index #2, the Vehicle Details pages, as these pages appear and disappear all the time, based on dealer inventory, and don't have much value in the SERPs. Additionally, other sites such as autotrader.com, Yahoo Autos, and others draw from this same database, so we're worried about duplicate content. For instance, entering a snippet of dealer-provided content for one specific listing that Google indexed yielded 8,200+ results: Example Google query.
We did not originally think that Google would even be able to index these pages, as they are served up via Ajax. However, it seems we were wrong, as Google has already begun indexing them. Not only is duplicate content an issue, but these pages are not meant for visitors to navigate to directly! If a user were to navigate to the URL directly from the SERPs, they would see a page that isn't styled right.
Now we have to determine the right solution to keep these pages out of the index: robots.txt, noindex meta tags, or hash (#) internal links.
Robots.txt advantages: super easy to implement; conserves crawl budget for large sites; ensures the crawler doesn't get stuck. After all, if our website only has 500 pages that we really want indexed and ranked, and vehicle details pages constitute another 1,000,000,000 pages, it doesn't seem to make sense to make Googlebot crawl all of those pages.
Robots.txt disadvantages: doesn't prevent pages from being indexed, as we've seen, probably because there are internal links to these pages. We could nofollow these internal links, thereby minimizing indexation, but this would lead to 10-25 nofollowed internal links on each Vehicle Listings page (will Google think we're PageRank sculpting?).
Noindex advantages: does prevent vehicle details pages from being indexed; allows ALL pages to be crawled (advantage?).
Noindex disadvantages: difficult to implement (vehicle details pages are served using Ajax, so they have no <head> section for a meta tag. The solution would have to involve the X-Robots-Tag HTTP header and Apache, sending a noindex tag based on querystring variables, similar to this Stack Overflow solution. This means the plugin functionality is no longer self-contained, and some hosts may not allow these types of Apache rewrites, as I understand it). It forces (or rather allows) Googlebot to crawl hundreds of thousands of noindex pages; I say "force" because of the crawl budget required. The crawler could get stuck/lost in so many pages, and may not like crawling a site with 1,000,000,000 pages, 99.9% of which are noindexed. It cannot be used in conjunction with robots.txt; after all, the crawler never reads the noindex meta tag if it is blocked by robots.txt.
Hash (#) URL advantages: by using hash (#) URLs for links on Vehicle Listing pages to Vehicle Details pages (such as "Contact Seller" buttons), coupled with JavaScript, the crawler won't be able to follow/crawl these links. Best of both worlds: crawl budget isn't overtaxed by thousands of noindex pages, and the internal links that let robots.txt-disallowed pages get indexed are gone. It accomplishes the same thing as nofollowing these links, but without looking like PageRank sculpting (?). It does not require complex Apache stuff.
Hash (#) URL disadvantages: is Google suspicious of sites with (some) internal links structured like this, since it can't crawl/follow them?
Initially, we implemented robots.txt, the "sledgehammer solution." We figured that we'd have a happier crawler this way, as it wouldn't have to crawl zillions of partially duplicate vehicle details pages, and we wanted it to be like these pages didn't even exist. However, Google seems to be indexing many of these pages anyway, probably based on internal links pointing to them. We could nofollow the links pointing to these pages, but we don't want it to look like we're PageRank sculpting or something like that.
If we implement noindex on these pages (and doing so is a difficult task itself), then we will be certain these pages aren't indexed. However, to do so we will have to remove the robots.txt disallow rule, in order to let the crawler read the noindex tag on these pages. Intuitively, it doesn't make sense to me to make Googlebot crawl zillions of vehicle details pages, all of which are noindexed, and it could easily get stuck/lost/etc. It seems like a waste of resources, and in some shadowy way bad for SEO.
My developers are pushing for the third solution: using the hash URLs. This works on all hosts and keeps all functionality in the plugin self-contained (unlike noindex), and conserves crawl budget while keeping vehicle details pages out of the index (unlike robots.txt). But I don't want Google to slap us 6-12 months from now because it doesn't like links like these. Any thoughts or advice you guys have would be hugely appreciated, as I've been going in circles, circles, circles on this for a couple of days now. Also, I can provide a test site URL if you'd like to see the functionality in action.
-
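The robots.txt versus noindex trade-off in the question above can be sketched with Python's standard urllib.robotparser, assuming details pages live under a hypothetical /vehicle-details path and carry a hypothetical vehicle_id query parameter. The sketch shows that a noindex sent via the X-Robots-Tag header only takes effect if robots.txt lets the crawler fetch the URL, and that a hash (#) link resolves to the same URL as the page it sits on, so it hands the crawler no new URL. The header function only stands in for the Apache rule the poster mentions; it is not that rule.

```python
from urllib.parse import parse_qs, urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the Ajax-served vehicle details URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /vehicle-details
"""


def crawler_can_fetch(url, robots_txt, agent="Googlebot"):
    """True if the given robots.txt rules allow this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def x_robots_tag_for(url):
    """Header value the server would send: noindex for details pages (identified
    by a hypothetical vehicle_id query parameter), nothing for other pages."""
    params = parse_qs(urlparse(url).query)
    return "noindex" if "vehicle_id" in params else None


def noindex_is_visible(url, robots_txt):
    """A noindex header only works if the crawler is allowed to fetch the URL."""
    return crawler_can_fetch(url, robots_txt) and x_robots_tag_for(url) is not None


def link_target(page_url, href):
    """The URL a crawler would request for a link; the #fragment is never sent."""
    return urldefrag(urljoin(page_url, href)).url


if __name__ == "__main__":
    listing = "https://www.example.com/used-cars"
    details = "https://www.example.com/vehicle-details?vehicle_id=123"
    print("blocked by robots.txt:", not crawler_can_fetch(details, ROBOTS_TXT))
    print("noindex seen with robots.txt block:", noindex_is_visible(details, ROBOTS_TXT))
    print("noindex seen without robots.txt block:", noindex_is_visible(details, ""))
    print("hash link target:", link_target(listing, "#contact-seller"))
```

The sketch models only the URL-level logic; how Google weighs hash links or nofollowed internal links is outside what it can show.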
Penalized for Similar, But Not Duplicate, Content?
I have multiple product landing pages that feature very similar, but not duplicate, content, and I am wondering if this would affect my rankings in a negative way. The similar content has three main causes: continuity of site structure across different products; similar, or the same, product add-ons or support options (resulting in exactly the same additional tabs of content); and the product itself being very similar, with 3-4 key differences. Three examples of these similar pages are below, although I do have different meta data and keyword optimization across the pages: http://www.1099pro.com/prod1099pro.asp http://www.1099pro.com/prod1099proEnt.asp http://www.1099pro.com/prodW2pro.asp
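One rough way to put a number on "very similar, but not duplicate" is the Jaccard similarity of word shingles, a common near-duplicate measure in information retrieval. This is a minimal sketch for comparing your own pages; it is not how Google evaluates duplication, and no particular score is known to trigger a ranking problem. The three-word shingle size and the sample snippets are arbitrary assumptions.

```python
import re


def shingles(text, k=3):
    """Set of k-word sequences ("shingles") from lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(text_a, text_b, k=3):
    """Shared shingles divided by all distinct shingles: 1.0 = identical, 0.0 = disjoint."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    union = a | b
    if not union:
        return 0.0  # both texts shorter than k words: nothing to compare
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical snippets standing in for two product pages' body text.
    page_a = "Print and e-file 1099 forms with add-on support and data import options"
    page_b = "Print and e-file W-2 forms with add-on support and data import options"
    print(f"similarity: {jaccard(page_a, page_b):.2f}")
```

Running it over the shared tabs separately from the product-specific copy would show how much of the overlap comes from the repeated add-on and support sections.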
-
Appropriate use of rel canonical
Hey Guys, I'm a bit stuck. My on-page grade indicated the following two issues, and I need to find out how to fix both. If you have a solution, could you please let me know how to address these issues? It's all a bit intimidating at the moment! Thank you so much.
Appropriate Use of Rel Canonical: If the canonical tag is pointing to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. Make sure you're targeting the right page (if this isn't it, you can reset the target above) and then change the canonical tag to reference that URL. Recommendation: We check to make sure that IF you use canonical URL tags, it points to the right page. If the canonical tag points to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. If you've not made this page the rel=canonical target, change the reference to this URL. NOTE: For pages not employing canonical URL tags, this factor does not apply.
No More Than One Canonical URL Tag: The canonical URL tag is meant to be employed only a single time on an individual URL (much like the title element or meta description). To ensure the search engines properly parse the canonical source, employ only a single version of this tag. Recommendation: Remove all but a single canonical URL tag.
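The two checks in that report can be approximated locally. A minimal sketch, assuming you have the page's HTML and its URL: it counts canonical link tags and compares each target, resolved against the page URL, with the page itself after light normalization (lowercased scheme and host). The function names are hypothetical, and a canonical that deliberately points elsewhere is not necessarily an error; per the report's own wording, it just means this page is not expected to rank.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.hrefs.append(a.get("href") or "")


def normalize(url):
    """Compare URLs case-insensitively on scheme and host; treat an empty path as '/'."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def check_canonical(html, page_url):
    """Return report-style issues for one page (an empty list means both checks pass)."""
    finder = CanonicalFinder()
    finder.feed(html)
    issues = []
    if len(finder.hrefs) > 1:
        issues.append(f"{len(finder.hrefs)} canonical tags found; remove all but one")
    for href in finder.hrefs:
        target = urljoin(page_url, href)  # resolve relative hrefs against the page
        if normalize(target) != normalize(page_url):
            issues.append(f"canonical points to {target}, not to this page")
    return issues


if __name__ == "__main__":
    page = "https://www.example.com/stories/my-story"
    html = ('<link rel="canonical" href="/stories/my-story">'
            '<link rel="canonical" href="https://www.example.com/stories">')
    for issue in check_canonical(html, page):
        print(issue)
```

A page with no canonical tag returns an empty list, matching the report's note that the factor does not apply in that case.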
-
Keyword research methods
I am looking for a keyword research guide. Many people ask about keyword research, but I have no idea where to start. Expert advice would be highly appreciated.
-
Google Translate for Unique Content
We are considering using the Google Translation tool to translate customer reviews into various languages for publication as indexable content both for users and for search engine long tail visibility and rankings. Does anyone have any experience, insights or caveats to share?