How do I use public content without being penalized for duplication?
-
The NHTSA produces a list of all recalls for automobiles. In their "terms of use" it states that the information can be copied. I want to add that to our site, so there is an up-to-date list for our audience to see. However, I'm just copying and pasting. I'm allowed to according to NHTSA, but google will probably flag it right? Is there a way to do this without being penalized?
Thanks,
Ruben
-
I didn't think about other sites, but that's a fabulous point. Best to play it safe.
Thanks for your input!
- Ruben
-
My gut says that your idea to keep the content noindexed is best. Even if the content is unique it borders on the area of auto-generated. Now, I might change your mind if there were a lot of users that were actively interacting with most of these pages. If not though, then you'll end up having a large portion of your site consisting of auto-generated content that doesn't seem useful. Plus...it's also possible that other sites are using this information so you would end up having content that is duplicated on other sites too.
I could be wrong, but my gut says not to try to use this content for ranking purposes.
-
I appreciate the follow up, Marie. Please give me your thoughts on the following idea;
The NHTSA only posts the updates for the past month. If I noindex the page for now (which is what I'm doing) and wait five months, then what would happen? At that point, yes, the current month would be duplicated, but I'd have four months of "unique" content because the NHTSA deletes there's. Plus, I could add pictures of all the automobile, too. Do you think that would be enough to index it?
(I'm most likely going to keep it noindex, because this borders on shady, or at least, I could see google taking it that way, but just as a thought experiment, what do you think?) Or anyone else?
Thanks,
Ruben
-
To expand on EGOL's answer, if you are taking someone else's content (even with their permission) and wanting Google to index it then Google can see that you have a large amount of copied content on your site. This can trigger the Panda filter and can cause Google to consider your whole site as low quality.
You can add a noindex tag as EGOL suggested or you could use a canonical tag to show Google who the originator of the content is, but probably the noindex tag is easiest.
There is one other option as well. If you think it is possible that you can add significant value to the content that is being provided then you can still keep it indexed. If you can combine the recall information with other valuable information then that might be ok to index. But, you have to truly be providing value, not just padding the page with words to make it look unique.
-
Alright, that sounds good. Thanks!
-
I had a bunch of republished articles on my website, done mostly at the request of government agencies and universities. That site got hit in one of the early Panda updates. So, I deleted a lot of that content and to the rest I added this line above the tag
name="robots" content="noindex, follow" />
That tells google not to index the page but to follow the links and allow pagerank to flow through. My site recovered a few weeks later.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Upper and lower case URLS coming up as duplicate content
Hey guys and gals, I'm having a frustrating time with an issue. Our site has around 10 pages that are coming up as duplicate content/ duplicate title. I'm not sure what I can do to fix this. I was going to attempt to 301 direct the upper case to lower but I'm worried how this will affect our SEO. can anyone offer some insight on what I should be doing? Update: What I'm trying to figure out is what I should do for our URL's. For example, when I run an audit I'm getting two different pages: aaa.com/BusinessAgreement.com and also aaa.com/businessagreement.com. We don't have two pages but for some reason, Google thinks we do.
Intermediate & Advanced SEO | | davidmac1 -
Same site serving multiple countries and duplicated content
Hello! Though I browse MoZ resources every day, I've decided to directly ask you a question despite the numerous questions (and answers!) about this topic as there are few specific variants each time: I've a site serving content (and products) to different countries built using subfolders (1 subfolder per country). Basically, it looks like this:
Intermediate & Advanced SEO | | GhillC
site.com/us/
site.com/gb/
site.com/fr/
site.com/it/
etc. The first problem was fairly easy to solve:
Avoid duplicated content issues across the board considering that both the ecommerce part of the site and the blog bit are being replicated for each subfolders in their own language. Correct me if I'm wrong but using our copywriters to translate the content and adding the right hreflang tags should do. But then comes the second problem: how to deal with duplicated content when it's written in the same language? E.g. /us/, /gb/, /au/ and so on.
Given the following requirements/constraints, I can't see any positive resolution to this issue:
1. Need for such structure to be maintained (it's not possible to consolidate same language within one single subfolders for example),
2. Articles from one subfolder to another can't be canonicalized as it would mess up with our internal tracking tools,
3. The amount of content being published prevents us to get bespoke content for each region of the world with the same spoken language. Given those constraints, I can't see a way to solve that out and it seems that I'm cursed to live with those duplicated content red flags right up my nose.
Am I right or can you think about anything to sort that out? Many thanks,
Ghill0 -
Duplicating relevant category content in subcategories. Good or bad for google ranking?
In a travel related page I have city categories with city related information.
Intermediate & Advanced SEO | | lcourse
Would you recommend for or against duplicating some relevant city related in subcategory pages. For visitor it would be useful and google should have more context about the topic of our page.
But my main concern is how this may be perceived by google and especially whether it may make it more likely being penalized for thin content. We already were hit end of june by panda/phantom and we are working on adding also more unique content, but this would be something that we could do additionally and basically instantaneously. Just do not want to make things worse.0 -
Duplicate Content through 'Gclid'
Hello, We've had the known problem of duplicate content through the gclid parameter caused by Google Adwords. As per Google's recommendation - we added the canonical tag to every page on our site so when the bot came to each page they would go 'Ah-ha, this is the original page'. We also added the paramter to the URL parameters in Google Wemaster Tools. However, now it seems as though a canonical is automatically been given to these newly created gclid pages; below https://www.google.com.au/search?espv=2&q=site%3Awww.mypetwarehouse.com.au+inurl%3Agclid&oq=site%3A&gs_l=serp.3.0.35i39l2j0i67l4j0i10j0i67j0j0i131.58677.61871.0.63823.11.8.3.0.0.0.208.930.0j3j2.5.0....0...1c.1.64.serp..8.3.419.nUJod6dYZmI Therefore these new pages are now being indexed, causing duplicate content. Does anyone have any idea about what to do in this situation? Thanks, Stephen.
Intermediate & Advanced SEO | | MyPetWarehouse0 -
On Page Content. has a H2 Tag but should I also use H3 tags for the sub headings within this body of content
Hi Mozzers, My on page content comes under my H2 tag. I have a few subheadings within my content to help break it up etc and currently this is just underlined (not bold or anything) and I am wondering from an SEO perspective, should I be making these sub headings H3 tags. Otherwise , I just have 500-750 words of content under an H2 tag which is what I am currently doing on my landing pages. thanks pete
Intermediate & Advanced SEO | | PeteC120 -
Duplicate content clarity required
Hi, I have access to a masive resource of journals that we have been given the all clear to use the abstract on our site and link back to the journal. These will be really useful links for our visitors. E.g. http://www.springerlink.com/content/59210832213382K2 Simply, if we copy the abstract and then link back to the journal source will this be treated as duplicate content and damage the site or is the link to the source enough for search engines to realise that we aren't trying anything untoward. Would it help if we added an introduction so in effect we are sort of following the curating content model? We are thinking of linking back internally to a relevant page using a keyword too. Will this approach give any benefit to our site at all or will the content be ignored due to it being duplicate and thus render the internal links useless? Thanks Jason
Intermediate & Advanced SEO | | jayderby0 -
Is it fine to use an iframe for video content? Will it still be indexed on your URL?
If we host a video on a third party site and use an iframe to display it on our site, when the video is indexed in SERPs will it show on our site or on the third party site?
Intermediate & Advanced SEO | | nicole.healthline0 -
Nuanced duplicate content problem.
Hi guys, I am working on a recently rebuilt website, which has some duplicate content issues that are more nuanced than usual. I have a plan of action (which I will describe further), so please let me know if it's a valid plan or if I am missing something. Situation: The client is targeting two types of users: business leads (Type A) and potential employees (Type B), so for each of their 22 locations, they have 2 pages - one speaking to Type A and another to Type B. Type A location page contains a description of the location. In terms of importance, Type A location pages are secondary because to the Type A user, locations are not of primary importance. Type B location page contains the same description of the location plus additional lifestyle description. These pages carry more importance, since they are attempting to attract applicants to work in specific places. So I am planning to rank these pages eventually for a combination of Location Name + Keyword. Plan: New content is not an option at this point, so I am planning to set up canonical tags on both location Types and make Type B, the canonical URL, since it carries more importance and more SEO potential. The main nuance is that while Type A and Type B location pages contain some of the same content (about 75%-80%), they are not exactly the same. That is why I am not 100% sure that I should canonicalize them, but still most of the wording on the page is identical, so... Any professional opinion would be greatly appreciated. Thanks!
Intermediate & Advanced SEO | | naymark.biz0