Duplicate Content due to Panda update!
-
I can see that a lot of you are worrying about this new Panda update just as I am!
I have such a headache trying to figure this one out, can any of you help me?
I have thousands of pages flagged as "duplicate content", and I can't for the life of me see how... take these two for example:
http://www.eteach.com/Employer.aspx?EmpNo=18753
http://www.eteach.com/Employer.aspx?EmpNo=31241
My campaign crawler is telling me these are duplicate content pages because of the same title (which I can see) and because of the content (which I can't).
Can anyone see how Google is interpreting these two pages as duplicate content??
Stupid Panda!
-
Hi Virginia
This is frustrating indeed as it certainly doesn't look like you've used duplicate content in a malicious way.
To understand why Google might be seeing these pages as duplicate content, let's look at them through Googlebot's eyes:
Google Crawl for page 1
Google Crawl for page 2

What you'll see here is that Google reads the entirety of both pages, and the only differences are a logo it can't see, a name, and a postal address. The rest of each page is identical. This shows that Google reads things like site navigation menus and footers and treats them, for the purposes of Panda, as "content".
This doesn't mean you should have a different navigation on every page (that wouldn't be feasible). But it does mean you need enough unique content on each page to show Google that the page isn't a duplicate. I can't give you a percentage, but roughly 300-400 words of unique content per page should do the trick.
Now, this might be feasible for some of your pages, but for the two pages you've linked above, there simply isn't enough to write about. And because these pages are generated from a query string (one URL per EmpNo), you could potentially have hundreds or thousands of pages you'd need to add content to, which is a hell of a lot of work.
So here's what I'd do. I'd get a list of each URL on your site that could be seen as "duplicate" content, like the ones above. Be as harsh in judging this as Google would be. I'd then decide whether you can add further content to these pages or not. For description pages or "about us" pages, you can perhaps add a bit more. For URLs like the ones above, you should do the following:
In the header of each of these URLs you've identified, add this code:
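That code is the standard robots meta tag, which goes inside the page's `<head>`:

```html
<meta name="robots" content="noindex, nofollow">
```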
This tells Googlebot not to index the page or follow its links, so it won't appear in the index and won't count against you as duplicate content. This would be perfect for the URLs you've given above, as I very much doubt you'd ever want these pages to rank, so you can safely noindex and nofollow them.

Furthermore, as these URLs are generated from queries, I'm assuming you may have one "master" page that they are all built from. If so, you may only need to add the meta code to that one page for it to apply to all of them. I'm not certain on this, and you should clarify with your developers and/or whoever runs your CMS. The important thing, however, is to have the meta tag applied to all the duplicate content URLs you don't want to rank. For the pages you do want to rank, you'll need to add more unique content to stop them being flagged as duplicate.
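As a hedged sketch only: since the site appears to run ASP.NET Web Forms (the .aspx URLs), one way to apply the tag from the shared Employer.aspx template is via the `HtmlMeta` control in the code-behind. The `IsThinEmployerPage` check here is hypothetical, so treat this as an illustration to discuss with your developers, not a drop-in fix:

```csharp
// In Employer.aspx.cs, the code-behind shared by every EmpNo page.
// Requires <head runat="server"> in the template's markup.
protected void Page_Load(object sender, EventArgs e)
{
    if (IsThinEmployerPage())  // hypothetical: true when the page has little unique text
    {
        var robots = new HtmlMeta { Name = "robots", Content = "noindex, nofollow" };
        Page.Header.Controls.Add(robots);
    }
}
```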
As always, there's a great Moz post on how to deal with duplication issues right here.
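If it helps with building that list of suspect URLs, here's a minimal sketch of scoring how much visible text two pages share, assuming you've already crawled them and stripped the HTML down to plain text. `difflib` is just an illustrative measure, not what Google or any particular crawler actually uses:

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Fraction (0-1) of visible text two pages share, word by word."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

# Illustrative stand-ins for the extracted text of two Employer.aspx pages:
# everything matches except the employer's name and address.
page_1 = ("Home Jobs Employers Sign in "
          "Acme Primary School 1 High Street Anytown "
          "About us Contact Terms Privacy")
page_2 = ("Home Jobs Employers Sign in "
          "Beeston College 42 Low Road Otherville "
          "About us Contact Terms Privacy")

score = similarity(page_1, page_2)
print(f"{score:.2f}")  # the closer to 1.0, the more duplicate the pages look
```

Pages that score close to 1.0 against each other are the ones worth noindexing or bulking up first.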
Hope this helps Virginia and if you have any more questions, feel free to ask me!
Related Questions
-
Duplicate content for product pages
Say you have two separate pages, each featuring a different product. They have so many common features that their content is virtually duplicated when you get to the bullets to break it all down. To avoid a penalty, is it advised to paraphrase? It seems to me it would benefit the user to see it all laid out the same, apples to apples. Thanks. I've considered combining the products on one page, but will be examining the data to see if there's a lost benefit to not having separate pages. Ditto for just not indexing the one that I suspect may not have much traction (requesting data to see).
White Hat / Black Hat SEO | SSFCU
-
Panda Recovery: Is a reconsideration request necessary?
Hi everyone, I run a 12-year old travel site that primarily publishes hotel reviews and blog posts about ways to save when traveling in Europe. We have a domain authority of 65 and lots of high quality links from major news websites (NYT, USA Today, NPR, etc.). We always ranked well for competitive searches like "cheap hotels in Paris," etc., for many, many years (like 10 years).

Things started falling two years ago (April 2011)--I thought it was just normal algorithmic changes, and that our pages were being devalued (and perhaps, it was). So, we continued to bulk up our reviews and other key pages, only to see things continue to slide.

About a month ago I lined up all of our inbound search traffic from Google Analytics and compared it to SEO Moz's timeline of Google updates. Turns out every time there was a Panda roll-out (from the second one in April 2011) our traffic tumbled. Other updates (Penguin, etc.) didn't seem to make a difference.

But why should our content that we invest so much in take a hit from Panda? It wasn't "thin." But thin content existed elsewhere on our site: We had a flights section with 40,000 pages of thin content, cranked out of our database with virtually no unique content. We had launched that section in 2008, and it had never been an issue (and had mostly been ignored), but now, I believed, it was working against us. My understanding is that any thin content can actually work against the entire site's rankings.

In summary: We had 40,000 thin flights pages, 2,500 blog posts (rich content), and about 2,500 hotel-related pages (rich and well researched "expert" content).

So, two weeks ago we dropped almost the entire flights section. We kept about 400 pages (of the 40,000) with researched, unique and well-written information, and we 410'd the rest. Following the advice of so many others on these boards, we put the "thin" flights pages in their own sitemap so we could watch their index number fall in Webmaster tools.
And we watched (with some eagerness and trepidation) as the error count shot up. Google has found about half of them at this point.

Last week I submitted a "reconsideration request" to Google's spam team. I wasn't sure if this was necessary (as the whole point of dropping the pages, 410'ing and so forth was to fix it on our end, which would hopefully filter down through the SERPs eventually). However, I thought it was worth sending them a note explaining the actions we had taken, just in case.

Today I received a response from them. It includes:

"We reviewed your site and found no manual actions by the webspam team that might affect your site's ranking in Google. There's no need to file a reconsideration request for your site, because any ranking issues you may be experiencing are not related to a manual action taken by the webspam team. Of course, there may be other issues with your site that affect your site's ranking. Google's computers determine the order of our search results using a series of formulas known as algorithms. We make hundreds of changes to our search algorithms each year, and we employ more than 200 different signals when ranking pages. As our algorithms change and as the web (including your site) changes, some fluctuation in ranking can happen as we make updates to present the best results to our users. If you've experienced a change in ranking which you suspect may be more than a simple algorithm change, there are other things you may want to investigate as possible causes, such as a major change to your site's content, content management system, or server architecture. For example, a site may not rank well if your server stops serving pages to Googlebot, or if you've changed the URLs for a large portion of your site's pages..."

And thus, I'm a bit confused. If they say that there wasn't any manual action taken, is that a bad thing for my site?
Or is it just saying that my site wasn't experiencing a manual penalty, however Panda perhaps still penalized us (through a drop in rankings) -- and Panda isn't considered "manual." Could the 410'ing of 40,000 thin pages actually raise some red flags? And finally, how long do these issues usually take to clear up? Pardon the very long question and thanks for any insights. I really appreciate the advice offered in these forums.
White Hat / Black Hat SEO | TomNYC
-
What are your views on recent statements regarding "advertorial" content?
Hi, Recently, there's been a lot said and written about how Google is going to come down hard on 'advertorial' content. Many B2B publishers provide exposure to their clients by creating and publishing content about them, based on information/content obtained from clients (for example, in the form of press releases) or compiled by the publisher. From a target audience/user perspective, this is useful information that the publication is bringing to its audience. Also, let's say the publishers don't link directly to client websites. In such a case, how do you think Google is likely to look at publisher websites in the context of the recent statements related to 'advertorial' type content? Look forward to views of the Moz community. Thanks, Manoj
White Hat / Black Hat SEO | ontarget-media
-
DIV Attribute containing full DIV content
Hi all I recently watched the latest Mozinar called "Making Your Site Audits More Actionable". It was presented by the guys at seogadget. In the mozinar one of the guys said he loves the website www.sportsbikeshop.co.uk and that they have spent a lot of money on it from an SEO point of view (presumably with seogadget), so I decided to look through the source and noticed something I had not seen before, and wondered if anyone can shed any light. On this page (http://www.sportsbikeshop.co.uk/motorcycle_parts/content_cat/852/(2;product_rating;DESC;0-0;all;92)/page_1/max_20) there is a paragraph of text that begins with 'The ever reliable UK weather...' and when you view the source of the containing DIV you will notice a bespoke attribute called "threedots=" and, within it, the entire text content of that DIV. Any thoughts as to why they would put that there? I can't see any reason why this would benefit a site in any shape or form. It's invalid markup, for one. Am I missing a trick..? Thoughts would be greatly appreciated. Kris P.S. for those who can't be bothered to visit the site, here is a smaller version of what they have done: This is an introductory paragraph of text for this page.
White Hat / Black Hat SEO | yousayjump
-
Site dropped suddenly. Is it due to htaccess?
I had a new site that was ranking on the first page for 5 keywords. My site was hacked recently and I went through a lot of trouble to restore it. Last night, I discovered that my site was nowhere to be found, but when I searched site:mysite.com it was still listed, which means it was not penalized. I discovered the issue to be a .htaccess problem, and it has been resolved. My question is: now that the .htaccess issue is resolved, will my site be restored back to the first page? Are there additional things that I should do? I have notified Google by submitting my site.
White Hat / Black Hat SEO | semoney
-
Using Programmatic Content
My company has been approached a number of times by computer-generated content providers (like Narrative Science and Comtex). They are providing computer-generated content to a number of big-name websites. Does anyone have any experience working with companies like this? We were burned by the first Panda update because we were using boilerplate forms for content.
White Hat / Black Hat SEO | SuperMikeLewis
-
Why doesn't Google find different domains - same content?
I have been slowly working to remove near-duplicate content from my own website for different locales. Google seems to be doing nothing to combat the duplicate content of one of my competitors showing up all over southern California. For example:

Your Local #1 Rancho Bernardo Pest Control Experts | 858-352 ...
www.pestcontrolranchobernardo.com/
Pest Control Rancho Bernardo Pros specializes in the eradication of all household pests including ants, roaches, etc. Call Today @ 858-352-7728.

Your Local #1 Oceanside Pest Control Experts | 760-486-2807 ...
www.pestcontrol-oceanside.info/
Pest Control Oceanside Pros specializes in the eradication of all household pests including ants, roaches, etc. Call Today @ 760-486-2807.

The competitor is getting high page 1 listings for massively duplicated content across web domains. Will Google find this black hat workmanship? Meanwhile, he's sucking up my business. Do the results of the competitor's success also speak to the possibility that Google does in fact rank based on the name of the URL - something that gets debated all the time? Thanks for your insights. Gerry
White Hat / Black Hat SEO | GerryWeitz