Is Google able to determine duplicate content every day, or only every month?
-
A while ago I talked to somebody who worked in MSN's engineering department a couple of years ago. We discussed a recent dip one of our sites took, and we argued it could be caused by the large amount of duplicate content on that particular website (over 80% of the site).
He then said, quote: "Google only seems to be able to determine every couple of months, rather than every day, whether content is actually duplicate content." I don't doubt that duplicate content is a ranking factor, but I would like to hear your opinions on whether Google is really only able to determine this every couple of months instead of every day.
Have you seen or heard something similar?
-
Sorting out Google's timelines is tricky these days, because they aren't the same for every process and every site. In the early days, the "Google dance" happened about once a month, and that was the whole mess (index, algo updates, etc.). Over time, index updates have gotten a lot faster, and ranking and indexation are more real-time (especially since the "Caffeine" update), but that varies wildly across sites and pages.
I think you also have to separate a couple of impacts of duplicate content. When it comes to filtering, where Google excludes a piece of duplicate content from rankings (but doesn't necessarily penalize the site), I don't see any evidence that this takes a couple of months. It can take Google days or weeks to re-cache any given page, and to detect a duplicate they would have to re-cache both copies, so that may realistically take a month in some cases. I strongly suspect, though, that the filter itself happens in real-time. There's no good way to store a filter for every scenario, and some filters are query-specific. Computationally, some filters almost have to happen on the fly.
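Nobody outside Google knows exactly how their duplicate filter works, but the classic textbook approach to near-duplicate detection, w-shingling plus Jaccard similarity, shows why this kind of comparison is cheap enough to plausibly run on the fly once both copies are in the index. A toy sketch (the shingle size, example texts, and function names here are my own illustration, not anything Google has published):

```python
def shingles(text, w=4):
    """Split text into the set of overlapping w-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original = "the quick brown fox jumps over the lazy dog near the river bank"
scraped = "the quick brown fox jumps over the lazy dog by the river bank"
unrelated = "completely different article about search engine ranking factors"

print(jaccard(shingles(original), shingles(scraped)))    # ~0.43, near-duplicate
print(jaccard(shingles(original), shingles(unrelated)))  # 0.0, no overlap
```

At web scale you wouldn't compare raw shingle sets pairwise; techniques like MinHash or SimHash reduce each document to a small fingerprint so near-duplicates can be looked up quickly, which is consistent with the idea that the bottleneck is re-crawling both copies, not running the comparison itself.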
On the other hand, you have updates like Panda, where duplicate content can cause something close to a penalty. Panda data was originally updated outside of the main algorithm, to the best of our knowledge, probably about once a month. In the year-plus since Panda 1.0 rolled out, though, that timeline seems to have accelerated. I don't think it's real-time, but it may be closer to two weeks (that's speculation, I admit).
So, the short answer is "It's complicated." I don't have any evidence to suggest that filtering duplicates takes Google months (and, actually, I have anecdotal evidence that it can happen much faster). It is possible that it could take weeks or months to see the impact of duplicates on some sites and in some situations, though.
-
Hi Donnie,
Thanks for your reply, but I was already aware that Google had (or has) a sandbox; I should have mentioned this in my question. I'm looking more for an answer about how, and on what basis, Google is able to determine whether pages are duplicates.
I've seen dozens of cases where our content was indexed, both when we linked back to the 'original' source and when we didn't.
I also want to make clear that, just to be sure, in all of these cases the duplicate content was published with the original sources' agreement.
-
In the past, Google had a sandbox period before any page (content) would rank. However, now everything is instant. (Just learned this today @seomoz.)
If you release something, Google will index it as fast as possible. If that info gets duplicated, Google will only count the first copy indexed. Everyone else loses brownie points unless they link back to the original article (the first one indexed).