Excel tips or tricks for duplicate content madness?
-
Dearest SEO Friends,
I'm working on a site that has over 2,400 instances of duplicate content (yikes!).
I'm hoping somebody could offer some excel tips or tricks to managing my SEOMoz crawl diagnostics summary data file in a meaningful way, because right now this spreadsheet is not really helpful. Here's a hypothetical situation to describe why:
Say we had three columns of duplicate content. The data is displayed thusly:
|
Column A
|
Column B
|
Column C
URL A
|
URL B
|
URL C
|
In a perfect world, this is easy to understand. I want URL A to be the canonical. But unfortunately, the way my spreadsheet is populated, this ends up happening:
|
Column A
|
Column B
|
Column C
URL A
|
URL B
|
URL C
URL B
|
URL A
|
URL C
URL C
|
URL A
|
URL B
|
Essentially all of these URLs would end up being called a canonical, thus rendering the effect of the tag ineffective. On a site with small errors, this has never been a problem, because I can just spot check my steps. But the site I'm working on has thousands of instances, making it really hard to identify or even scale these patterns accurately.
This is particularly problematic as some of these URLs are identified as duplicates 50+ times! So my spreadsheet has well over 100K cells!!! Madness!!! Obviously, I can't go through manually. It would take me years to ensure the accuracy, and I'm assuming that's not really a scalable goal.
Here's what I would love, but I'm not getting my hopes up. Does anyone know of a formulaic way that Excel could identify row matches and think - "oh! these are all the same rows of data, just mismatched. I'll kill off duplicate rows, so only one truly unique row of data exists for this particular set" ? Or some other work around that could help me with my duplicate content madness?
Much appreciated, you Excel Gurus you!
-
Choose one of the URL's as the authoritive and remove the dupped content from the others.
-
FMLLC,
I use Excel 2010 so my approach would be as follows:
-
Make a backup copy of your file before you start.
-
You will need to sort each row by value, but Excel has a 3 sort level limit, so you will need to add a macro.
-
Assuming your data starts in A1 and has no header row, Put it in a general module, go back to excel, activate your sheet, then run the macro from Tools=>Macro=>Macros.
Sub SortEachRowHorizontal()
Dim rng As Range, rw As Range
Set rng = Range("A1").CurrentRegion
For Each rw In rng.Rows
rw.Sort Key1:=rw(1), _
order1:=xlAscending, _
Header:=xlNo, _
OrderCustom:=1, _
MatchCase:=False, _
Orientation:=xlLeftToRight
Next
End Sub
- Then Highlight all your cells and then go to Data -> Remove Duplicates
The result should be all unique rows. I hope this helps.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How Moz takes a page title is duplicate?
Suppose i have added suffix and prefix to each of my product (ex: i have two tittles like buy online t-shirt at abc.com & buy online poster at abc.com, so in this buy online and abc.com are suffix and prefix) so .. will it take these two page tittles as duplicates ?
Moz Pro | | vayush0 -
The pages that add robots as noindex will Crawl and marked as duplicate page content on seo moz ?
When we marked a page as noindex with robots like {<meta name="<a class="attribute-value">robots</a>" content="<a class="attribute-value">noindex</a>" />} will crawl and marked as duplicate page content(Its already a duplicate page content within the site. ie, Two links pointing to the same page).So we are mentioning both the links no need to index on SE.But after we made this and crawl reports have no change like it tooks the duplicate with noindex marked pages too. Please help to solve this problem.
Moz Pro | | trixmediainc0 -
Crawl report - duplicate page title/content issue
When the crawl report is finished, it is saying that there are duplicate content/page titles issues. However there is a canonical tag that is formatted correctly so just wondered if this was a bug or if anyone else was having the same issues? For example, I'm getting a error warning for this page http://www.thegreatgiftcompany.com/categories/categories_travel?sort=name_asc&searchterm=&page=1&layout=table
Moz Pro | | KarlBantleman0 -
Duplicate content analysis
Good morning everyone, I have just run a test from SEOmoz PRO tool and I got more than 2,000 double content errors. How might I see which are the 2 pages whose content is double? My website is just newly revamped and can't find these similarities on my own: for me there are not). Thanks for your help! Francesca
Moz Pro | | astojanov0 -
Sorting Dupe Content Pages
Hi, I'm no excel pro, and I'm having a bit of a challenge interpreting the Crawl Diagnostics export .csv file. I'd like to see at a glance which of my pages (and I have many) are the worst offenders for dupe content – ie. which have the most "Other URLs" associated with them. Thanks, would appreciate any advice on how other people are using this data, and/or how 'Moz recommends to do it. 🙂
Moz Pro | | ntcma0 -
Duplicate pages with canonical links still show as errors
On our CMS, there are duplicate pages such as /news, /news/, /news?page=1, /news/?page=1. From an SEO perspective, I'm not too worried, because I guess Google is pretty capable of sorting this out, but to be on the safe side, I've added canonical links. /news itself has no link, but all the other variants have links to "/news". (And if you go wild and add a bunch of random meaningless parameters, creating /news/?page=1&jim=jam&foo=bar&this=that, we will laugh at you and generate a canonical link back to "/news". We're clever like that.) So far so good. And everything appears to work fine. But SEOMoz is still flagging up errors about duplicate titles and duplicate content. If you click in, you'll see a "Note" on each error, showing that SEOMoz has found the canonical link. So SEOMoz knows the duplication isn't a problem, as we're using canonical links exactly the way they're supposed to be used, and yet is still flagging it as an error. Is this something I should be concerned about, or is it just a bug in SEOMoz?
Moz Pro | | LockyDotser0 -
Duplicate page errors
I have 102 duplicate page title errors and 64 duplicate page content errors. They are almost all from the email a friend forms that are on each product of my online store. I looked and the pages are identical except for the product name. Is this a real problem and if so is there a work around or should I see if I can turn off the email a friend option? Thanks for any information you can give me. Cingin Gifts
Moz Pro | | cingingifts0 -
Duplicate Content being caused by home page?
Hello everyone, I am new to SEOmoz and SEO in general and I have a quick questions. When running a SEO Web Crawler report on my URL, I noticed in the report that my home page (also known as my index page) was listed twice. Here is what the report was showing: www.example.com/ www.example.com/index.php So are these 2 different urls? If so, is this considered duplicate content and should I block crawler access to the index.php? Thanks in advance for the help!
Moz Pro | | threebiz0