Excel tips or tricks for duplicate content madness?
-
Dearest SEO Friends,
I'm working on a site that has over 2,400 instances of duplicate content (yikes!).
I'm hoping somebody could offer some Excel tips or tricks for managing my SEOMoz crawl diagnostics summary data file in a meaningful way, because right now this spreadsheet is not really helpful. Here's a hypothetical situation to describe why:
Say we had three columns of duplicate content. The data is displayed thusly:
Column A | Column B | Column C
URL A | URL B | URL C
In a perfect world, this is easy to understand. I want URL A to be the canonical. But unfortunately, the way my spreadsheet is populated, this ends up happening:
Column A | Column B | Column C
URL A | URL B | URL C
URL B | URL A | URL C
URL C | URL A | URL B
Essentially, every one of these URLs would end up being marked as the canonical, which defeats the purpose of the tag. On a site with only a few errors, this has never been a problem, because I can just spot check my steps. But the site I'm working on has thousands of instances, making it really hard to identify these patterns accurately, let alone at scale.
This is particularly problematic as some of these URLs are identified as duplicates 50+ times! So my spreadsheet has well over 100K cells!!! Madness!!! Obviously, I can't go through it manually. It would take me years to ensure the accuracy, and I'm assuming that's not really a scalable goal.
Here's what I would love, but I'm not getting my hopes up. Does anyone know of a formulaic way that Excel could identify row matches and think, "oh! these are all the same rows of data, just mismatched. I'll kill off duplicate rows, so only one truly unique row of data exists for this particular set"? Or some other workaround that could help me with my duplicate content madness?
Much appreciated, you Excel Gurus you!
-
Choose one of the URLs as the authoritative one and remove the duplicated content from the others.
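Whatever rule you use to choose, it helps if the rule does not depend on which column a URL happens to sit in; otherwise each row of the export can nominate a different canonical. A minimal Python sketch, assuming a hypothetical rule (shortest URL wins, ties broken alphabetically) and made-up example.com URLs; neither comes from the crawl report itself:

```python
def pick_canonical(urls):
    """Pick one URL from a group using a rule that ignores column order.

    Assumed rule (hypothetical): shortest URL wins, ties broken alphabetically.
    """
    return min(urls, key=lambda u: (len(u), u))


def canonical_map(rows):
    """Map every URL in every row to the canonical chosen for that row."""
    mapping = {}
    for row in rows:
        # Ignore empty cells left over from rows of different lengths.
        urls = [u.strip() for u in row if u and u.strip()]
        if not urls:
            continue
        canonical = pick_canonical(urls)
        for u in urls:
            mapping[u] = canonical
    return mapping


# The same three URLs, listed in a different order on each row.
rows = [
    ["http://example.com/a", "http://example.com/a?ref=1", "http://example.com/a/index.html"],
    ["http://example.com/a?ref=1", "http://example.com/a", "http://example.com/a/index.html"],
    ["http://example.com/a/index.html", "http://example.com/a", "http://example.com/a?ref=1"],
]
mapping = canonical_map(rows)
for url, canonical in sorted(mapping.items()):
    print(url, "->", canonical)
```

Because the rule looks only at the URLs themselves, all three rows agree on the same canonical no matter how the columns are shuffled.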
-
FMLLC,
I use Excel 2010 so my approach would be as follows:
-
Make a backup copy of your file before you start.
-
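You will need to sort the values within each row, so that rows holding the same URLs in a different order become identical. Excel's built-in sort can't sort every row independently in one pass, so you will need to add a macro.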
-
Assuming your data starts in A1 and has no header row, paste the macro below into a general module (Alt+F11, then Insert > Module), go back to Excel, activate your sheet, then run the macro with Alt+F8 (in older versions, Tools > Macro > Macros).
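Sub SortEachRowHorizontal()
    ' Sort the values within each row left to right, so rows that hold
    ' the same URLs in a different order end up identical.
    Dim rng As Range, rw As Range
    ' CurrentRegion stops at the first fully blank row or column.
    Set rng = Range("A1").CurrentRegion
    For Each rw In rng.Rows
        rw.Sort Key1:=rw.Cells(1), _
                Order1:=xlAscending, _
                Header:=xlNo, _
                OrderCustom:=1, _
                MatchCase:=False, _
                Orientation:=xlLeftToRight
    Next rw
End Sub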
- Then highlight all your cells and go to Data -> Remove Duplicates, leaving every column checked so that only rows matching in all columns are removed.
The result should be all unique rows. I hope this helps.
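The same sort-then-remove-duplicates idea can also be sketched outside Excel. This is a minimal Python example, assuming the crawl data has been saved as a CSV with one group of duplicate URLs per row and no header row; the inline sample text stands in for that file, and the function name is made up for illustration:

```python
import csv
import io


def unique_url_sets(rows):
    """Collapse rows that contain the same URLs in a different order."""
    seen = set()
    unique = []
    for row in rows:
        # Drop blank cells, then sort so column order no longer matters.
        # The set also removes a URL repeated within a single row.
        key = tuple(sorted({cell.strip() for cell in row if cell.strip()}))
        if key and key not in seen:
            seen.add(key)
            unique.append(list(key))
    return unique


# Stand-in for the exported file; with a real export, open the CSV instead.
sample = io.StringIO(
    "URL A,URL B,URL C\n"
    "URL B,URL A,URL C\n"
    "URL C,URL A,URL B\n"
    "URL D,URL E,\n"
)
result = unique_url_sets(csv.reader(sample))
for row in result:
    print(row)
```

The first three sample rows collapse into a single row, and the shorter fourth row is kept as its own group. Unlike the macro, this comparison is case-sensitive, which may or may not be what you want for URLs.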