Excel tips or tricks for duplicate content madness?
-
Dearest SEO Friends,
I'm working on a site that has over 2,400 instances of duplicate content (yikes!).
I'm hoping somebody could offer some excel tips or tricks to managing my SEOMoz crawl diagnostics summary data file in a meaningful way, because right now this spreadsheet is not really helpful. Here's a hypothetical situation to describe why:
Say we had three columns of duplicate content. The data is displayed thusly:
|
Column A
|
Column B
|
Column C
URL A
|
URL B
|
URL C
|
In a perfect world, this is easy to understand. I want URL A to be the canonical. But unfortunately, the way my spreadsheet is populated, this ends up happening:
|
Column A
|
Column B
|
Column C
URL A
|
URL B
|
URL C
URL B
|
URL A
|
URL C
URL C
|
URL A
|
URL B
|
Essentially all of these URLs would end up being called a canonical, thus rendering the effect of the tag ineffective. On a site with small errors, this has never been a problem, because I can just spot check my steps. But the site I'm working on has thousands of instances, making it really hard to identify or even scale these patterns accurately.
This is particularly problematic as some of these URLs are identified as duplicates 50+ times! So my spreadsheet has well over 100K cells!!! Madness!!! Obviously, I can't go through manually. It would take me years to ensure the accuracy, and I'm assuming that's not really a scalable goal.
Here's what I would love, but I'm not getting my hopes up. Does anyone know of a formulaic way that Excel could identify row matches and think - "oh! these are all the same rows of data, just mismatched. I'll kill off duplicate rows, so only one truly unique row of data exists for this particular set" ? Or some other work around that could help me with my duplicate content madness?
Much appreciated, you Excel Gurus you!
-
Choose one of the URL's as the authoritive and remove the dupped content from the others.
-
FMLLC,
I use Excel 2010 so my approach would be as follows:
-
Make a backup copy of your file before you start.
-
You will need to sort each row by value, but Excel has a 3 sort level limit, so you will need to add a macro.
-
Assuming your data starts in A1 and has no header row, Put it in a general module, go back to excel, activate your sheet, then run the macro from Tools=>Macro=>Macros.
Sub SortEachRowHorizontal()
Dim rng As Range, rw As Range
Set rng = Range("A1").CurrentRegion
For Each rw In rng.Rows
rw.Sort Key1:=rw(1), _
order1:=xlAscending, _
Header:=xlNo, _
OrderCustom:=1, _
MatchCase:=False, _
Orientation:=xlLeftToRight
Next
End Sub
- Then Highlight all your cells and then go to Data -> Remove Duplicates
The result should be all unique rows. I hope this helps.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Ive been using moz for just a minute now , i used it to check my website and find quite a number of errors , unfortunately i use a wordpress website and even with the tips , is till dont know how to fix the issues.
ive seen quite a number of errors on my website hipmack.co a wordpress website and i dont know how to begin clearing the index errors or any others for that matter , can you help me please? ghg-1.jpg
Moz Pro | | Dogara0 -
Duplicate Pages
Hello, we have an issue which I'm hoping someone can help with. Our Moz system is saying that this page http://www.indigolittle.com/fees/ Is a duplicate page. We use this page purely for mobiles and we have added code to say This has been on for over a month now however Moz is still picking the page us as a High Priority Issue.
Moz Pro | | popcreativeltd0 -
I have a duplicate content on my Moz crawler, but google hasn't indexed those pages: do I still need to get rid of the tags?
I received an urgent error from the Moz crawler that I have duplicate content on my site due to the tags I have. For example: http://www.1forjustice.com/graves-amendment/ The real article found here: http://www.1forjustice.com/car-accident-rental-car/ I didn't think this was a big deal, because when I looked at my GWT these pages weren't indexed (picture attached). Question: should I bother fixing this from an SEO perspective? If Google isn't indexing the pages, then am I losing link juice? 6c2kxiZ
Moz Pro | | Perenich0 -
Duplicate titles reported with canonical
Hi Mozzers, In the reports it is saying that I have some duplicate content and titles even though there is a canonical tag on them, is anyone else getting this?
Moz Pro | | KarlBantleman0 -
How do I fix a duplicate content error with a top level domain?
Hi, I'm getting a duplicate content error from the SEOmoz crawler due to an issue with trailing slashes. It's showing www.milengo.com and www.milengo.com/ as having duplicate page titles. However I'm pretty sure this has been fixed in the .htaccess file since if you type in the domain with a trailing slash it automatically redirects to the domain without a trailing slash, so this shouldn't be an issue. I'm stuck here. Any ideas? Thanks. Rob
Moz Pro | | milengo0 -
Using MOZ Keyword Rankings with Google Analytics in Excel to track paid keywords
Hi, I sorted my keywords in GA from highest incoming traffic to lowest, exported it and then imported these 280 or something keywords into SEO Moz. The problem is, it appears that SEO Moz Keyword Rankings tool pulls traffic data (in the traffic column) only for NON-PAID traffic ... So what I'm seeing is that keywords that brings A LOT of traffic to my site (because of AdWords) sometimes have 0 of traffic in SEO Moz, because I'm on page 3 of SERP and I literally got no non-paid traffic that week. The thing is: I want to improve my rankings for keywords that I'm already performing well, even if they are paid keywords, because they already bring me a lot of traffic to my website. The way this is working is not helping me out, because when I sort by the traffic column very important keywords end up in the last place for me. Questions: Whey does SEO Moz pull traffic data from non-paid keywords only ? Shouldn't be at least an option for this or I'm missing something here ? While they don't change it, any Excel Gurus could help me how to export SEO Moz ranking data, import it on Excel and MERGE a new "paid traffic" column that I export from Google Analytics ? How to assing each keyword line the paid traffic value I'm getting from GA ? Thanks ! Felipe
Moz Pro | | pqdbr0 -
RSS feed showing up as duplicate content
Hi, I've just run an SEOMOZ Pro scan for the first time and it is picking up duplicate content errors from the RSS feed. For some reason it seems to be picking up two feeds, for example: http://blog.clove.co.uk/2009/05/13/htc-touch-diamond2-review/feed/ http://blog.clove.co.uk/2009/05/19/htc-touch-diamond2-review-2/feed/ Does anyone know why this is happening and how I can resolve this? Thanks
Moz Pro | | pugh0 -
Redirecting duplicate .asp pages??
Hi all, I have a bit of a problem with duplicate content on our website. The CMS has been creating identical duplicate pages depending on which menu route a user takes to get to a product (i.e. via the side menu button or the top menu bar). Anyway, the web design company we use are sorting it out going forward, and creating 301 redirects on the duplicate pages. My question is, some of the duplicates take two different forms. E.g. for the home page: www.<my domain="">.co.uk
Moz Pro | | gdavies09031977
www..<my domain="">.co.uk/index.html
www.<my domain="">.co.uk/index.asp</my></my></my> Now I understand the 'index.html' page should be redirected, but does the 'index.asp' need to be directed also? What makes this more confusing is when I run the SEOMoz diagnostics report (which brought my attention to the duplicate content issue in the first place - thanks SEOMoz), not all the .asp pages are identified as duplicates. For example, the above 'index.asp' page is identified as a duplicate, but 'contact-us.asp' is not highlighted as a duplicate to 'contact-us.html'? I'm a bit new to all this (I'm not a IT specialist), so any clarification anyone can give would be appreciated. Thanks, Gareth0