Duplicate Content with ADN, DNS and F5 URLs
-
In my duplicate content report, there are URLs showing as duplicate content.
All of the pages work, they do not redirect, and they are used for either IT debugging or as part of a legacy system using a split DNS, QAing the site, etc...
They aren't linked (or at least, shouldn't be) on any pages, and I am not seeing them in Search Results, but Moz is picking them up. Should I be worried about duplicate content here and how should I handle them? They are replicates of the current live site, but have different subdomains.
We are doing clean up before migrating to a new CMS, so I'm not sure it's worth fixing at this point, or if it is even an issue at all. But should I make sure they are in robots or take any action to address these?
Thanks!
-
A couple more thoughts here, based on your revised question.
You'll want to figure out how those links to the rogue subdomain have been generated, so you don't just move them over to the new CMS (such as if it's in body text that gets wholesale copied without being examined).
If those old subdomains are not needed at all anymore, I'd get them removed entirely if you can, or at the very least blocked in robots.txt. You can verify each subdomain as its own site in Google Webmaster Tools, then request removal of those subdomains if the content is gone or if it's excluded in robots.txt.
You might suggest to the dev team that they password-protect things like this so they don't get accidentally crawled in the future, use robots.txt to block, etc.
If you have known dev subdomains that are needed, and you know about them as the SEO and make sure they have robots.txt on them, you might want to use a code monitoring service like https://www.polepositionweb.com/roi/codemonitor/ to monitor the contents of the robots.txt file. It will let you know if the file has been changed or removed (good idea for the main site too). I've seen dev sites copied over to live sites, and the robots.txt copied over too, so everything is now blocked on the new live site. I've also seen dev sites with a data refresh from the live site, and the robots.txt from the live site is now on the dev site, and the dev site gets indexed.
-
Thanks Keri, I received your note!
-
Hi! I have a couple of ideas, and sent you a quick email to the account on your Moz profile.
You may also find it helpful to do a google search for:
site:ourdomain.com -inurl:www
This will show you all the non-www subdomains that Google has indexed, in case some others have slipped on in and you don't want them to be indexed.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages with Duplicate Page Content Crawl Diagnostics
I have Pages with Duplicate Page Content in my Crawl Diagnostics Tell Me How Can I solve it Or Suggest Me Some Helpful Tools. Thanks
Technical SEO | | nomyhot0 -
Duplicate content on report
Hi, I just had my Moz Campaign scan 10K pages out of which 2K were duplicate content and URL's are http://www.Somesite.com/modal/register?destination=question%2F37201 http://www.Somesite.com/modal/register?destination=question%2F37490 And the title for all 2K is "Register" How can i deal with this as all my pages have the register link and login and when done it comes back to the same page where we left and that it actually not duplicate but we need to deal with it propely thanks
Technical SEO | | mtthompsons0 -
Duplicate content issue with trailing / ?
Hi ,I did a SEOmoz Crawl Test and found most pages show twice, for example: A: www.website.com/index.php/dog/walk B: www.website.com/index.php/dog/walk/ I've checked Google Analytics and 90% of organic search traffic arrives on the URLs with the trailing slash (B). Question 1: Can I assume I've a duplicate content problem? Question 2: Is it best to do 301 redirects from the 'non trailing slash' pages to the 'trailing slash pages'? Question 3: For some reason every web page has a '/index.php' in it (see A&B) above. No idea why. Should it be a SEO concern? Kind regards and thank you in advance Nigel
Technical SEO | | Richard5550 -
Fix duplicate content caused by tags
Hi everyone, TGIF. We are getting hundreds of duplicate content errors on our WP site by what appears to be our tags. For each tag and each post we are seeing a duplicate content error. I thought I had this fixed but apparently I do not. We are using the Genesis theme with Yoast's SEO plugin. Does anyone have the solution to what I imagine is this easy fix? Thanks in advance.
Technical SEO | | okuma0 -
Duplicate Content - That Old Chestnut!!!
Hi Guys, Hope all is well, I have a question if I may? I have several articles which we have written and I want to try and find out the best way to post these but I have a number of concerns. I am hoping to use the content in an attempt to increase the Kudos of our site by providing quality content and hopefully receiving decent back links. 1. In terms of duplicate content should I only post it on one place or should I post the article in several places? Also where would you say the top 5 or 10 places would be? These are articles on XML, Social Media & Back Links. 2. Can I post the article on another blog or article directory and post it on my websites blog or is this a bad idea? A million thanks for any guidance. Kind Regards, C
Technical SEO | | fenwaymedia0 -
API for testing duplicate content
Does anyone know a service or API or php lib to compare two (or more) pages and to return their similiarity (Level-3-Shingles). API would be greatly prefered.
Technical SEO | | Sebes0 -
Google Duplicate Content Penalty On My Own Site?
I am certain that I have hit a google penalty filter for my site http://www.playpokeronline.ca for my main keywords "play poker online" in google.ca I rank 670th and used to be on the first page between 1 and 10 in June. On Bing I am like 9th On my site I found the entire site duplicated as follows Original: www.playpokeronline.ca Duplicate www.playpokeronline.ca/playpokeronline/ this duplicate was not intentional and seems to be a result of my hosting at godaddy. for every page on my site and it shows up in webmaster tools I blocked the duplicate with robots.txt and a few days ago dropped it and wrote a rel=connonical tag in the top of each page visitors dropped from 100 per day in august to 12-20 in the last month. Google says that if duplicate content is made to try to game serps they may filter or penalize my site. Have I triggered this penalty or a different sort of over optimization penalty? Will the rel= canonical tags fix this or should i do something else? This Penalty Business is Not my Idea of a good time Thank You Jeb
Technical SEO | | PokerCanada0 -
Question about duplicate content within my site
Hi. New here to SEOmoz and also somewhat new to SEO in general. A friend has asked me to help do some onsite SEO for their company's website. The company uses Drupal Content Management System. They have a couple product pages that contain a tabbed section for features, accessories, etc. When they built their tabs, they used a Drupal module called Quicktabs, by which each individual tab is created as a separate page and then pulled into the tabs from those pages. So, in essence, you now have instances of repeated content. 1) the page used to create the tab, and 2) the tab that displays on the product page. My question is, how should I handle the pages that were used to create the tabs? Should I make them NOINDEX? Thank you for your advice in advance.
Technical SEO | | aprilm-1890400