Advice needed on how to handle alleged duplicate content and titles
-
Hi
I wonder if anyone can advise on something that's got me scratching my head.
The following are examples of URLs which are flagged as having duplicate content and duplicate title tags. This accounts for around 8,000 errors, most of which are in fact valid URLs because they provide different views of market data: e.g. #1 is the summary, while #2 is 'Holdings and sector weightings'.
#3 is odd because it's crawling the anchored link. I didn't think hashes were crawled?
I'd like some advice on how best to handle these because, really, they're just queries against a master URL, and I'd like to remove the noise around duplicate errors so that I can focus on some other genuine duplicate URL issues we have.
Here are some example URLs for the same page which are flagged as duplicates.
1) http://markets.ft.com/Research/Markets/Tearsheets/Summary?s=IVPM:LSE
2) http://markets.ft.com/Research/Markets/Tearsheets/Holdings-and-sectors-weighting?s=IVPM:LSE
3) http://markets.ft.com/Research/Markets/Tearsheets/Summary?s=IVPM:LSE&widgets=1
What's the best way to handle this?
-
-
I would definitely not tell Google to ignore parameters, since you have pages ranking high with URL parameters in them.
Be careful if you do implement a canonical, because you could end up removing a few good-ranking pages, since the URL-parameter pages are the ones currently ranking best.
Personally, I would just ignore these errors, since Google has done a pretty good job of choosing the best page already.
You could block Rogerbot from crawling parameter pages.
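A minimal robots.txt sketch of that approach — the paths are illustrative, and wildcard support in Rogerbot is an assumption worth verifying against Moz's crawler documentation before relying on it:

```text
# Hypothetical rules: keep rogerbot (Moz's crawler) away from the
# widget-parameter duplicates, while leaving all other crawlers unaffected.
User-agent: rogerbot
Disallow: /*widgets=

User-agent: *
Disallow:
```

This would silence the Moz duplicate warnings without changing anything Googlebot sees, so existing rankings for the parameter URLs wouldn't be affected.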
-
Thanks. This is the only solution I can think of too, but the information on each of the tabs is actually different, so technically each one is a unique page.
That said, the likelihood of someone searching for such a specific subset of that data associated with one company or fund is arguably extremely low, which is why I wasn't sure whether to apply a canonical or not, just to reduce the noise.
I suppose another approach is to tell Google to ignore the parameter 's', which forms part of the query that loads one of the subsets of data?
I'm slightly wary of doing that, though.
-
Hi,
The best way to fix this would be to implement the canonical tag; this would stop Google/Rogerbot from treating those pages as duplicates and focus attention on the URL you specify.
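For example (a sketch, not the FT's actual markup), the widget variant of the Summary page could declare the clean Summary URL as canonical in its head:

```html
<head>
  <!-- On /Tearsheets/Summary?s=IVPM:LSE&widgets=1, point engines at the clean URL -->
  <link rel="canonical" href="http://markets.ft.com/Research/Markets/Tearsheets/Summary?s=IVPM:LSE" />
</head>
```

Note that since the Summary and Holdings tabs carry genuinely different content, each tab would keep its own self-referencing canonical; only the cosmetic parameter variants (like widgets=1) would point back to their parent URL.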
Check this post from Google explaining all about it:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Kyle