Handling similar page content on a directory site
-
Hi All,
SEOMOZ is telling me I have a lot of duplicate content on my site. The pages are not duplicates, but they are very similar, because the site is a directory website with a page for each city across multiple states in the US.
I do not want these pages indexed, and I'd like to know the best way to go about this.
I was thinking I could add rel="nofollow" to all the links pointing to those pages, but I'm not sure that's the correct approach.
Since the folders are deep within the site and not under one main folder, blocking them through robots.txt would mean writing a disallow rule for many separate folders.
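For what it's worth, blocking scattered city folders via robots.txt would look something like this (the folder paths here are hypothetical placeholders, not the real site structure):

```
# robots.txt: one Disallow per city folder, since there is no single parent folder
User-agent: *
Disallow: /texas/houston/
Disallow: /texas/dallas/
Disallow: /california/los-angeles/
# ...and so on for every city folder, which is why this gets unwieldy
```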
The other thing I am thinking of is doing a meta noindex, follow, but I would have to get my programmer to add a meta tag just for this section of the site.
Any thoughts on the best way to achieve this so I can eliminate these dup pages from my SEO report and from the search engine index?
Thanks!
-
Thanks Kane!
Meta-robots it is!
I will apply it and see how I go with it.
Cheers
-
The best solution is to use a meta robots "noindex, follow" tag on those pages.
I believe that URLs blocked in robots.txt can still show up as bare URLs in search results, so that option is less ideal. I'm not certain that's still the case, but it used to be that way.
I personally would not nofollow the links to those pages. If you use "noindex, follow", the pages will still pass value through to other indexed pages, whereas nofollowing the links into a noindex page isn't supposed to pass PageRank along to the other links on the page.
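For reference, the "noindex, follow" directive discussed in this thread is the standard meta robots tag, placed in the head of each page you want kept out of the index:

```html
<!-- In the <head> of each near-duplicate city page -->
<meta name="robots" content="noindex, follow">
```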
Related Questions
-
Impact of Removing 60,000 Pages from Sites
We currently have a database of content across about 100 sites. All of this content is exactly the same on all of them, and it is also found all over the internet in other places. So it's not unique at all and it brings in almost no organic traffic. I want to remove this bloat from our sites. Problem is that this database accounts for almost 60,000 pages on each site and it is all currently indexed. I'm a little bit worried that flat out dumping all of this data at once is going to cause Google to wonder what in the world we are doing and we are going to see some issues from it (at least in the short run). My thought now is to remove this content in stages so it doesn't all get dropped at once. But would deindexing all of this content first be better? That way Google would still be able to crawl it and understand that it is not relevant user content and therefore minimize impact when we do terminate it completely? Any other ideas for minimizing SEO issues?
Intermediate & Advanced SEO | | MJTrevens1 -
Google crawling 200 page site thousands of times/day. Why?
Hello all, I'm looking at something a bit wonky for one of the websites I manage. It's similar enough to other websites I manage (built on a template) that I'm surprised to see this issue occurring. The xml sitemap submitted shows Google there are 229 pages on the site. Starting in the beginning of December Google really ramped up their intensity in crawling the site. At its high point Google crawled 13,359 pages in a single day. I mentioned I manage other similar sites - this is a very unusual spike. There are no resources like infinite scroll that auto generates content and would cause Google some grief. So follow up questions to my "why?" is "how is this affecting my SEO efforts?" and "what do I do about it?". I've never encountered this before, but I think limiting my crawl budget would be treating the symptom instead of finding the cure. Any advice is appreciated. Thanks! *edited for grammar.
Intermediate & Advanced SEO | | brettmandoes0 -
Concerns of Duplicative Content on Purchased Site
Recently I purchased a 50+ DA site (oldsite.com) that had been offline/404 for 9-12 months under the previous owner. The purchase included the domain and the content previously hosted on it. The backlink profile is 100% contextual and pristine. Upon purchasing the domain, I did the following:
1. Rehosted the old site and content that had been down for 9-12 months on oldsite.com
2. Allowed a week or two for indexation of oldsite.com
3. Hosted the old content on my newsite.com and then performed 100+ contextual 301 redirects from oldsite.com to newsite.com using direct and wildcard htaccess rules
4. Issued a press release declaring the acquisition of oldsite.com by newsite.com
5. Performed a site "Change of Name" in Google from oldsite.com to newsite.com
6. Performed a "Site Move" in Bing/Yahoo from oldsite.com to newsite.com
It's been close to a month, and while organic traffic is growing gradually, it's not what I would expect from a domain with 700+ referring contextual domains. My current concern is that original attribution of the content on oldsite.com may have shifted to scraper sites during the year or so it was offline. For example:
1. Oldsite.com has full attribution prior to going offline
2. Scraper sites scan the site and repost the content elsewhere (unsuccessful at the time, because Google knew the original attribution)
3. Oldsite.com goes offline
4. Scraper sites continue hosting the content
5. Google loses its consumer-facing cache of oldsite.com (and potentially the original attribution of the content)
6. Google reassigns original attribution to a scraper site
7. Oldsite.com is hosted again, but Google no longer remembers its original attribution and thinks the content is stolen
8. Google then silently punishes oldsite.com and newsite.com (which it redirects to)
QUESTIONS
1. Does this sequence have any merit?
2. Does Google keep track of original attribution after the content ceases to exist in Google's search cache?
3. Are there any tools or ways to tell if you're being punished for content posted elsewhere on the web, even if you originally had attribution?
Unrelated: are there any other steps recommended for a site change as described above?
Intermediate & Advanced SEO | | PetSite0 -
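As an aside, the wildcard htaccess 301s described above are commonly written with mod_rewrite. A sketch, with oldsite.com/newsite.com standing in for the real domains:

```apache
# .htaccess on oldsite.com: 301 every path to the same path on newsite.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?oldsite\.com$ [NC]
RewriteRule ^(.*)$ http://newsite.com/$1 [R=301,L]
```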
My site has a lot of leftover content that's irrelevant to the main business -- what should I do with it?
Hi Moz! I'm working on a site that has thousands of pages of content that are not relevant to the business anymore since it took a different direction. Some of these pages still get a lot of traffic. What should I do with them? 404? Keep them? Redirect? Are these pages hurting rankings for the target terms? Thanks for reading!
Intermediate & Advanced SEO | | DA20130 -
Is it ok to add snippet of information taken from other sites on product pages?
Hello here, I own an e-commerce website that sells digital sheet music, and I would like to enrich my product pages with short references to artists/composers related to the product, taken from external websites: mentions, fresh news, information from related videos, cross-references, etc. In other words, I'd like to provide our users with a kind of informational content that our competitors are currently not offering. You could also think of this as "providing aggregate content on product pages to enrich the user's experience with more information about the product". What do you think are the risks or benefits of such an approach? And if there are risks, how can they be avoided or tackled? Any thoughts are very welcome! Thank you in advance.
Intermediate & Advanced SEO | | fablau0 -
Does duplicate content penalize the whole site or just the pages affected?
I am trying to assess the impact of duplicate content on our e-commerce site, and I need to know whether duplicate content affects only the pages that contain it, or the whole site. In Google, that is. But of course. Lol
Intermediate & Advanced SEO | | bjs20100 -
Advice needed on how to handle alleged duplicate content and titles
Hi, I wonder if anyone can advise on something that's got me scratching my head. The following are examples of URLs which are deemed to have duplicate content and title tags. This causes around 8,000 errors, which (for the most part) are valid URLs because they provide different views of market data: e.g. #1 is the summary, while #2 is 'Holdings and sector weightings'. #3 is odd because the crawler is following the anchored link; I didn't think hashes were crawled? I'd like some advice on how best to handle these, because really they're just queries against a master URL, and I'd like to remove the noise around duplicate errors so that I can focus on some other true duplicate URL issues we have. Here are some example URLs on the same page which are deemed duplicates:
1. http://markets.ft.com/Research/Markets/Tearsheets/Summary?s=IVPM:LSE
2. http://markets.ft.com/Research/Markets/Tearsheets/Holdings-and-sectors-weighting?s=IVPM:LSE
3. http://markets.ft.com/Research/Markets/Tearsheets/Summary?s=IVPM:LSE&widgets=1
What's the best way to handle this?
Intermediate & Advanced SEO | | SearchPM0 -
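One common way to handle query-string views of a master URL like this (a sketch, assuming the Summary view is the preferred version) is a rel=canonical tag on each variant pointing back at the master URL:

```html
<!-- On each parameterized variant, point back to the preferred URL -->
<link rel="canonical" href="http://markets.ft.com/Research/Markets/Tearsheets/Summary?s=IVPM:LSE">
```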
How do Google Site Search pages rank
We have started using Google Site Search (via an XML feed from Google) to power our search engines. So we have a whole load of pages we could link to of the format /search?q=keyword, and we are considering doing away with our more traditional category listing pages (e.g. /biology - not powered by GSS) which account for much of our current natural search landing pages. My question is would the GoogleBot treat these search pages any differently? My fear is it would somehow see them as duplicate search results and downgrade their links. However, since we are coding the XML from GSS into our own HTML format, it may not even be able to tell.
Intermediate & Advanced SEO | | EdwardUpton610