Does Google see this as duplicate content?
-
I'm working on a site that has too many pages in Google's index as shown in a simple count via a site search (example):
site:http://www.mozquestionexample.com
I ended up getting a full list of these pages and it shows pages that have been supposedly excluded from the index via GWT url parameters and/or canonicalization
For instance, the list of indexed pages shows:
1. http://www.mozquestionexample.com/cool-stuff
2. http://www.mozquestionexample.com/cool-stuff?page=2
3. http://www.mozquestionexample.com?page=3
4. http://www.mozquestionexample.com?mq_source=q-and-a
5. http://www.mozquestionexample.com?type=productss&sort=1date
Example #1 above is the one true page for search and the one that all the canonicals reference.
Examples #2 and #3 shouldn't be in the index because the canonical points to url #1.
Example #4 shouldn't be in the index, because it's just a source code that, again doesn't change the page and the canonical points to #1.
Example #5 shouldn't be in the index because it's excluded in parameters as not affecting page content and the canonical is in place.
Should I worry about these multiple urls for the same page and if so, what should I do about it?
Thanks... Darcy
-
Darcy,
Blocking URLs in the robots.txt file will not remove them from the index if Google has already found them, nor will it prevent them from being added if Google finds links to them, such as internal navigation links or external backlinks. If this is your issue, you'll probably see something like this in the SERPs for those pages:
"We cannot display the content because our crawlers are being blocked by this site's robots.txt file" or something like that.
Here's a good discussion about it on WMW.
If you have parameters set up in GWT and are using a rel canonical tag that points Google to the non-parameter version of the URL you probably don't need to block Googlebot. I would only block them if I thought crawlbudget was an issue, as in seeing Google to continue to crawl these pages within your log files, or when you potentially have millions of these types of pages.
-
Hi Ray,
Thanks for the response. To answer your question, the URL parameters have been set for months, if not years.
I wouldn't know how to set a noindex on a url with a different source code, because it really isn't a whole new url, just different tracking. I'd be setting a noindex for the example 1 page and that would not be good.
So, should I just not worry about it then?
Thanks... Darcy
-
Hi 94501,
Example #1 above is the one true page for search and the one that all the canonicals reference.
If the pages are properly canonicalized then Example #1 will receive nearly all of the authority stemming from pages with this URL as the canonical tag.
I.e. Example #2 and #3 will pass authority to Example #1
Examples #2 and #3 shouldn't be in the index because the canonical points to url #1.
Setting a canonical tag doesn't guarantee that a page will not be indexed. To do that, you'd need to add a 'noindex' tag to the page.
Google chooses whether or not to index these pages and in many situations you want them indexed. For example: User searches for 'product X' and product x resides on the 3rd page of your category. Since Google has this page indexed (although the canonical points to the main page) it makes sense to show the page that contains the product the user was searching for.
Example #4 shouldn't be in the index, because it's just a source code that, again doesn't change the page and the canonical points to #1.
To make sure it is not indexed, you would need to add a 'noidex' tag and/or make sure the parameters are set in GWMT to ignore these pages.
But again, if the canonical is set properly then the authority passes to the main page and having this page indexed may not have negative impact.
Example #5 shouldn't be in the index because it's excluded in parameters as not affecting page content and the canonical is in place.
How long ago was the parameter setting applied in GWMT? Sometimes it takes a couple weeks to deindex pages that were already indexed by Google.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
301 redirect to avoid duplicate content penalty
I have two websites with identical content. Haya and ethnic Both websites have similar products. I would like to get rid of ethniccode I have already started to de-index ethniccode. My question is, Will I get any SEO benefit or Will it be harmful if I 301 direct the below only URL’s https://www.ethniccode/salwar-kameez -> https://www.hayacreations/collections/salwar-kameez https://www.ethniccode/salwar-kameez/anarkali-suits - > https://www.hayacreations/collections/anarkali-suits
Intermediate & Advanced SEO | | riyaaaz0 -
Internal Duplicate Content - Classifieds (Panda)
I've been wondering for a while now, how Google treats internal duplicate content within classified sites. It's quite a big issue, with customers creating their ads twice.. I'd guess to avoid the price of renewing, or perhaps to put themselves back to the top of the results. Out of 10,000 pages crawled and tested, 250 (2.5%) were duplicate adverts. Similarly, in terms of the search results pages, where the site structure allows the same advert(s) to appear under several unique URLs. A prime example would be in this example. Notice, on this page we have already filtered down to 1 result, but the left hand side filters all return that same 1 advert. Using tools like Siteliner and Moz Analytics just highlights these as urgent high priority issues, but I've always been sceptical. On a large scale, would this count as Panda food in your opinion, or does Google understand the nature of classifieds is different, and treat it as such? Appreciate thoughts. Thanks.
Intermediate & Advanced SEO | | Sayers1 -
Search console, duplicate content and Moz
Hi, Working on a site that has duplicate content in the following manner: http://domain.com/content
Intermediate & Advanced SEO | | paulneuteboom
http://www.domain.com/content Question: would telling search console to treat one of them as the primary site also stop Moz from seeing this as duplicate content? Thanks in advance, Best, Paul. http0 -
Magento products and eBay - duplicate content risk?
Hi, We are selling about 1000 sticker products in our online store and would like to expand a large part of our products lineup to eBay as well. There are pretty good modules for this as I've heard. I'm just wondering if there will be duplicate content problems if I sync the products between Magento and eBay and they get uploaded to eBay with identical titles, descriptions and images? What's the workaround in this case? Thanks!
Intermediate & Advanced SEO | | speedbird12290 -
Duplicate page content query
Hi forum, For some reason I have recently received a large increase in my Duplicate Page Content issues. Currently it says I have over 7,000 duplicate page content errors! For example it says: Sample URLs with this Duplicate Page Content http://dikelli.com.au/accessories/gowns/news.html http://dikelli.com.au/accessories/news.html
Intermediate & Advanced SEO | | sterls
http://dikelli.com.au/gallery/dikelli/gowns/gowns/sale_gowns.html However there are no physical links to any of these page on my site and even when I look at my FTP files (I am using Dreamweaver) these directories and files do not exist. Can anyone please tell me why the SEOMOZ crawl is coming up with these errors and how to solve them?0 -
Duplicate Content on Blog
I have a blog I'm setting up. I would like to have a mini-about block set up on every page that gives very brief information about me and my blog, as well as a few links to the rest of the site and some social sharing options. I worry that this will get flagged as duplicate content because a significant amount of my pages will contain the same information at the top of the page, front and center. Is there anything I can do to address this? Is it as much of a concern as I am making it? Should I work on finding some javascript/ajax method for loading that content into the page dynamically only for normal browser pageviews? Any thoughts or help would be great.
Intermediate & Advanced SEO | | grayloon0 -
Removing Duplicate Content Issues in an Ecommerce Store
Hi All OK i have an ecommerce store and there is a load of duplicate content which is pretty much the norm with ecommerce store setups e.g. this is my problem http://www.mystoreexample.com/product1.html
Intermediate & Advanced SEO | | ChriSEOcouk
http://www.mystoreexample.com/brandname/product1.html
http://www.mystoreexample.com/appliancetype/product1.html
http://www.mystoreexample.com/brandname/appliancetype/product1.html
http://www.mystoreexample.com/appliancetype/brandname/product1.html so all the above lead to the same product
I also want to keep the breadcrumb path to the product Here's my plan Add a canonical URL to the product page
e.g. http://www.mystoreexample.com/product1.html
This way i have a short product URL Noindex all duplicate pages but do follow the internal links so the pages are spidered What are the other options available and recommended? Does that make sense?
Is this what most people are doing to remove duplicate content pages? thanks 🙂0 -
Pop Up Pages Being Indexed, Seen As Duplicate Content
I offer users the opportunity to email and embed images from my website. (See this page http://www.andertoons.com/cartoon/6246/ and look under the large image for "Email to a Friend" and "Get Embed HTML" links.) But I'm seeing the ensuing pop-up pages (Ex: http://www.andertoons.com/embed/5231/?KeepThis=true&TB_iframe=true&height=370&width=700&modal=true and http://www.andertoons.com/email/6246/?KeepThis=true&TB_iframe=true&height=432&width=700&modal=true) showing up in Google. Even worse, I think they're seen as duplicate content. How should I deal with this?
Intermediate & Advanced SEO | | andertoons0