Locating Duplicate Pages
-
Hi,
Our website consists of approximately 15,000 pages however according to our Google Webmaster Tools account Google has around 26,000 pages for us in their index.
I have run through half a dozen sitemap generators and they all only discover the 15,000 pages that we know about. I have also thoroughly gone through the site to attempt to find any sections where we might be inadvertently generating duplicate pages without success.
It has been over six months since we did any structural changes (at which point we did 301's to the new locations) and so I'd like to think that the majority of these old pages have been removed from the Google Index. Additionally, the number of pages in the index doesn't appear to be going down by any discernable factor week on week.
I'm certain it's nothing to worry about however for my own peace of mind I'd like to just confirm that the additional 11,000 pages are just old results that will eventually disappear from the index and that we're not generating any duplicate content.
Unfortunately there doesn't appear to be a way to download a list of the 26,000 pages that Google has indexed so that I can compare it against our sitemap. Obviously I know about site:domain.com however this only returned the first 1,000 results which all checkout fine.
I was wondering if anybody knew of any methods or tools that we could use to attempt to identify these 11,000 extra pages in the Google index so we can confirm that they're just old pages which haven’t fallen out of the index yet and that they’re not going to be causing us a problem?
Thanks guys!
-
It's cool. Sorry, the point I was making is that irrespective of what you search for the page that is returned is http://www.refreshcartridges.co.uk/advanced_search_result.php (with nothing after the .php) and as such the search results page couldn't spurn multiple pages which could be indexed by Google.
-
Hmm, I'm not too knowledgeable about php pages. Sorry!
-
Sorry, I'm not sure what happened to that bit.ly address - The actual address of the website is www.refreshcartridges.co.uk.
Ah, I see what you mean about the search results now however this hopefully shouldn't be an issue as for security (our web guy said something about injections) the URL that is returned irrespective of what is searched for is http://www.refreshcartridges.co.uk/advanced_search_result.php
Thanks again!
-
I can't get that link to work.
What I said before still applies with physical input (this is what I assumed when I said it).
For example, user inputs the words "snakes and dogs" and clicks search. The new URL is "www.yoursite.com/search?q=snakes and dogs" All these weird URL pages need noindex meta tags or Google will flag them as duplicate content because, for example, this page and the result for "dogs and snakes" generate almost the same page.
Does that make sense?
It is in Google's Webmaster Guidelines that you should noindex these pages. -
Many thanks for your input on this. I have actually looked at this through the HTML improvements section of GWMT however I am showing only a few dozen duplicated titles / descriptions and this is simply due to the product categories being almost identical (for example HP Deskjet 500 and HP Deskjet 500+)
-
Many thanks for your response. Our site is an eCommerce site that doesn't employ tags as such and our categories are all accounted for in the 15,000 page figure.
-
We did have this at the beginning of the year when we used a ?dispmode=grid and ?dispmode=list to change the way our results were displayed. This has been rectified however by us completely removing the option and any instances of dispmode present in the URL force a 301 to the correct master page. There are still a few hundred instances of this dispmode being present in the Google index but 99% of them have fallen out now.
I have checked and double checked and we don't seem to have any issues like this at present.
-
I'm not certain if this is the case as our search engine requires physical input in order to yield a result. I don't know if it helps but the URL is http://bit.ly/4Cogchww if you fancy taking a look
-
Thanks for your reply. Indeed our website does force www. if someone were to attempt to navigate to us without prefixing www.
-
Hi Chris,
Google Webmaster has a tool that helps identify duplicate HTMLs and maybe you can use that to see if the 11,000 pages are duplicate. IF they are, I am assuming they should have the duplicate Title Tag and etc. which the tool may discover.
-
Have you checked for instances where a page parameter is being seen as another version of the same page? One of the sites I work for had an issue a few months back where every instance of a product page was being flagged as duplicate content because of an oversight. We had one of our coders write a clause into the page where every time a page loaded with a parameter such as ?color=72 it would canonicalize it to the page minus the parameter. This decreased our duplicate content warnings quickly and effectively.
-
it could be that your tags and categories are considered individual pages and therefore creating their own permalink: ex: http:www.example.com/keyword, and http://www.example.com/tag/keyword and http://www.example.com/category/keyword. Another way would be to check the sitemaps you have in webmaster tools and compare those to each other. Just a suggestion.
-
Does your website force 'www.'?
Both yourdomain.com and www.yourdomain.com are separate sites and can have different pages spidered.
-
Be sure to try different combinations of 'site:www.domain.com' and 'site:domain.com'. They will all yield different results.
Sounds to me like you probably have an internal search engine that is generating search results pages based off the search term, and each different results page is a piece of duplicate content.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Will it upset Google if I aggregate product page reviews up into a product category page?
We have reviews on our product pages and we are considering averaging those reviews out and putting them on specific category pages in order for the average product ratings to be displayed in search results. Each averaged category review would be only for the products within it's category, and all reviews are from users of the site, no 3rd party reviews. For example, averaging the reviews from all of our boxes products pages, and listing that average review on the boxes category page. My question is, will this be doing anything wrong in the eyes of Google, and if so how so? -Derick
On-Page Optimization | | Deluxe0 -
Which is better? One dynamically optimised page, or lots of optimised pages?
For the purpose of simplicity, we have 5 main categories in the site - let's call them A, B, C, D, E. Each of these categories have sub-category pages e.g. A1, A2, A3. The main area of the site consists of these category and sub-category pages. But as each product comes in different woods, it's useful for customers to see all the product that come in a particular wood, e.g. walnut. So many years ago we created 'woods' pages. These pages replicate the categories & sub-categories but only show what is available in that particular wood. And of course - they're optimised much better for that wood. All well and good, until recently, these specialist page seem to have dropped through the floor in Google. Could be temporary, I don't know, and it's only a fortnight - but I'm worried. Now, because the site is dynamic, we could do things differently. We could still have landing pages for each wood, but of spinning off to their own optimised specific wood sub-category page, they could instead link to the primary sub-category page with a ?search filter in the URL. This way, the customer is still getting to see what they want. Which is better? One page per sub-category? Dynamically filtered by search. Or lots of specific sub-category pages? I guess at the heart of this question is? Does having lots of specific sub-category pages lead to a large overlap of duplicate content, and is it better keeping that authority juice on a single page? Even if the URL changes (with a query in the URL) to enable whatever filtering we need to do.
On-Page Optimization | | pulcinella2uk0 -
Duplication in landing page
This is driving me mad, I have a site that for some reason google and moz pick up the landing page as a duplicate. They see "mysite/" and "mysite/index.html" as two different pages and giving me warnings for duplication. I have no 301 included at this time and I am using foundation as the base. This is occurring both on a localhost test bed and live....... anyone got an idea how to correct.
On-Page Optimization | | AndyBirtles0 -
Sold Products appear as duplicate pages 'Page Not Found' ???
Hi there, I'm down to just 6 duplicate page warnings but I'm not sure how to deal with this one: Information Page Not Found! http://www.vintageheirloom.com/index.php?route=information/information&information_id=6 My Ecommerce shopping site products are unique, 1 of a kind. So once one product has sold and been delivered we take the product off our website, hence the Information Page Not Found! As I understand when search engines re-index these warnings will drop off but new sold products would replace them. So redirecting seems like hard work and never ending. Is it ok to ignore these warnings? Thanks Mozzers..
On-Page Optimization | | well-its-1-louder0 -
Duplicate page content & title for www.mydomain.com and www.mydomain.com/index.php?
Hi, First post so please be gentle! My Crawl Diagnostics Summary is showing an error relating to duplicate page content and duplicate page title for www.mydomain.com and www.mydomain.com/index.php which are, in my view, the same thing/page? Could anyone shed any light please? Thanks Carl
On-Page Optimization | | Carl2870 -
Is there a SEO penalty for multi links on same page going to same destination page?
Hi, Just a quick note. I hope you are able to assist. To cut a long story short, on the page below http://www.bookbluemountains.com.au/ -> Features Specials & Packages (middle column) we have 3 links per special going to the same page.
On-Page Optimization | | daveupton
1. Header is linked
2. Click on image link - currently with a no follow
3. 'More info' under the description paragraph is linked too - currently with a no follow Two arguments are as follows:
1. The reason we do not follow all 3 links is to reduce too many links which may appear spammy to Google. 2. Counter argument:
The point above has some validity, However, using no follow is basically telling the search engines that the webmaster “does not trust or doesn’t take responsibility” for what is behind the link, something you don’t want to do within your own website. There is no penalty as such for having too many links, the search engines will generally not worry after a certain number.. nothing that would concern this business though. I would suggest changing the no follow links a.s.a.p. Could you please advise thoughts. Many thanks Dave Upton [long signature removed by staff]0 -
Duplicate Page Title Elements
Hello Mozzers. My questions is below and I would like to thank everyone in advance for any feedback 😉 I own a dog supplies site (www.k9electronics.com). When I launched the site several years back I hired a guy for SEO and he optimized my home page for specific categories search terms such as "dog training collars", "dog shock collars:, ect instead of general search terms such as "dog supplies", "dog accessories", ect. I would like to start moving these home page title element terms (starting with "dog shock collars") over to the dog training collars category but have high rankings for this term on the home page. Current Home Page Title Element:
On-Page Optimization | | k9byron
dog training collars, dog shock collars, electric dog collar, dog supplies (recently added) Current Dog Training Collars Category Title Element:
dog training collars I was hoping to add "dog shock collars" to the dog training collars category page until I achieved higher ranking then delete if from the home page. ..or swap it out with "dog accessories". I am currently ranked #5 in Google for "dog shock collars" on the home page & dog training collars category page ...and I am a little concerned about changing these title elements. My question is; If I add 'dog shock collars" to the dog training collars category page title as well, how will it effect my ranking on both pages having this duplicate term in both page titles? Thank You,
Byron-0 -
Too many on-page links
I manualy counted the links on my website http://www.commensus.com which came to around 50, but SEO moz says I have over 100 and google isn't seeing them all.
On-Page Optimization | | jawl44630