How does the crawl find duplicate pages that don't exist on the site?
-
It looks like I have a lot of duplicate pages which are essentially the same url with some extra ? parameters added eg:
http://www.merlin.org.uk/10-facts-about-malnutrition
http://www.merlin.org.uk/10-facts-about-malnutrition?page=1
http://www.merlin.org.uk/10-facts-about-malnutrition?page=2
These extra 2 pages (and there's loads of pages this happens to) are a mystery to me. Not sure why they exist as there's only 1 page.
Is this a massive issue? It's built on Drupal so I wonder if it auto generates these pages for some reason?
Any help MUCH appreciated. Thanks
-
Thanks Ben - much appreciated!
-
This is being caused by your "Related Post" plugin/module. To correct this problem simply add rel="nofollow" to the links in that module.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawling/indexing of near duplicate product pages
Hi, Hope someone can help me out here. This is the current situation: We sell stones/gravel/sand/pebbles etc. for gardens. I will take a type of pebbles and the corresponding pages/URL's to illustrate my question --> black beach pebbles. We have a 'top' product page for black beach pebbles on which you can find different types of quantities (differing from 20kg untill 1600 kg). There is not any search volume related to the different quantities The 'top' page does not link to the pages for the different quantities The content on the pages for the different quantities is not exactly the same (different price + slightly different content). But a lot of the content is the same. Current situation:
Intermediate & Advanced SEO | | AMAGARD
- Most pages for the different quantities do not have internal links (about 95%) But the sitemap does contain all of these pages. Because the sitemap contains all these URL's, google frequently crawls them (I checked the logfiles) and has indexed them. Problems: Google spends its time crawling irrelevant pages --> our entire website is not that big, so these quantity URL's kind of double the total number of URL's. Having url's in the sitemap that do not have an internal link is a problem on its own All these pages are indexed so all sorts of gravel/pebbles have near duplicates. My solution: remove these URL's from the sitemap --> that will probably stop Google from regularly crawling these pages Putting a canonical on the quantity pages pointing to the top-product page. --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index My questions: To be able to see the canonical, google will need to crawl these pages. Will google still do that after removing them from the sitemap? Do you agree that these pages are near duplicates and that it is best to remove them from the index? A few of these quantity pages do have intenral links (a few procent of them) because of a sale campaign. So there will be some (not much) internal links pointing to non-canonical pages. Would that be a problem? Thanks a lot in advance for your help! Best!1 -
Site's pages has GA codes based on Tag Manager but in Screaming Frog, it is not recognized
Using Tag Assistant (Google Chrome add-on), we have found that the site's pages has GA codes. (also see screenshot 1) However, when we used Screaming Frog's filter feature -- Configuration > Custom > Search > Contain/Does Not Contain, (see screenshot 2) SF is displaying several URLs (maybe all) of the site under 'Does Not Contain' which means that in SF's crawl, the site's pages has no GA code. (see screenshot 3) What could be the problem why SF states that there is no GA code in the site's pages when in fact, there are codes based on Tag Assistant/Manager? Please give us steps/ways on how to fix this issue. Thanks! SgTovPf VQNOJMF RCtBibP
Intermediate & Advanced SEO | | jayoliverwright0 -
Do search engines crawl links on 404 pages?
I'm currently in the process of redesigning my site's 404 page. I know there's all sorts of best practices from UX standpoint but what about search engines? Since these pages are roadblocks in the crawl process, I was wondering if there's a way to help the search engine continue its crawl. Does putting links to "recent posts" or something along those lines allow the bot to continue on its way or does the crawl stop at that point because the 404 HTTP status code is thrown in the header response?
Intermediate & Advanced SEO | | brad-causes0 -
Does Google make continued attempts to crawl an old page one it has followed a 301 to the new page?
I am curious about this for a couple of reasons. We have all dealt with a site who switched platforms and didn't plan properly and now have 1,000's of crawl errors. Many of the developers I have talked to have stated very clearly that the HTacccess file should not be used for 1,000's of singe redirects. I figured If I only needed them in their temporarily it wouldn't be an issue. I am curious if once Google follows a 301 from an old page to a new page, will they stop crawling the old page?
Intermediate & Advanced SEO | | RossFruin0 -
URL Parameters Duplicate Page Title
Thanks in advance, I'm getting duplicate page titles because seomoz keeps crawling through my url parameters. I added forcefiltersupdate to the URL parameters in webmaster tools but it has not seemed to have an effect. Below is an example of the duplicate content issue that I am having. http://qlineshop.com/OC/index.php?route=product/category&path=59_62&forcefiltersupdate=true&checkedfilters[]=a.13.13.387baf0199e7c9cc944fae94e96448fa Any thoughts? Thanks again. -Patrick
Intermediate & Advanced SEO | | bamron0 -
XML question - not finding all of the pages
When I run http://www.xml-sitemaps.com/ on my site, it doesn't find all of my pages. The pages do not have any no follows in them (I thought that was the original problem). Has this happened to anyone else? What is the solution?
Intermediate & Advanced SEO | | digitalops0 -
Multiple URL's exist for the same page, canonicaliazation issue?
All of the following URL's take me to the same page on my site: 1. www.mysite.com/category1/subcategory.aspx 2. www.mysite.com/subcategory.aspx 3. www.mysite.com/category1/category1/category1/subcategory.aspx All of those pages are canonicalized to #1, so is that okay? I was told the following my a company trying to make our sitemap: "the site's platform dynamically creates URLs that resolve as 200 and should be 404. This is a huge spider trap for any search engine and will make them wary of crawling the site." What would I need to do to fix this? Thanks!
Intermediate & Advanced SEO | | pbhatt0 -
How do you transition a keyword rank from a home page to a sub-page on the site?
We're currently ranking #1 for a valuable keyword, but the result on the SERP is our home page. We're creating a new product page focused on this keyword to provide a better user experience and create more relevant content. What is the best way to make a smooth transition to make the product page rank #1 for the keyword instead of the home page?
Intermediate & Advanced SEO | | buildasign0