Lots of incorrect urls indexed - Googlebot found an extremely high number of URLs on your site
-
Hi,
Any assistance would be greatly appreciated.
Basically, our rankings and traffic etc have been dropping massively recently google sent us a message stating " Googlebot found an extremely high number of URLs on your site".
This first highligted us to the problem that for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish urls hencing giving us duplication everywhere which google is obviously penalizing us with in the terms of rankings dropping etc etc.
Our developer is trying to find the route cause of this but my concern is, How do we get rid of all these bogus urls ?. If we use GWT to remove urls it's going to take years.
We have just amended our Robot txt file to exclude them going forward but they have already been indexed so I need to know do we put a redirect 301 on them and also a HTTP Code 404 to tell google they don't exist ? Do we also put a No Index on the pages or what .
what is the best solution .?
A couple of example of our problems are here :
In Google type -
site:bestathire.co.uk inurl:"br"
You will see 107 results. This is one of many lot we need to get rid of.
Also -
site:bestathire.co.uk intitle:"All items from this hire company"
Shows 25,300 indexed pages we need to get rid of
Another thing to help tidy this mess up going forward is to improve on our pagination work. Our Site uses Rel=Next and Rel=Prev but no concanical.
As a belt and braces approach, should we also put concanical tags on our category pages whereby there are more than 1 page. I was thinking of doing it on the Page 1 of our most important pages or the View all or both ?. Whats' the general consenus ?
Any advice on both points greatly appreciated?
thanks
Sarah.
-
Ahhh, I see what you mean now. Yes, good idea .
Will get that implement to.
Yes, everything is duplicated.It's all the same apart from the url which seems to be bringing in to different locations instead of one.
Odd url Generated(notice it has 2 locations in it)
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250/Alfreton
Correct location specific urls -
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Alfreton/250
thanks
Sarah.
-
Since (I assume this is what is happening) your ecommerce platform is duplicating the entire page, code and all, and putting it at these new URLs, having the canonical tag of the original page URL in the code for the right/real page will mean that, when it gets duplicated, the canonical tag will get duplicated as well and point back to the original URL. Make sense?
Can you talk to your ecommerce platform provider? This can't be an intended feature!
-
Thanks Ruth for the very comprehensive answer. Greatly Appreciated !.
Just to clarify your suggestion about the Rel=Canonical tag. Put it on the preferred pages . When the duplicate odd urls get generated, they Wont have a canonical tag so google will know there are not the original page ?.. Is that correct.
Sorry I just got a bit confused as you said the duplicate pages will have a concanical tag as well ?
As for the existing pages, they are very recent so wouldn't assume they would have any pr to warrent a 301 as opposed to a 404 but guess either would be ok.
Also adding the Meta name no index tag as you suggested to sounds very wise so will get that done to.
We also can't find how these urls were created and then indexed so just hoping a debug file we just created may shed some light.
Will keep you posted....
Many thanks
Sarah
-
Oh how frustrating!
There are a couple of things that you can do. Updating your robots.txt is a good start since the next time your site is crawled, Google should find that and drop at least some of the offending pages from the index. I would also go in to every page of your site and add in a rel=canonical tag to the original version of the URL. That way, even if your ecommerce platform is generating odd versions of the URL, that canonical tag will be on the duplicate versions letting engines know they're not the original page.
For the existing pages, you could just 301 them all back to the original versions, or add the canonical tag pointing back to the original versions. I would also add the tag to these pages to let Google know not to include them in the index.
With pagination and canonicalization there are a few different approaches, and each has its pros and cons. Dr. Pete wrote a really great post on canonicalization that just went out, you can read it here: http://www.seomoz.org/blog/which-page-is-canonical. I also recommend reading Adam Audette's post on pagination options at Search Engine Land: http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284. I hope that helps!
-
As long as you think the sitemap is done right it should be fine.
-
Yes we submitted mini site maps to webmaster originally a couple of months back as our site is 60K pages so we broke is down to categories it etc.
We have not submitted a new map since finding this problem.
We are in the process of using the sitemap generator to generator new site map to see if it picks up anything usual.
Are u suggesting to resubmit ?
thanks
Sarah
-
In the short term I would definitely use canonicals to let Google know which are the right pages until you can fix your problem. Also, have you submitted a sitemap to Webmasters?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google has discovered a URL but won't index it?
Hey all, have a really strange situation I've never encountered before. I launched a new website about 2 months ago. It took an awfully long time to get index, probably 3 weeks. When it did, only the homepage was indexed. I completed the site, all it's pages, made and submitted a sitemap...all about a month ago. The coverage report shows that Google has discovered the URL's but not indexed them. Weirdly, 3 of the pages ARE indexed, but the rest are not. So I have 42 URL's in the coverage report listed as "Excluded" and 39 say "Discovered- currently not indexed." When I inspect any of these URL's, it says "this page is not in the index, but not because of an error." They are listed as crawled - currently not indexed or discovered - currently not indexed. But 3 of them are, and I updated those pages, and now those changes are reflected in Google's index. I have no idea how those 3 made it in while others didn't, or why the crawler came back and indexed the changes but continues to leave the others out. Has anyone seen this before and know what to do?
Intermediate & Advanced SEO | | DanDeceuster0 -
Incorrect Spelling Indexed In Meta Info - Can't Change It
Hi,It would be great if a member of the community could help me to resolve this issue.Google is indexing an incorrect spelling on of our key pages and we can't identify the reason why.- The page in question: https://newbridgesilverware.com/jewelleryAs you can see from the attached image, the Meta Title is rendered to contain the keyword "jewelry" (the American spelling.) We want this to read as "jewellery" - the British-English spelling. Yet in the page source the word is given in the meta title as "jewellery". Nowhere in the page source or on the page itself does the American spelling appear - yet Google still renders it in the Meta Title.Can anyone identify why this is happening and offer any possible solutions?Much appreciatedDhqJp
Intermediate & Advanced SEO | | Johnny_AppleSeed1 -
Need help in de-indexing URL parameters in my website.
Hi, Need some help.
Intermediate & Advanced SEO | | ImranZafar
So this is my website _https://www.memeraki.com/ _
If you hover over any of the products, there's a quick view option..that opens up a popup window of that product
That popup is triggered by this URL. _https://www.memeraki.com/products/never-alone?view=quick _
In the URL you can see the parameters "view=quick" which is infact responsible for the pop-up. The problem is that the google and even your Moz crawler is picking up this URL as a separate webpage, hence, resulting in crawl issues, like missing tags.
I've already used the webmaster tools to block the "view" parameter URLs in my website from indexing but it's not fixing the issue
Can someone please provide some insights as to how I can fix this?0 -
Client wants to remove mobile URLs from their sitemap to avoid indexing issues. However this will require SEVERAL billing hours. Is having both mobile/desktop URLs in a sitemap really that detrimental to search indexing?
We had an enterprise client ask to remove mobile URLs from their sitemaps. For their website both desktop & mobile URLs are combined into one sitemap. Their website has a mobile template (not a responsive website) and is configured properly via Google's "separate URL" guidelines. Our client is referencing a statement made from John Mueller that having both mobile & desktop sitemaps can be problematic for indexing. Here is the article https://www.seroundtable.com/google-mobile-sitemaps-20137.html
Intermediate & Advanced SEO | | RosemaryB
We would be happy to remove the mobile URLs from their sitemap. However this will unfortunately take several billing hours for our development team to implement and QA. This will end up costing our client a great deal of money when the task is completed. Is it worth it to remove the mobile URLs from their main website to be in adherence to John Mueller's advice? We don't believe these extra mobile URLs are harming their search indexing. However we can't find any sources to explain otherwise. Any advice would be appreciated. Thx.0 -
How to make Google index your site? (Blocked with robots.txt for a long time)
The problem is the for the long time we had a website m.imones.lt but it was blocked with robots.txt.
Intermediate & Advanced SEO | | FCRMediaLietuva
But after a long time we want Google to index it. We unblocked it 1 week or 8 days ago. But Google still does not recognize it. I type site:m.imones.lt and it says it is still blocked with robots.txt What should be the process to make Google crawl this mobile version faster? Thanks!0 -
Removing UpperCase URLs from Indexing
This search - site:www.qjamba.com/online-savings/automotix gives me this result from Google: Automotix online coupons and shopping - Qjamba
Intermediate & Advanced SEO | | friendoffood
https://www.qjamba.com/online-savings/automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products. and Google tells me there is another one, which is 'very simliar'. When I click to see it I get: Automotix online coupons and shopping - Qjamba
https://www.qjamba.com/online-savings/Automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products. This is because I recently changed my program to redirect all urls with uppercase in them to lower case, as it appears that all lowercase is strongly recommended. I assume that having 2 indexed urls for the same content dilutes link juice. Can I safely remove all of my UpperCase indexed pages from Google without it affecting the indexing of the lower case urls? And if, so what is the best way -- there are thousands.0 -
Proper 301 in Place but Old Site Still Indexed In Google
So i have stumbled across an interesting issue with a new SEO client. They just recently launched a new website and implemented a proper 301 redirect strategy at the page level for the new website domain. What is interesting is that the new website is now indexed in Google BUT the old website domain is also still indexed in Google? I even checked the Google Cached date and it shows the new website with a cache date of today. The redirect strategy has been in place for about 30 days. Any thoughts or suggestions on how to get the old domain un-indexed in Google and get all authority passed to the new website?
Intermediate & Advanced SEO | | kchandler0 -
Can a XML sitemap index point to other sitemaps indexes?
We have a massive site that is having some issue being fully crawled due to some of our site architecture and linking. Is it possible to have a XML sitemap index point to other sitemap indexes rather than standalone XML sitemaps? Has anyone done this successfully? Based upon the description here: http://sitemaps.org/protocol.php#index it seems like it should be possible. Thanks in advance for your help!
Intermediate & Advanced SEO | | CareerBliss0