20 000 duplicates in Moz crawl due to Joomla URL parameters. How to fix?
-
We have a problem of massive duplicate content in Joomla. Here is an example of the "base" URL: http://www.binary-options.biz/index.php/Web-Pages/binary-options-platforms.html
For some reason Joomla creates many versions of this URL, for example:
or
So it lists the URL parameter ?q= and then repeats part of the beforegoing URL. This leads to tens of thousands duplicate pages in our content heavy site.
Any ideas how to fix this? Thanks so much!
-
These are caused by the links to your language pages. If you click one of the language links from within the source code (not on the page) it redirects to a URL with '?q=/index.php/Web-Pages/binary-options-platforms.html' added. Then if you click the same language link on that page it again redirects to another page with previous URL added to the end:
?q=/index.php/Web-Pages/binary-options-platforms.html?q=/index.php/Web-Pages/binary-options-platforms.html.e.g:
On the example page view source, search for German and click the link below:
This link 301 redirects too:
http://www.binary-options.biz/index.php/Web-Pages/binary-options-platforms.html?q=/index.php/Web-Pages/binary-options-platforms.html Then if you view source, search for German and click the link again:
This link 301 redirects too:
So basically every time a web crawler follows a language link, new URLs are being created with the previous URL added to the end, causing a never ending crawl as an infinite amount of new pages will always be created.
I don't think this is connected with the Joomla SEF as Chris pointed out, as your URLs are already SEF.
However it's not an easy thing to identify how to fix the issue with the language links. You should probably speak to the developer who implemented it and/or the creator of the plugin if it is a plugin.
Also do you even need this functionality? As none of the language links work, they just redirect back the main site.
-
Surely your URL structure is not fine.Can you please try this fix and update me?
http://docs.joomla.org/Enabling_Search_Engine_Friendly_(SEF)_URLs_on_Apache
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Strange Crawl Report
Hey Moz Squad, So I have kind of strange case. My website locksmithplusinc.com has been around for a couple years. I have had all sorts of pages and blogs that have maybe ranked for a certain location a longtime ago and got deleted so I could speed up the site and consolidate my efforts. I said that because I think that might be part of the problem. When I was crawl reporting my site just three weeks ago on moz I had over 23 crawl report issues. Duplicate pages, missing meta tags the regular stuff. But now all of a sudden when I crawl report on MOZ it comes up with Zero issues. So I did another crawl On google analytic and this is what came up. SO im very confused because none of these url's are even url's on my site. So maybe people are searching for this stuff and clicking on broken links that are still indexed and getting this 404 error? What do you guys think? Thank you guys so much for taking a shot at this one. siS44ug
Technical SEO | | Meier0 -
Duplicate Content Showing up on Moz Crawl | www. vs. no-www.
Hello Moz Community! I am new to SEO, Moz and this is my first question. My questions; I have a client that is getting flagged for Duplicate Content. He is getting flagged for having two domains that have the same content i.e. www.mysite.com & mysite.com. I read into this and set up a 301 redirect through my hosting site. I evaluated which site had a stronger Page Authority and had the weaker site redirect to the stronger site. However, I am still getting hit for Duplicate pages caused by the www.mysite.com & mysite.com being duplicates. How should I go about resolving this? Is this an example of a Canonical tag needed in the head of the HTML? Any direction is appreciated. Thank You. B/R Will H.
Technical SEO | | MarketingChimp100 -
Mobile URL parameter (Redirection to desktop)
Hello, We have a parallel mobile website and recently we implemented a link pointing to the desktop website. This redirect is happening via a javascript code and results in a url followed by this paramenter: ?m=off Example:
Technical SEO | | echo1
http://www.m.website.com redirects to:
http://www.website.com/?m=off Questions: Will the "http://www.website.com/?m=off" be considered duplicate content with "http://www.website.com" since they both return the same content? Is there any possibility that Google will take into consideration the url ending in "/?m=off"? How should we treat this new url? The webmaster tools URL parameter configuration at the moment isn't experiencing problems but should we submit the parameter anyway in order not to be indexed or should we wait first and see the error response? In case we should submit this for removal... what's the best way to do it? Like this? Parameter: ?m=off Does this parameter change page content seen by the user? - doesn't affect page content Any help is much appreciated.
Thank you!0 -
Duplicate pages in Google index despite canonical tag and URL Parameter in GWMT
Good morning Moz... This is a weird one. It seems to be a "bug" with Google, honest... We migrated our site www.three-clearance.co.uk to a Drupal platform over the new year. The old site used URL-based tracking for heat map purposes, so for instance www.three-clearance.co.uk/apple-phones.html ..could be reached via www.three-clearance.co.uk/apple-phones.html?ref=menu or www.three-clearance.co.uk/apple-phones.html?ref=sidebar and so on. GWMT was told of the ref parameter and the canonical meta tag used to indicate our preference. As expected we encountered no duplicate content issues and everything was good. This is the chain of events: Site migrated to new platform following best practice, as far as I can attest to. Only known issue was that the verification for both google analytics (meta tag) and GWMT (HTML file) didn't transfer as expected so between relaunch on the 22nd Dec and the fix on 2nd Jan we have no GA data, and presumably there was a period where GWMT became unverified. URL structure and URIs were maintained 100% (which may be a problem, now) Yesterday I discovered 200-ish 'duplicate meta titles' and 'duplicate meta descriptions' in GWMT. Uh oh, thought I. Expand the report out and the duplicates are in fact ?ref= versions of the same root URL. Double uh oh, thought I. Run, not walk, to google and do some Fu: http://is.gd/yJ3U24 (9 versions of the same page, in the index, the only variation being the ?ref= URI) Checked BING and it has indexed each root URL once, as it should. Situation now: Site no longer uses ?ref= parameter, although of course there still exists some external backlinks that use it. This was intentional and happened when we migrated. I 'reset' the URL parameter in GWMT yesterday, given that there's no "delete" option. The "URLs monitored" count went from 900 to 0, but today is at over 1,000 (another wtf moment) I also resubmitted the XML sitemap and fetched 5 'hub' pages as Google, including the homepage and HTML site-map page. The ?ref= URls in the index have the disadvantage of actually working, given that we transferred the URL structure and of course the webserver just ignores the nonsense arguments and serves the page. So I assume Google assumes the pages still exist, and won't drop them from the index but will instead apply a dupe content penalty. Or maybe call us a spam farm. Who knows. Options that occurred to me (other than maybe making our canonical tags bold or locating a Google bug submission form 😄 ) include A) robots.txt-ing .?ref=. but to me this says "you can't see these pages", not "these pages don't exist", so isn't correct B) Hand-removing the URLs from the index through a page removal request per indexed URL C) Apply 301 to each indexed URL (hello BING dirty sitemap penalty) D) Post on SEOMoz because I genuinely can't understand this. Even if the gap in verification caused GWMT to forget that we had set ?ref= as a URL parameter, the parameter was no longer in use because the verification only went missing when we relaunched the site without this tracking. Google is seemingly 100% ignoring our canonical tags as well as the GWMT URL setting - I have no idea why and can't think of the best way to correct the situation. Do you? 🙂 Edited To Add: As of this morning the "edit/reset" buttons have disappeared from GWMT URL Parameters page, along with the option to add a new one. There's no messages explaining why and of course the Google help page doesn't mention disappearing buttons (it doesn't even explain what 'reset' does, or why there's no 'remove' option).
Technical SEO | | Tinhat0 -
Language parameter
Hi there, I have a quick technical question regarding to on site language. Is it possible to use a parameter in order to define the language of the web. My web is www.vallnord.com and its default language is catalan. I would like the site to display in the language of the user, maybe the browser. Thanks, G.
Technical SEO | | SilbertAd0 -
Does Google pass link juice a page receives if the URL parameter specifies content and has the Crawl setting in Webmaster Tools set to NO?
The page in question receives a lot of quality traffic but is only relevant to a small percent of my users. I want to keep the link juice received from this page but I do not want it to appear in the SERPs.
Technical SEO | | surveygizmo0 -
Ignore url parameters without the 'parameter=' ?
We are working on an ecommerce site that sorts out the products by color and size but doesn't use the sortby= but uses sortby/. Can we tell Google to ignore the sortby/ parameter in Webmaster Tools even though it is not followed by an = sign? For example: www.mysite.com/shirts/tshirts/shopby/size-m www.mysite.com/shirts/tshirts/shopby/color-black Can we tell WMT to ignore the 'shopby/' parameter so that only the tshirts page will be indexed? Or does the shopby have to be set up as 'shopby=' ? Thanks!
Technical SEO | | Hakkasan0 -
Q Parameters
I'm having several site issues and I want to see if the Q parameter in the URL is the issue. Both of these index. Any capitalization combination brings up another indexed page: http://www.website.com/index.php?q=contact-us. and http://www.website.com/index.php?q=cOntact-us The other issue is Google crawl errors. The website has received increasingly more spam crawl errors. I've read that this is a common issue and most likely is a Google Bot problem. Would removing the q parameter fix this entirely? Here is an example: http://www.website/index.php?q=uk-cheap-chloe-bay-bag-wholesale-shoes
Technical SEO | | DanSpeicher0