How to fix a Google index filled with redundant parameters
-
Hi All
This follows on from a previous question (http://moz.com/community/q/how-to-fix-google-index-after-fixing-site-infected-with-malware) that on further investigation has become a much broader problem. I think this is an issue that may plague many sites following upgrades from CMS systems.
First, a little history. A new customer wanted to improve their site ranking and SEO. We discovered the site was running an old version of Joomla and had been hacked. URLs such as http://domain.com/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate redirected users to other sites, and the site was ranking for "buy adobe" or "buy microsoft". There was no notification in Webmaster Tools that the site had been hacked.
So an upgrade to a later version of Joomla was required, and we implemented SEF URLs at the same time. This fixed the hacking problem; we now had SEF URLs, had fixed a lot of duplicate content and had added new titles and descriptions.
The problem is that after a couple of months things aren't really improving. The site is still ranking for adobe and microsoft and a lot of other rubbish, and URLs like http://domain.com/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate are still sending visitors, but to the home page, as are a lot of the old redundant URLs with parameters in them. I think it is default behavior for a lot of CMS systems to ignore parameters they don't recognise, so http://domain.com/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate displays the home page and gives a 200 response code.
My theory is that Google isn't removing these pages from the index because it's getting a 200 response code from the old URLs, and that it is possibly penalizing the site for duplicate content (which doesn't show up in Moz because there aren't any links on the site to these URLs). The index in Webmaster Tools is showing over 1000 URLs indexed when there are only around 300 actual URLs. It also shows thousands of URLs for each parameter type, most of which aren't used.
So my question is how to fix this. I don't think 404s or similar are the answer, because there are so many and trying to find each combination of parameters would be impossible. Webmaster Tools advises not to make changes to parameters, but even so I don't think resetting or editing them individually is going to remove them; it would only change how Google indexes them (if anyone knows different please let me know).
Appreciate any assistance and also any comments or discussion on this matter.
Regards, Ian
-
Thanks again Alan.
I've checked the site with Screaming Frog and it doesn't return any URLs with parameters, so at this stage I might be OK. I am getting a message in Webmaster Tools saying "severe health issues" but it doesn't appear to be affecting the URLs I want to keep. I'll likely remove the entry once things have cleared up some more.
Thanks Jeff
At the moment I'm stuck with Zeus web server (insert expletives here), so no .htaccess file, or I'd be in a better position. After messing around with it and its very limited documentation, I can only get the site operating with index.php in the URL but with SEF URLs for the remainder of it. I'm investigating migration to an Apache server, so that might make it easier.
Regards
Ian
-
The ability to remove index.php is built into the stock Joomla .htaccess file.
In the Joomla backend: Global Configuration / Site tab / SEO Settings > enable "Use URL rewriting".
-
I can see it fixed your problem, but it's an ugly fix. You may need to use parameters in the future, and you may already be using them without being aware of it.
-
OK, I might have a solution that would at least work for my situation.
Since implementing SEF URLs on the site I have no real need for any URLs with parameters. Adding the following to robots.txt should prevent any indexing of old pages or pages with parameters.
Disallow: /index.php?*
Tested it in Webmaster Tools with some of the offending URLs and it seems to work. I'll wait until the next indexing and post back or mark it as answered.
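For anyone wanting to sanity-check a rule like this outside Webmaster Tools, here is a rough Python sketch of the wildcard matching Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end of the URL). The function name `rule_matches` is made up for illustration, and it only tests a single path pattern; it does not handle Allow/Disallow precedence or user-agent groups. In the real file the Disallow line also has to sit under a `User-agent:` line.

```python
import re


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL's path + query.

    Assumes Google-style matching: the pattern is matched from the start of
    the path, '*' matches any sequence of characters, and a trailing '$'
    means the match must reach the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' into '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(regex, path_and_query) is not None
```

Run against the URLs in this thread, `rule_matches("/index.php?*", "/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate")` is true, but the same rule does not match `/?type_anything_here`, because that URL never goes through index.php. A broader pattern such as `/*?` would be needed to catch those as well. Bear in mind that a Disallow rule stops crawling; as far as I know it doesn't by itself remove URLs that are already in the index.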
-
Thanks for your input Alan
There lies my problem. The URLs don't exist but give a 200 response.
http://domain.com/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate is the same as
http://domain.com/index.php which is the same as
http://domain.com/?type_anything_here, and it still gives a 200 response. Joomla seems to just ignore parameters for non-existent pages after the ?. I found a lot of people are having similar problems here: http://forum.joomla.org/viewtopic.php?f=618&t=699954.
Once they're in Google's index I can't see a way of getting rid of thousands of redundant entries. I have the added problem of the site being hosted on a Zeus Web Server, which isn't as well documented as Apache.
I'm currently looking into wildcards in robots.txt. It will be a slow process to get rid of them all but might finally help me clean up the index.
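To see which of the old URLs still answer with a 200, something along these lines could be run over a list exported from the index. It is only a sketch using Python's standard library: `fetch_status` and `find_live_urls` are names I've made up, it sends HEAD requests (some servers treat those differently from GET), and `urlopen` follows redirects, so a URL that redirects to a live page is reported with the final status.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is on the exception.
        return error.code


def find_live_urls(urls, fetch=fetch_status):
    """Return the URLs that answer 200, i.e. the ones that will stay indexed.

    'fetch' can be swapped for a stub so the logic can be tried offline.
    """
    return [url for url in urls if fetch(url) == 200]
```

Feeding it the junk URLs from above should return all of them while the site keeps answering 200; once the server is fixed, the list should come back empty.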
Ian
-
If the site is returning 200s then that is where the problem lies; you need to find out why.
I can't see any other fix. Removing the URLs is only a temporary fix; you must make them return 404s.
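One way to get 404s without listing every junk combination is to turn the problem around: allow only the parameters the site really uses and return 404 for anything else. The Python below is just a sketch of that decision logic, not Joomla or Zeus configuration. `KNOWN_PARAMS` is a hypothetical allow-list and `status_for` is an illustrative name; the real list would have to come from an audit of the site.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical allow-list: replace with the parameters the site really uses.
KNOWN_PARAMS = frozenset({"option", "view", "id", "Itemid"})


def status_for(url: str, known_params=KNOWN_PARAMS) -> int:
    """Return 200 if every query parameter is on the allow-list, else 404."""
    query = urlsplit(url).query
    # keep_blank_values=True so bare keys with no '=value' part, such as
    # 'Buy_Pinnacle_Studio_14_Ultimate', are still seen as parameters.
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return 200 if names <= known_params else 404
```

With this rule a URL with no query string, or with only known parameters, is served as normal, while `index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate` and `/?type_anything_here` both get a 404 because they carry names that are not on the list.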