How to fix google index filled with redundant parameters
-
Hi All
This follows on from a previous question (http://moz.com/community/q/how-to-fix-google-index-after-fixing-site-infected-with-malware) that on further investigation has become a much broader problem. I think this is an issue that may plague many sites following upgrades from CMS systems.
First a little history. A new customer wanted to improve their site ranking and SEO. We discovered the site was running an old version of Joomla and had been hacked. URL's such as http://domain.com/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate redirected users to other sites and the site was ranking for buy adobe or buy microsoft. There was no notification in webmaster tools that the site had been hacked. So an upgrade to a later version of Joomla was required and we implemented SEF URLs at the same time. This fixed the hacking problem, we now had SEF url's, fixed a lot of duplicate content and added new titles and descriptions. Problem is that after a couple of months things aren't really improving. The site is still ranking for adobe and microsoft and a lot of other rubbish and the urls like http://domain.com/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate are still sending visitors but to the home page as are a lot of the old redundant urls with parameters in them. I think it is default behavior for a lot of CMS systems to ignore parameters it doesn't recognise so http://domain.com/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate displays the home page and gives a 200 response code.
My theory is that Google isn't removing these pages from the index because it's getting a 200 response code from old url's and possibly penalizing the site for duplicate content (which don't showing up in moz because there aren't any links on the site to these url's) The index in webmaster tools is showing over 1000 url's indexed when there are only around 300 actual url's. It also shows thousands of url's for each parameter type most of which aren't used.
So my question is how to fix this, I don't think 404's or similar are the answer because there are so many and trying to find each combination of parameter would be impossible. Webmaster tools advises not to make changes to parameters but even so I don't think resetting or editing them individually is going to remove them and only change how google indexes them (if anyone knows different please let me know)
Appreciate any assistance and also any comments or discussion on this matter.
Regards, Ian
-
Thanks again Alan.
I've checked the site with screaming frog and it doesn't return any url's with parameters so at this stage I might be ok. I am getting a message in webmaster tools saying "severe health issues" but it doesn't appear to be affecting the urls I want to keep. I'll likely remove the entry once things have cleared up some more.
Thanks Jeff
At the moment I'm stuck with Zeus web server (insert expletives here) so no htaccess file or I'd be in a better position. After messing around with it and very limited documentation I can only get the site operating with index.php in the url but with SEF url's for the remainder of it. I'm investigating migration to an apache server so that might make it easier.
Regards
Ian
-
the ability to remove the index.php is built into the stock joomla .htaccess file.
In the joomla backend, global config / site tab/ seo settings > enable "Use URL rewriting".
-
I can see it fixed your problem, but its a ugly fix, you mean need to use parameters in the future, you may already be using them but unaware.
-
OK Might have a solution that would at least work for my situation.
Since implementing SEF URL's on the site I have no real need for any URL's with parameters. By adding the following to robots.txt it should prevent any indexing of old pages or pages with parameters.
Disallow: /index.php?*
Tested it in webmaster tools with some of the offending URL's and it seems to work. I'll wait until the next indexing and post back or mark it as answered.
-
Thanks for your input Alan
There lies my problem. The URL's don't exist but give a 200 response.
http://domain.com/index.php?vc=427&Buy_Pinnacle_Studio_14_Ultimate is the same as
http://domain.com/index.php which is the same as
http://domain.com/?type_anything_here_and it still gives a 200 response. Joomla seems to just ignore parameters from non existing pages after the ?. I found a lot of people are having similar problems here http://forum.joomla.org/viewtopic.php?f=618&t=699954.
Once in googles index I can't see a way of getting rid of thousands or redundant entries. I have the added problem of the site being hosted on a Zeus Web Server which isn't as well documented as apache.
I'm currently looking into wild cards in robots.txt. It will be a slow process to get rid of them all but might finally help me clean up the index.
Ian
-
If the site is returning 200's then that is where the problem lies, you need to find out why.
I can see any other fix, removing the urls is only a temp fix, you must make them return 404's
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Follow no-index
I have a question about the right way to not index pages: With a canonical or follow no-index. First we have a blog page: **Blogpage **
Technical SEO | | Happy-SEO
URL: /blog/
index follow Page 2 blog:
URL: /blog?=p2
index follow
rel="prev" /blog/
el="next" ?=p3 Nothing strange here i guess. But we also have other pages with chance on duplicate content: /SEO-category/
/SEO-category/view-more/ Because i don't want the "view-more" items to be indexed i want to set it on: follow no-index (follow to reach pages). But now the "view-more" also have pagination. What is the best way? Option 1:
/SEO-category/view-more/
Follow no-index /SEO-category/view-more?=p2
Follow no-index
rel="prev" /view-more/
el="next" ?=p3 Option 2: /SEO-category/view-more/
Canonical: /SEO-category/ /SEO-category/view-more?=p2
rel="prev" /view-more/
el="next" ?=p3 Option 3: Other suggests? Thanks!0 -
Does Index Status Goes Down If Website is Penalized by Google?
Hi Friends, Few of my friends told me that index status of a website goes down goes if it is penalized by Google. I can see my organic traffic has went down drastically after the Panda update on July 17th 2015 but index status still remains the same. So, I am bit confused. Any advice on this.
Technical SEO | | Prabhu.Sundar0 -
Getting Google to index a large PDF file
Hello! We have a 100+ MB PDF with multiple pages that we want Google to fully index on our server/website. First of all, is it even possible for Google to index a PDF file of this size? It's been up on our server for a few days, and my colleague did a Googlebot fetch via Webmaster Tools, but it still hasn't happened yet. My theories as to why this may not work: A) We have no actual link(s) to the pdf anywhere on our website. B) This PDF is approx 130 MB and very slow to load. I added some compression to it, but that only got it down to 105 MB. Any tips or suggestions on getting this thing indexed in Google would be appreciated. Thanks!
Technical SEO | | BBEXNinja0 -
Google index graph duration in Google Webmaster Tools
Hello guys, I wonder, my sites are currently being indexed every 7 days, exactly. At Index Status page in GWT. However, this new site gets updated almost everyday, how can I ask google to index faster and more frequently/almost daily? Is it about SItemap.xml frequency ? I changed it today to Daily. Thanks!
Technical SEO | | mdmoz0 -
Sitemaps for Google
In Google Webmaster Central, if a URL is reported in your site map as 404 (Not found), I'm assuming Google will automatically clean it up and that the next time we generate a sitemap, it won't include the 404 URL. Is this true? Do we need to comb through our sitemap files and remove the 404 pages Google finds, our will it "automagically" be cleaned up by Google's next crawl of our site?
Technical SEO | | Prospector-Plastics0 -
Pages not Indexed after a successful Google Fetch
I am trying to understand why google isn't indexing key content on my site. www.BeyondTransition.com is indexed and new pages show up in a couple of hours. My key content is 6 pages of information for each of 3000 events (driven by mySQL on a wordpress platform). These pages are reached via a search page, but no direct navigation from the home page. When I link to an event page from an indexed page it doesn't show up in search results. When I use fetch on webmaster tools the fetch is successful but is then not indexed - or if it does appear in results it's directed to the internal search page e.g. http://www.beyondtransition.com/site/races/course/race110003/ has been fetched and submitted with links but when I search for BeyondTransition Ironman Cozumel I get these results.... So what have I done wrong and how do I go about fixing it? All thoughts and advice appreciated Thanks Denis
Technical SEO | | beyondtransition0 -
Google Sandboxing
I have a new site with a new domain that ranked well the 1st week or so after it was indexed then it totally dropped off the SERP. My question is, does Google Sandboxing affect new sites on new domains that don't have any incoming links? The site dropped off before I began link building - from what I've read unnatural link build is often the cause. Can you still be sandboxed without any link building? If this is the case, are there things I can do to get out of the sandbox? Thanks folks, Jason
Technical SEO | | OptioPublishing0 -
How can I get a listing of just the URLs that are indexed in Google
I know I can use the site: query to see all the pages I have indexed in Google, but I need a listing of just the URLs. We are doing a site re-platform and I want to make sure every URL in Google has a 301. Is there an easy way to just see the URLs that Google has indexed for a domain?
Technical SEO | | EvergladesDirect0