Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to fix Google index after fixing site infected with malware.
-
Hi All
I upgraded a Joomla site for a customer a couple of months ago that was infected with malware (it wasn't flagged as infected by Google). The site is fine now, but I'm still noticing search queries for "cheap adobe" etc. with links to http://domain.com/index.php?vc=201&Cheap_Adobe_Acrobat_xi in Webmaster Tools (about 50 in total). These URLs redirect back to the home page and seem to be remaining in the index (I think Joomla is doing this automatically).
Firstly, what sort of effect would these be having on their rankings? Would they be seen by Google as duplicate content for the homepage? (Moz doesn't report them as such, as there are no internal links.)
Secondly, what's my best plan of attack to fix them? Should I set up 404s for them and then submit them to Google? Will resubmitting the site to the index fix things?
Would appreciate any advice or suggestions on the ramifications of this and how I should fix it.
Regards, Ian
-
Thanks Tom
That's a good point. Part of my problem lies in the number of URLs with parameters (thousands). Applying status codes of any type isn't really viable.
I'm starting to see the URLs clean up with the addition of the entries in robots.txt.
Regards
Ian
-
I would make them return a 410, not a 404.
410s are dead links. If you use a 404, Google will keep coming back to see if you fixed the 404.
Sending Google to a 410 lets them know it's gone.
http://moz.com/learn/seo/http-status-codes
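As a rough illustration of the 410 approach, here is a minimal Python WSGI sketch. It is not Joomla or Zeus configuration: the `SPAM_PARAMS` set and the `app` function are hypothetical names, and the 200 branch is only a stand-in for whatever the real CMS would serve. The idea is simply to answer 410 Gone when `/index.php` is requested with a parameter that only the malware used, before the request reaches the CMS.

```python
from urllib.parse import parse_qs

# Assumption: query parameters that only the malware-generated URLs used.
SPAM_PARAMS = {"vc"}


def app(environ, start_response):
    """Answer 410 Gone for /index.php requests that carry a spam parameter."""
    path = environ.get("PATH_INFO", "")
    # keep_blank_values keeps bare keys such as "Cheap_Adobe_Acrobat_xi".
    query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)

    if path == "/index.php" and SPAM_PARAMS.intersection(query):
        body = b"410 Gone\n"
        start_response("410 Gone", [("Content-Type", "text/plain"),
                                    ("Content-Length", str(len(body)))])
        return [body]

    # Stand-in for the real site: everything else is served normally.
    body = b"Home page\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` from the standard library.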
all the best,
tom
-
OK, I might have a solution that would at least work for my situation.
Since implementing SEF URLs on the site I have no real need for any URLs with parameters. Adding the following to robots.txt should prevent any indexing of old pages or pages with parameters.
Disallow: /index.php?*
I tested it in Webmaster Tools with some of the offending URLs and it seems to work. I'll wait until the next indexing and post back or mark it as answered.
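For anyone wanting to check which URLs a rule like this would catch without going through Webmaster Tools, here is a simplified Python sketch of wildcard matching as commonly described for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end, and rules otherwise match as prefixes). The function names are my own, and this is an approximation rather than Google's actual matcher. Also keep in mind that a Disallow rule has to sit under a `User-agent:` line, and that it stops crawling rather than guaranteeing removal from the index.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is literal, matched from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, pattern: str) -> bool:
    """True if the path (including any query string) matches the pattern."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


rule = "/index.php?*"
for candidate in ("/index.php?vc=201&Cheap_Adobe_Acrobat_xi", "/index.php", "/"):
    print(candidate, is_disallowed(candidate, rule))
```

With this rule the plain `/index.php` home page stays crawlable, because the pattern requires the literal `?`.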
-
Thanks all for your help.
A little more information, and maybe a little more advice required.
Since fixing the malware, http://domain.com/index.php?vc=201&Cheap_Adobe_Acrobat_xi and similar URLs are no longer actual pages. Joomla treats anything after the ? as a parameter and ignores it because it no longer matches a page, which is why it just defaults to the home page, http://domain.com/index.php. This is the default behavior of Joomla and probably most other content management systems. The problem is that Google indexed that page when it was infected, and it remains in the index because Google sees a status code of 200 when it re-indexes the page.
The problem is now a bit broader and has more ramifications than I first thought. Any pages from the previous system that used parameters would receive a 200 status code and remain in the index. Checking URL parameters in Webmaster Tools confirms this, with various parameters showing thousands of URLs monitored. Keep in mind that Google is showing a message that there are no problems with parameters for this site.
So the advice I need now relates to URL parameters in Webmaster Tools. The new site uses SEF URLs and so makes much less use of parameters. How can I ensure that the old redundant pages with parameters are dropped from the index? This would involve thousands of 301s or 404s, let alone trying to work them all out. There is a reset link for each parameter in Webmaster Tools but not much documentation as to what it does. If I reset all the parameters, would that clean up the index?
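One way to start working them all out is to tally which parameter names actually appear across the indexed URLs. The Python sketch below assumes you already have the URLs as a plain list (for example from an export); `count_parameters` and the sample URLs other than the one quoted in this thread are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_parameters(urls):
    """Count how many URLs use each query parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # A set, so a name repeated within one URL is only counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


sample = [
    "http://domain.com/index.php?vc=201&Cheap_Adobe_Acrobat_xi",
    "http://domain.com/index.php?vc=202&Cheap_Adobe_Photoshop",  # made-up example
    "http://domain.com/index.php?option=com_content&id=5",       # made-up example
]
for name, total in count_parameters(sample).most_common():
    print(name, total)
```

Parameters that show up in thousands of URLs but are no longer used by the new site are the candidates for a blanket rule, rather than handling each URL individually.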
I'd be interested in what others think about this issue, because I feel this might be a common problem with CMS-based platforms after major changes: thousands of parameter-based URLs just defaulting to the home page and other pages probably affects the site and page ranking.
Ian
-
The search engines are retaining the indexing of the links because following them through the redirect returns a 200 server header - which to the SEs means all is well and there is a page there to index. As you note in other responses - the only way to change that is to force the server to return a 404 header as a signal to the SEs to eventually drop it.
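To see what header a given URL really returns, it helps to request it without following redirects. The Python sketch below uses only the standard library; `fetch_status` and `index_signal` are hypothetical helper names, and the keep/drop labels are a rough summary of the behaviour described in this thread, not an official Google rule.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the raw HTTP status code for url without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        return conn.getresponse().status
    finally:
        conn.close()


def index_signal(status: int) -> str:
    """Rough reading of what a status code tells a search engine."""
    if status in (404, 410):
        return "drop"      # page is gone; should eventually leave the index
    if 300 <= status < 400:
        return "redirect"  # engine follows the redirect to the target
    if 200 <= status < 300:
        return "keep"      # page looks healthy, so it stays indexed
    return "other"
```

Calling `index_signal(fetch_status(url))` on each of the spam URLs shows at a glance which ones still look like live pages.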
Yes, you could use a robots.txt directive to block those specific URLs that are the target of the spam links, in order to satisfy the URL Removal Tool's requirement for allowing a removal request. That should work as a quicker solution than trying to make coding changes in Joomla (sorry, it's been about 3.5 yrs since I've done any Joomla work).
Good luck!
Paul
[EDIT: Gah...ignore the P.S. as I didn't notice you don't have an easy way to get redirects into the Zeus server before Joomla kicks in. Sorry]
P.S. A final quick option would be to write a redirect in htaccess to 301-redirect the fake URLs to a real 404 page. This would kick in before Joomla got a chance to interfere with its pseudo-redirect.
-
You're right, I guess I was focused on the index. Moz isn't showing any external links to these pages and neither is Webmaster Tools. My feeling is that Google is retaining them for some reason, maybe just the keywords in the URL?
-
I've checked the source of the visits and they are only coming from Google searches for "cheap adobe" and the like. The original malware used the site to get these searches into the index and then direct them to other sites/pages.
Being a Zeus server, it doesn't use htaccess; my task would be a lot simpler if it did. It has an alternative rewrite file, but documentation is scarce on using it for 404s.
I'll keep researching.
-
That means nobody clicks on them, but how did Google find them? This is not evidence that there are no links, just that no one has visited your site through them.
-
Thanks Paul
I've checked analytics and the only source of these URLs is Google organic searches, not external sites. Unfortunately, I think my problem is the dynamic nature of Joomla and a combination of factors that cause it to behave in an SEO-unfriendly way.
I think my biggest challenge is getting the URLs to 404 before I submit them to the Webmaster Tools removal tool (my research tells me this needs to be done before you submit). I think I read there might be a robots.txt option, so I'll look into that.
Ian
-
These pages may have links from other spam sites; you don't want them to return a 200.
You want them to 404. In Joomla you can make the site use htaccess or not; make sure it does, and 404 the pages there.
-
Thanks Alan
This seems to be done by the combination of Joomla/Zeus and the redirection manager. The site is no longer infected, the only visits are from organic Google searches, and it's been a couple of months. Whatever the reason, Joomla feels it shouldn't 404 these pages and just displays the home page for them (it does not 301-redirect them).
My feeling is that these URLs in the index, and the visits from them, probably aren't doing the site any good.
-
Thanks Dave
I think this might be a good option, but I have a couple of problems with trying to achieve it. It's a Joomla CMS running on a Zeus server with a Search Engine Friendly URL plugin running. I think that is possibly the worst combination of technologies for SEO in history. The combination of URL rewrites in Zeus and the redirection manager in Zeus just displays the home page with the dodgy URL and gives it a 200 status code. I think this is why Google is taking so long to drop it from the index.
Ian
-
You absolutely do NOT want to redirect these links to the home page, Ian! These are spam links, coming from completely unrelated sites. They are Google's very definition of unnatural links and 301-redirecting them to your home page also redirects their potential damage to your home page.
You want them to return 404 status as quickly as possible. I'd also be tempted to use the Webmaster Tools remove tool to try to speed up the process, especially if these junk links currently form a large percentage of your overall link profile. (You'll need to find & remove the redirect that currently re-points them to the home page too, for the 404 header to do its job of telling the search engines to drop the page from their indexes.)
As far as rankings issues go, this isn't a potential dupe content issue, it's a damaging unnatural links issue, which is even more significant. These are the kinds of links that could lead to at least an algorithmic penalty or, worst case, a manual penalty. Either way, these penalties are vastly harder to fix after the fact than to avoid in the first place.
In addition to the steps above designed to make it clear those links don't belong to your site, I'd keep a good record of the links, their originating domains, and when & how they were originally created due to the malware attack and your fix. That way you have essential documentation should you receive a penalty and need to submit a reinclusion request.
Hope that answers your questions?
Paul
-
Why are they redirecting back to the home page? Do you redirect them, or are you still infected?
I would make sure they 404.
-
The easiest way would be a permanent redirect on the offending URLs.
Check the incoming variable, i.e. vc, and if it's an offending one, permanently redirect using a 301. Google, on seeing the 301, will drop the URL from the index.
There is a URL removal tool in Google Webmaster Tools if the URL contains any personal information.
I had a similar issue a few days ago, caused by a corrupt XML sitemap, and the index is already starting to clear up.