URL Injection Hack - What to do with spammy URLs that keep appearing in Google's index?
-
A website was hacked (URL injection) but the malicious code has been cleaned up and removed from all pages. However, whenever we run a site:domain.com in Google, we keep finding more spammy URLs from the hack. They all lead to a 404 error page since the hack was cleaned up in the code. We have been using the Google WMT Remove URLs tool to have these spammy URLs removed from Google's index but new URLs keep appearing every day. We looked at the cache dates on these URLs and they are vary in dates but none are recent and most are from a month ago when the initial hack occurred.
My question is...should we continue to check the index every day and keep submitting these URLs to be removed manually? Or since they all lead to a 404 page will Google eventually remove these spammy URLs from the index automatically?
Thanks in advance Moz community for your feedback.
-
If the urls follow any particular pattern then you can use a htaccess redirect and return the header code 410 / 403 / 404 to Google. (I suggest 410) They will soon drop out of the index.
I don't know the exact .htaccess syntax off the top of my head but it will be something like this:
If they all come from the same folder then it would look something like this:
RedirectMatch 410 ^/folder/.*$If they have a common character string after the forward slash (such as xyz) then it would look something like this:
RedirectMatch 410 ^/xyz.*$If they have any common character string footprints at all (such as xyz) then it would look something like this (now I'm guessing):
RedirectMatch 410 ^/()xyz.$This would be a pretty easy fix if all of those spammy urls have any common characters after the forward slash or they all originate from a certain folder.
-
You might get a little quicker removal if you send them with a 410 status code. That will let Google know that the page is gone for good. http://searchenginewatch.com/sew/how-to/2340728/matt-cutts-on-how-google-handles-404-410-status-codes
-
No problem at all! These new URLs do not actually exist on the website. Since we cleaned up the malicious code all of these URLs redirect to our 404 page.
-
Sorry to misunderstand the problem. Do those new urls actually exist on your site or just in search?
-
Hi 94501,
Thanks for taking the time to respond. Just to be clear, we are not submitting multiple removals for the same URL and I don't think Google WMT even allows you to do that. Completely new URLs are appearing each day after removing the older ones.
My main concern is having spammy URLs indexed and associated with my website and the negative effects it can have from an SEO perspective.
-
Hi Pete,
It sounds like you've done what you can. I wouldn't submit multiple removals for the same url.
I assume it's out of your site map and you're not still being hacked and have figured out how it happened and taken steps to fix it.
Google will eventually figure it out. I'd try to move on to new stuff.
Best... Mike
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing
Hi We have roughly 8500 pages in our website. Google had indexed almost 6000 of them, but now suddenly I see that the pages indexed has gone to 45. Any possible explanations why this might be happening and what can be done for it. Thanks, Priyam
Intermediate & Advanced SEO | | kh-priyam0 -
Javascript content not being indexed by Google
I thought Google has gotten better at picking up unique content from javascript. I'm not seeing it with our site. We rate beauty and skincare products using our algorithms. Here is an example of a product -- https://www.skinsafeproducts.com/tide-free-gentle-he-liquid-laundry-detergent-100-fl-oz When you look at the cache page (text) from google none of the core ratings (badges like fragrance free, top free and so forth) are being picked up for ranking. Any idea what we could do to have the rating incorporated in the indexation.
Intermediate & Advanced SEO | | akih0 -
Google's Knowledge Panel
Hi Moz Community. Has anyone noticed a pattern in the websites that Google pulls in to populate knowledge Panels? For example, for a lot of queries Google keeps pulling data from a specific source over and over again, and the data shown in the Knowledge Panel isn't on the target page. Is it possible that Google simply favors some sites over others and no matter what you do, you'll never make it into the Knowledge box? Thanks.
Intermediate & Advanced SEO | | yaelslater0 -
Problem: Magento prioritises product URL's without categories?
HI there, we are moving a website from Shoptrader to Magento, which has 45.000 indexations.
Intermediate & Advanced SEO | | onlinetrend
yes shoptrader made a bit of a mess. Trying to clean it up now. there is a 301 redirect list of all old URL's pointing to the new one product can exist in multiple categories want to solve this with canonical url’s for instance: shoptrader.nl/categorieA/product has 301 redirect towards magento.nl/nl/categorieA/product shoptrader.nl/categorieA/product-5531 has 301 redirect towards magento.nl/nl/categorieA/product shoptrader.nl/categorieA/product¤cy=GBP has 301 redirect towards magento.nl/nl/categorieA/product shoptrader.nl/categorieB/product has 301 redirect towards magento.nl/nl/categorieB/product, has canonical tag towards magento.nl/nl/categorieA/product shoptrader.nl/categorieB/product?language=nl has 301 redirect towards magento.nl/nl/categorieB/product, has canonical tag towards magento.nl/nl/categorieA/product Her comes the problem:
New developer insists on using /productname as canonical instead of /category/category/productname, since Magento says so. The idea is now to redirect to /category/category/productname and there will be a canonical URL on these pages pointing to /productname, loosing some link juice twice. So in the end indexation will take place on /productname … if Google picks it up the 301 + canonical. Would be more adviseable to direct straight to /productname (http://moz.com/community/q/is-link-juice-passed-through-a-301-and-a-canonical-tag), but I prefer to point to one URL with categories attached. Which has more advantages(?): clear menustructure able to use subfolders in mobile searchresults missing breadcrumb What would you say?0 -
What's the best way to check Google search results for all pages NOT linking to a domain?
I need to do a bit of link reclamation for some brand terms. From the little bit of searching I've done, there appear to be several thousand pages that meet the criteria, but I can already tell it's going to be impossible or extremely inefficient to save them all manually. Ideally, I need an exported list of all the pages mentioning brand terms not linking to my domain, and then I'll import them into BuzzStream for a link campaign. Anybody have any ideas about how to do that? Thanks! Jon
Intermediate & Advanced SEO | | JonMorrow0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
Google Maps Integration Dynamic url
We are integrating Google Maps into a search feature on a website. Would you use the standard dynamic generated long url that appears after a search or find a way of reducing this to a shorter url. Taking into account hundreds of results. Question asked for seo purposes.
Intermediate & Advanced SEO | | jazavide0 -
Need to duplicate the index for Google in a way that's correct
Usually duplicated content is a brief to fix. I find myself in a little predicament: I have a network of career oriented websites in several countries. the problem is that for each country we use a "master" site that aggregates all ads working as a portal. The smaller nisched sites have some of the same info as the "master" sites since it is relevant for that site. The "master" sites have naturally gained the index for the majority of these ads. So the main issue is how to maintain the ads on the master sites and still make the nische sites content become indexed in a way that doesn't break Google guide lines. I can of course fix this in various ways ranging from iframes(no index though) and bullet listing and small adjustments to the headers and titles on the content on the nisched sites, but it feels like I'm cheating if I'm going down that path. So the question is: Have someone else stumbled upon a similar problem? If so...? How did you fix it.
Intermediate & Advanced SEO | | Gustav-Northclick0