URL Index Removal for Hacked Website - Will this help?

Tosten

My main question is: How do we remove URLs (links) from Google's index and the 1000s of created 404 errors associated with them after a website was hacked (and now fixed)?

The story: A customer came to us for a new website and some SEO. They had an existing website that had been hacked and their previous vendor was non-responsive to address the issue for months. This created THOUSANDS of URLs on their website that were then linked to pornographic and prescription med SPAM sites. Now, Google has 1,205 pages indexed that create 404 errors on the new site. I am confident these links are causing Google to not rank well organically.

Additional information:

Entirely new website
Wordpress site
New host

Should we be using the "Remove URLs" tool from Google to submit all 1205 of these pages? Do you think it will make a difference? This is down from the 22,500 URLs that existed when we started a few months back. Thank you in advance for any tips or suggestions!

Mobilio

Yes.

Disavow needed for each site (http/https).

Mcurius

Thanks for clearing this out.
If i have spammy links on http version, but my site is now https, i should upload the same disavow list on both http and https? (i saw one answer of yours in other thread saying just that , and i think is important because many of us are missing this detail)

Mobilio

If they are not your - it's better to disavow them. If they are spammy - disavow them.

Those links may hurt your ranking.

Mcurius

Hi Pete, something in your answer got my attention.
Like one month ago , i saw some (as was proven later) spammy links pointing to one specific page of my site. Those links ( from 20+ domains) were coming from some german domain names with the ltd .xyz extension.
Now the links don't actually exists, but those referring pages saying 410 Gone (nginx server).
Is that bad for that spesific page of mine?
I never saw in past this http status.

Mobilio

If your "bad" link is like http://OURDOMAIN/flibzy/foto-bugil-di-kelas.html then your .htaccess should be:
Redirect 410 /flibzy/foto-bugil-di-kelas.html
that's all.

Yes - you should do this for ALL 1205 URLs. Don't do this on legal pages (before hacking), just on hacked pages. I say "gone" with 410 redirect. It's amazing. In your case gone for good. Time for identify that 1205 URLs and paste them into .htaccess is let's say X hours. Time for identify that 1205 URLs and temporary remove them is Y hours. Since "temporary removal" is up to 30 days this make same job each month. In total for one year you have X in first case and 12*Y in second case. You can see difference, right?

Also today Barry Adams release story about hacking:
http://www.stateofdigital.com/website-hacked-manual-penalty-google/
and it's amazing that site was hacked just for 4 hours but Google notice this. You can see there traffic drop and removal from SERP. Ok, i'm not trying to "fear sells", but keeping bad pages with 404 will take long time. In Jan-Feb 2012 i have new temporary site on mine site within /us/ folder and even today Jan 2016 i still receiving bots crawling this folder. That's why i nuke it with 410. This save the day!

On your case it's same. Bot is wasting time and resources to crawl 404 pages over and over but crawling less your important pages. That's why it's good to nuke them. ONLY them. This will save bot crawling budget on your website. So bot can focus on your pages.

Tosten

Hi Peter,

Thank you for your response! I saw you answered a similar question about a week ago, so thank you for weighing in on my options. So, to clarify, I must do this for all 1,205 of the URLs?

One SPAM link is pointing here: http://OURDOMAIN/flibzy/foto-bugil-di-kelas.html so in your above example, this would look like:

Redirect 410 /dir/http://OURDOMAIN/flibzy/foto-bugil-di-kelas.html/ (?) and do this for each page that Google has indexed?

I saw your example with the iphone on the other post. How did you get that page to say, GONE - The requested resource...

Mobilio

The best is to keep them 404. But fast is to 410 them.

All you need is to place this topmost somewhere of .htaccess:
Redirect 410 /dir/url1/
Redirect 410 /dir/url2/
Redirect 410 /dir1/url3/
Redirect 410 /dir1/url4/

But this won't help you if your URLs have parameters somewhere like index.php?spamword1-blah-blah. For this you need extended version like this:
RewriteEngine on
#RewriteBase /
RewriteCond %{QUERY_STRING} spamword
RewriteRule ^(.)$ /404.html? [R=410,L]
RewriteCond %{QUERY_STRING} spamword1
RewriteRule ^(.)$ /404.html? [R=410,L]
RewriteCond %{QUERY_STRING} spamword2
RewriteRule ^(.*)$ /404.html? [R=410,L]

So why 410? 410 act much faster than 404 but it's DANGEROUS! If you sent 410 to normal URL this is effective nuking it. I found that with 410 bot visit this url 1-2-3 times, but with 404 bot keep visiting over and over eating your crawling budget. URL removal in SearchConsole is OK, but it's fast but works only for 30 days. And will eat almost same time as building list for 404/410s. Hint: You can speedup crawling if you do "fetch and render" then submit to index.

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

URL Index Removal for Hacked Website - Will this help?

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Wrong URLs indexed, Failing To Rank Anywhere

Removing .html from URLs - impact of rankings?

Help! Website Page Structure.

Will have /index in my url hurt?

URLs: Removing duplicate pages using anchor?

If a website Uses <select>to dropdown some choices, will Google see every option as Content Or Hyperlink?</select>

Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)

I have removed over 2000+ pages but Google still says i have 3000+ pages indexed