Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Using 410 To Remove URLs Starting With Same Word
-
We had a spam injection a few months ago. We successfully cleaned up the site and resubmitted to google. I recently received a notification showing a spike in 404 errors.
All of the URLS have a common word at the beginning injected via the spam:
sitename.com/mono
sitename.com/mono.php?buy-good-essays
sitename.com/mono.php?professional-paper-writerThere's about 100 total URLS with the same syntax with the word "mono" in them. Based on my research, it seems that it would be best to serve a 410. I wanted to know what the line of HTACCESS code would be to do that in bulk for any URL that has the word "mono" after the sitename.com/
-
Martijn -
Thanks for your reply. I tried the code you provided, however it still provided a 404 error. I was able to get the following to work properly - any drawbacks to doing it this way?
RewriteRule ^mono(.*)$ - [NC,R=410,L]
The browser now shows the following anytime there is the word "mono" immediately after "sitename.com/"
The requested resource
/mono.php
is no longer available on this server and there is no forwarding address. Please remove all references to this resource.Additionally, a 410 Gone error was encountered while trying to use an ErrorDocument to handle the request.
-
Thanks for the detailed response. Yes, there are some negative-SEO backlinks to some of the URLs created during the spam injection. I've seen a few backlinks from other forum sites to our site to one of the spam created URLs which has hurt our rankings such as the following URL created on our site:
sitename.com/mono.php?best-resume-writing-service-for-it-professionals
I was confused by the following in your response: "If you can serve the 410s on a custom 410 page which also gives the Meta no-index directive, that will be a very strong signal to Google indeed that those aren't proper pages or fit for indexation"
- Is that all done view the htaccess file? Code? Or is the meta no-index directive done in the robots.txt?- custom 410 page? I've seen some 404 pages, but not custom 410 pages. Would that be similar to a new 404 page?
Thanks for your response.
-
There are so many ways to deal with this. If these were indeed spam URLs, someone may have attached negative-SEO links to them (to water down your site's ranking power). As such, redirecting these URLs back to their parents could pull spam metrics 'onto' your site which would be really bad. I can see why you are thinking about using 410 (gone)
Using Canonical tags to stop Google from indexing those bad parameter-based URLs could also be helpful. If you 'canonicalled' those addresses to their non-parameter based parents, Google would stop crawling those pages. When a URL 'canonicals' to another, different page - it cites itself as non-canonical, and thus gets de-indexed (usually, although this is only a directive). Again though, canonical tags interrelate pages. If those spam URLs were backed by negative SEO attacks, the usage of canonical tags would (again) be highly inadvisable (leaving your 410 suggestion as a better method).
Google listens for wildcard rules in your robots.txt file, though it runs very simplified regex (in fact I think only the "*" wildcard is supported). In your robots.txt you could do something like:
User-agent: *
Disallow: /mono.php?*That would cull Google's crawling of most of those URLs, but not necessarily the indexation. This would be something to do after Google has swallowed most of the 410s and 'got the message'. You shouldn't start out with this, as if Google can't crawl those URLs - it won't see your 410s! Just remember this, so that when the issue is resolved you can smack this down and stop the attack from occurring again (or at least, it will be preemptively nullified)
Finally you have Meta "No-Index" tags. They don't stop Google from crawling a URL, but they will remove those URLs from Google's index. If you can serve the 410s on a custom 410 page which also gives the Meta no-index directive, that will be a very strong signal to Google indeed that those aren't proper pages or fit for indexation
So now we have a bit of an action plan:
- 410 the bad URLs alongside a Meta no-index directive served from the same URL
- Once Google has swallowed all that (may be some weeks or just over 1 month), back-plate it with robots.txt wildcards
With regards to your oriignal question (sorry I took so long to get here) I'd use something like:
Redirect 410 /mono.php?*
I think .htaccess swallows proper regex (I think). The back slashes say "whatever character follows me, treat that character as a value and do not apply its general regex function". It's the regex escape character (usually). This would go in the .htaccess file at the root of your site, not in a subdir .htaccess file
Please sandbox text my recommendation first. I'm really more of a technical data analyst than a developer!
This document seems to suggest that a .htaccess file will properly swallow "" as the escape character:
https://premium.wpmudev.org/forums/topic/htaccess-redirects-with-special-characters
Hope this helps!
-
Hi,
Have you also excluded these pages from the robots.txt file so you can make sure that they're also not being crawled?
The code for the redirect looks something like this:RewriteEngine on
RewriteRule ^/mono* - [G,NC]Martijn.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
In writing the url, it is better to use the language used by the people of my country or English?
We speak Persian and all people search in Persian on Google. But I read in some sources that the url should be in English. Please tell me which language to use for url writing?
Technical SEO | | ghesta
For example, I brought down two models: 1fb0e134-10dc-4737-904f-bfdf07143a98-image.png https://ghesta.ir/blog/how-to-become-rich/
2)https://ghesta.ir/blog/چگونه-پولدار-شویم/0 -
Duplicate content issue: staging urls has been indexed and need to know how to remove it from the serps
duplicate content issue: staging url has been indexed by google ( many pages) and need to know how to remove them from the serps. Bing sees the staging url as moved permanently Google sees the staging urls (240 results) and redirects to the correct url Should I be concerned about duplicate content and request Google to remove the staging url removed Thanks Guys
Technical SEO | | Taiger0 -
Remove sitemap, effect ranking?
We are considering to remove our sitemap because it doesn't display the right structure. Will it affect current rankings if we remove the sitemap en continuing without a sitemap? Thanks
Technical SEO | | rijwielcashencarry0400 -
Why xml generator is not detecting all my urls?
Hi Mozzers, After adding 3 new pages to example.com, when generating the xml sitemap, Iwasn't able to locate those 3 new url. This is the first time it is happening. I have checked the meta tags of these pages and they are fine. No meta robots setup! Any thoughts or idea why this is happening? how to fix this? Thanks!
Technical SEO | | Ideas-Money-Art0 -
Can I use a 410'd page again at a later time?
I have old pages on my site that I want to 410 so they are totally removed, but later down the road if I want to utilize that URL again, can I just remove the 410 error code and put new content on that page and have it indexed again?
Technical SEO | | WebServiceConsulting.com0 -
How to Remove /feed URLs from Google's Index
Hey everyone, I have an issue with RSS /feed URLs being indexed by Google for some of our Wordpress sites. Have a look at this Google query, and click to show omitted search results. You'll see we have 500+ /feed URLs indexed by Google, for our many category pages/etc. Here is one of the example URLs: http://www.howdesign.com/design-creativity/fonts-typography/letterforms/attachment/gilhelveticatrade/feed/. Based on this content/code of the XML page, it looks like Wordpress is generating these: <generator>http://wordpress.org/?v=3.5.2</generator> Any idea how to get them out of Google's index without 301 redirecting them? We need the Wordpress-generated RSS feeds to work for various uses. My first two thoughts are trying to work with our Development team to see if we can get a "noindex" meta robots tag on the pages, by they are dynamically-generated pages...so I'm not sure if that will be possible. Or, perhaps we can add a "feed" paramater to GWT "URL Parameters" section...but I don't want to limit Google from crawling these again...I figure I need Google to crawl them and see some code that says to get the pages out of their index...and THEN not crawl the pages anymore. I don't think the "Remove URL" feature in GWT will work, since that tool only removes URLs from the search results, not the actual Google index. FWIW, this site is using the Yoast plugin. We set every page type to "noindex" except for the homepage, Posts, Pages and Categories. We have other sites on Yoast that do not have any /feed URLs indexed by Google at all. Side note, the /robots.txt file was previously blocking crawling of the /feed URLs on this site, which is why you'll see that note in the Google SERPs when you click on the query link given in the first paragraph.
Technical SEO | | M_D_Golden_Peak0 -
Google News URL Format
Hi, We are currently redesigning our gaming website (www.totallygn.com) and one of our main goals is to get listed by Google News in future. Looking at the Google News URL requirements "The URL for each article must contain a unique number consisting of at least three digits." How does the above affect SEO structure? I was planning on using a format such as www.totallygn.com/xbox-360/360-reviews/fifa-12-review how would this compare to something like? www.totallygn.com/xbox-360/360-reviews/fifa-12-review234 Thanks in advance for your help
Technical SEO | | WalesDragon0 -
Old URL redirect to New URL
Alright I did something dumb a year a go and I'm still paying for it. I changed my hyphenated URL to the non-hyphenated version when I redesigned my website. I say it was dumb because I lost most of my link juice even though I did 301 redirects (via the htaccess file) for almost all of the pages I could find in Google's index. Here's my problem. My new site took a huge hit in traffic (down 60%) when I made the change and even though I've done thousands of redirects my old site is still showing up in the SERPS and send much if not most of my traffic. I don't want to take the old site down in fear it will kill all of my traffic. What should I do? Is there a better method I should explore then 301 redirects? Could the other site be affecting my current rank since it's still there? (FYI...both sites are built on the WP platform). Any help or ideas are greatly appreciated. Thank you! Joe
Technical SEO | | kaje0