Removing URLs in bulk when directory exclusion isn't an option?
-
I had a bunch of URLs on my site that followed the form:
http://www.example.com/abcdefg?q=&site_id=0000000048zfkf&l=
There were several million pages, each associated with a different site_id. They weren't very useful, so we've removed them entirely and now return a 404.The problem is, they're still stuck in Google's index. I'd like to remove them manually, but how? There's no proper directory (i.e. /abcdefg/) to remove, since there's no trailing /, and removing them one by one isn't an option. Is there any other way to approach the problem or specify URLs in bulk?
Any insights are much appreciated.
Kurus
-
I'd go into Google Webmaster Tools and their parameter settings and tell them to ignore this parameter.
I would need to look up the exact syntax, but Google does accept some dynamic exclusions and parameters in robots.txt, and you may be able to put that into robots and then use the URL removal tools.
-
There are no links to these pages, so no juice. There are also no 'new' replacement pages. We just want them out of the index ASAP by any means necessary.
-
You should have 301 your most important pages to the new urls, so that you would keep your juice.
-
Thanks, but the goal is to expedite the removal process via the URL removal tool. We've already 404'd the pages, so they'll be removed from the index. It's a question of timing, since the pages in question are low quality and hurting us in the context of Panda.
-
try 301 redirect for most important links. http://www.seomoz.org/learn-seo/redirection
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
When I crawl my website I have urls with (#!162738372878) at the end of my urls
When I crawl my website I have urls with (#!162738372878) at the end of my urls. I used screaming frog to look check my website and I seen these. My normal urls are in there too, but each of them have a copy with this strange symbol and number at the end. I used a website builder called homestead to make the website and I seen a bunch of there urls in my crawl as well - http://editor.homestead.com/faq is an example I recently created a new website with their new website builder and transferred it to my old domain. However, I didnt know they didnt offer 301 redirects or canonical tags(learned about those afterwards) and I changed my page names. So they recommended I leave the old website published along with the new website. So if I search my website name on google, sometimes both will show in the results. I just want to sort this all out somehow. My website is www.coastlinetvinstalls.com Any feedback is greatly appreciated. Thanks, Matt
Intermediate & Advanced SEO | | Matt160 -
Should I change client's keyword stuffed URLs?
Hi Guys, We currently have a client that offers reviews and preparation classes for their industry (online and offline). One of the main things that I have noticed is how all of their product landing page urls are stuffed with keywords. I have read changing url's will impact up to 25% traffic and to not mess with url's unless it is completely needed. My question is, when url's are stuffed with keywords and make the url length over 200 characters, should I be focusing on a more structured url system?
Intermediate & Advanced SEO | | EricLee1230 -
Establishing if links are 'nofollow'
Wonder if any of you guys can tell me if there is any other way to tell google links are nofollow other than in the html (ie can you tell google to nofollow every link in a subdomain or something). I'm trying to establish if a couple of links on a very high ranking site are passing me pagerank or not without asking them directly and looking silly! Within the source code for the page they are NOT tagged as nofollow at present. Hope that all makes sense 😉
Intermediate & Advanced SEO | | mat20150 -
Content From One Domain Mysteriously Indexing Under a Different Domain's URL
I've pulled out all the stops and so far this seems like a very technical issue with either Googlebot or our servers. I highly encourage and appreciate responses from those with knowledge of technical SEO/website problems. First some background info: Three websites, http://www.americanmuscle.com, m.americanmuscle.com and http://www.extremeterrain.com as well as all of their sub-domains could potentially be involved. AmericanMuscle sells Mustang parts, Extremeterrain is Jeep-only. Sometime recently, Google has been crawling our americanmuscle.com pages and serving them in the SERPs under an extremeterrain sub-domain, services.extremeterrain.com. You can see for yourself below. Total # of services.extremeterrain.com pages in Google's index: http://screencast.com/t/Dvqhk1TqBtoK When you click the cached version of there supposed pages, you see an americanmuscle page (some desktop, some mobile, none of which exist on extremeterrain.com😞 http://screencast.com/t/FkUgz8NGfFe All of these links give you a 404 when clicked... Many of these pages I've checked have cached multiple times while still being a 404 link--googlebot apparently has re-crawled many times so this is not a one-time fluke. The services. sub-domain serves both AM and XT and lives on the same server as our m.americanmuscle website, but answer to different ports. services.extremeterrain is never used to feed AM data, so why Google is associating the two is a mystery to me. the mobile americanmuscle website is set to only respond on a different port than services. and only responds to AM mobile sub-domains, not googlebot or any other user-agent. Any ideas? As one could imagine this is not an ideal scenario for either website.
Intermediate & Advanced SEO | | andrewv0 -
Sudden increase in number of indexed URLs. How ca I know what URLs these are?
We saw a spike in the total number of indexed URLs (17,000 to 165,000)--what would be the most efficient way to find out what the newly indexed URLs are?
Intermediate & Advanced SEO | | nicole.healthline0 -
How important is it to canonicalize mobile URLs to desktop URLs?
I know many SEO's prefer a stylesheet and single URL, but if you use m.domain.com, do you canonicalize to your desktop URLS?
Intermediate & Advanced SEO | | nicole.healthline0 -
Should I 301 Poorly Worded URL's which are indexed and driving traffic
Hi, I'm working on our sites structure and SEO at present and wondering when the benefit I may get from a well written URL, i.e ourDomain / keyword or keyphrase .html would be preferable to the downturn in traffic i may witness by 301 redirecting an existing, not as well structured, but indexed URL. We have a number of odd looking URL's i.e ourDomain / ourDomain_keyword_92.html alongside some others that will have a keyword followed by 20 underscores in a long line... My concern is although i would like to have a keyword or key phrase sitting on its own in a well targeted URL string I don't want to mess to much with pages that are driving say 2% or 3% of our traffic just because my OCD has kicked in.... Some further advice on strategies i could utilise would be great. My current thinking is that if a page is performing well then i should leave the URL alone. Then if I'm not 100% happy with the keyword or phrase it is targeting I could build another page to handle the new keyword / phrase with the aim of that moving up the rankings and eventually taking over from where the other page left off. Any advice is much appreciated, Guy
Intermediate & Advanced SEO | | guycampbell0 -
Competitior 'scraped' entire site - pretty much - what to do?
I just discovered a competitor in the insurance lead generation space has completely copied my client's site's architecture, page names, titles, even the form, tweaking a word or two here or there to prevent 100% 'scraping'. We put a lot of time into the site, only to have everything 'stolen'. What can we do about this? My client is very upset. I looked into filing a 'scraper' report through Google but the slight modifications to content technically don't make it a 'scraped' site. Please advise to what course of action we can take, if any. Thanks,
Intermediate & Advanced SEO | | seagreen
Greg0