New CMS system - 100,000 old urls - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS system.
Over the last 10 years or so, we've used 3 different CMS systems on our current domain. As expected, this has resulted in lots of leftover URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all of the pertinent, PageRank-bearing older URLs to their new counterparts. However, according to Google Webmaster Tools' 'Not Found' report, there are literally over 100,000 additional URLs out there that it's trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, and only use page-level robots meta tags to disallow where necessary.
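For reference, here's a minimal sketch of what that robots.txt could look like, assuming the old CMSs lived under directories like /old-cms/ and /legacy/ (placeholder paths, substitute your actual directories):

```
User-agent: *
Disallow: /old-cms/
Disallow: /legacy/
```

Googlebot and Bingbot also honor * and $ wildcards in Disallow patterns (e.g. Disallow: /*.asp$), but that's an extension to the standard and not every crawler supports it.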
Thanks!
-
Great stuff... thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
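As a complement to the webmaster tools reports, one rough way to check which missing URLs crawlers are actually hitting is to tally 404s straight from the server access log. A sketch of the idea in Python, with made-up combined-format log lines standing in for a real log file:

```python
import re
from collections import Counter

# Hypothetical access-log lines, used only to illustrate the technique.
# In practice you'd read these from your server's access log.
LOG_LINES = [
    '1.2.3.4 - - [10/Feb/2012:10:00:00 +0000] "GET /old-cms/page1 HTTP/1.1" 404 210',
    '1.2.3.4 - - [10/Feb/2012:10:00:01 +0000] "GET /products/widget HTTP/1.1" 200 5120',
    '66.249.66.1 - - [10/Feb/2012:10:00:02 +0000] "GET /old-cms/page2 HTTP/1.1" 404 210',
    '66.249.66.1 - - [10/Feb/2012:10:00:03 +0000] "GET /old-cms/page1 HTTP/1.1" 404 210',
]

# Pull the request path and status code out of each log line.
LOG_PATTERN = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def count_404s(lines):
    """Tally how often each URL returned a 404, most-requested first."""
    hits = Counter()
    for line in lines:
        m = LOG_PATTERN.search(line)
        if m and m.group("status") == "404":
            hits[m.group("path")] += 1
    return hits.most_common()
```

Sorting by hit count like this surfaces the 404s that crawlers (and visitors) request most, which are the ones worth checking by hand for accidental breakage before you write them all off as intentional.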
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
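For the "applying 301s as necessary" part, the redirects can often be handled in bulk rather than one by one when the old URLs follow a pattern. A sketch, assuming an Apache server; the paths here are hypothetical:

```
# One-off moves (mod_alias)
Redirect 301 /old-cms/about.html /about/

# Pattern-based moves for a whole legacy section (mod_rewrite)
RewriteEngine On
RewriteRule ^old-cms/articles/(.*)$ /articles/$1 [R=301,L]
```

A single pattern rule like the second one can cover thousands of old URLs at once, which matters at the 100K scale discussed here.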
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems that come from Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
-
Absolutely. Not Founds and no-content pages are a concern. Cleaning them up will help your rankings....
-
Thanks a lot! I should have been a little more specific... my exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
It's a loaded question without knowing exactly what you are doing... but let me offer this advice: stop the bleeding with robots.txt. That's the easiest way to quickly resolve that many 'Not Founds'.
Then you can slowly pick away at the issue and figure out whether some of the 'Not Founds' really do have content and are just being sent to the wrong area.
On a recent project we had over 200,000 additional URLs coming up 'Not Found'. We stopped the bleeding, and then slowly over the course of a month, spending a couple of hours a week, we found another 5,000 pages of real content that we redirected correctly and removed from the robots.txt blocks....
Good luck.