New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used 3 different CMS platforms on our current domain. As expected, this has resulted in a lot of leftover URLs.
Until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and GWMT, I've been able to locate and redirect all pertinent, PageRank-bearing older URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything, only using page-level robots tags to disallow where necessary.
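For reference, the page-level robots tags mentioned above take this form in the page's head (the exact directives depend on the page; this is just the common noindex pattern):

```html
<!-- keep this page out of the index, but let crawlers follow its links -->
<meta name="robots" content="noindex, follow">
```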
Thanks!
-
Great stuff. Thanks again for your advice; much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
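One way to sanity-check that the 404s are intentional, at 100K scale, is to script the audit rather than spot-check. This is a rough sketch (not something from the thread), and the example URL is a hypothetical placeholder:

```python
# Sketch: bulk-check legacy URLs so you can confirm each 404 is
# intentional before blocking or removing anything.
import urllib.request
import urllib.error

def check_status(url, timeout=10):
    """Return the final HTTP status code for a URL.

    Note: urlopen follows redirects, so a correctly 301'd URL
    reports its destination's status (usually 200), not 301.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 404s arrive as exceptions; the code is what we want

def triage(status):
    """Bucket a status code into a follow-up action."""
    if status == 200:
        return "resolves - live page or redirect to a live page"
    if status == 404:
        return "not found - confirm this is intentional"
    if status == 410:
        return "gone - intentionally removed"
    return "review manually"

# Usage (requires network access):
#   print(triage(check_status("https://example.com/old-cms/page1")))
```

Feed it the URL list from the GWMT 'Not Found' export and anything that comes back 200 probably shouldn't be on the kill list.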
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems caused by Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
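A minimal robots.txt rule for that block-then-remove approach might look like this (the folder name is a hypothetical placeholder, not a path from the thread):

```
# Block the retired folder before requesting its removal in GWMT
User-agent: *
Disallow: /old-cms/
```

Keep in mind Disallow stops crawling going forward; it's the Webmaster Tools removal request that actually clears the folder out of the index quickly.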
-
Absolutely. Not Founds and no-content pages are a concern, and addressing them will help your ranking.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
Loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "Not Founds".
Then you can slowly pick away at the issue and figure out whether some of the "Not Founds" really have content behind them and are just pointing to the wrong area.
On a recent project we had over 200,000 additional URLs coming up "Not Found". We stopped the bleeding, and then slowly over the course of a month, spending a couple of hours a week, we found another 5,000 pages of content that we redirected correctly and removed from the robots.txt block.
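As an illustration of the "redirected correctly" step, here's what those 301s could look like on an Apache server (the server type and all paths are assumptions for the sketch, not details from the thread):

```
# .htaccess - 301 redirects for recovered legacy content

# One-to-one redirect for a single recovered page
Redirect 301 /old-cms/widgets.html /products/widgets/

# Pattern redirect for a whole migrated section
RewriteEngine On
RewriteRule ^archive-2008/(.*)$ /blog/$1 [R=301,L]
```

Once a URL is redirected like this, remember to lift any robots.txt block covering it, or the engines will never see the 301.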
Good luck.