New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMSs on our current domain. As expected, this has resulted in a lot of URLs.
Until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and GWMT (Google Webmaster Tools), I've been able to locate and redirect all pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are over 100,000 additional URLs out there that Google is still trying to find.
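Underneath, "redirecting old URLs to their new counterparts" usually comes down to a lookup from old path to new path. A minimal Python sketch, assuming a hand-built mapping; the paths and the names `REDIRECTS`, `normalize` and `resolve` are hypothetical, and in practice these rules would live in the web server or CMS configuration:

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from old-CMS paths to their new-CMS counterparts.
REDIRECTS: Dict[str, str] = {
    "/cms2/products/widget.asp": "/products/widget/",
    "/cms1/about_us.html": "/about/",
}


def normalize(path: str) -> str:
    """Drop a trailing slash (except on the root) so '/a/' and '/a' match."""
    if len(path) > 1 and path.endswith("/"):
        return path[:-1]
    return path


# Normalize the keys once so each lookup is a single dict access.
_LOOKUP: Dict[str, str] = {normalize(old): new for old, new in REDIRECTS.items()}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (301, new_location) for a mapped old URL, otherwise (404, None)."""
    target = _LOOKUP.get(normalize(path))
    if target is not None:
        return 301, target
    return 404, None


if __name__ == "__main__":
    print(resolve("/cms1/about_us.html/"))  # (301, '/about/')
    print(resolve("/cms1/old-page.html"))   # (404, None)
```

Old URLs that fall through to the 404 branch are the kind that keep appearing in the 'Not Found' report.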
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything and only use page-level robots tags to disallow where necessary.
Thanks!
-
Great stuff, thanks again for your advice. Much appreciated!
-
It can be really tough to gauge the impact. It depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to the size of your overall index. In most cases, it's a temporary problem, and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional, and that none of them are valuable pages or are occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and the problem could be more than just a big chunk of 404s.
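One way to confirm the 404s are intentional is to cross-reference the 'Not Found' export against what you already know about the old URLs. A minimal sketch, assuming you have three plain lists of paths: the not-found URLs, the URLs you have already 301'd, and URLs you consider valuable (for example, ones with inbound links or traffic). The function name and bucket labels are hypothetical:

```python
from typing import Dict, Iterable, List, Set


def triage_not_found(
    not_found: Iterable[str],
    redirected: Set[str],
    valuable: Set[str],
) -> Dict[str, List[str]]:
    """Sort 'Not Found' URLs into buckets for follow-up."""
    buckets: Dict[str, List[str]] = {
        # Already mapped: confirm the 301 actually fires (or the report is stale).
        "check_redirect": [],
        # Has links or traffic: worth mapping to a new page.
        "needs_301": [],
        # Intentional 404: a candidate for blocking or removal.
        "ok_to_404": [],
    }
    for url in not_found:
        if url in redirected:
            buckets["check_redirect"].append(url)
        elif url in valuable:
            buckets["needs_301"].append(url)
        else:
            buckets["ok_to_404"].append(url)
    return buckets


if __name__ == "__main__":
    result = triage_not_found(
        ["/cms1/a.html", "/cms1/b.html", "/cms2/c.asp"],
        redirected={"/cms1/a.html"},
        valuable={"/cms2/c.asp"},
    )
    print(result)
```

A non-empty "check_redirect" bucket is a hint that the new CMS itself may be misbehaving rather than the URLs simply being old.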
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to continue investigating and applying 301s as necessary.
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck using robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (they're a perfectly valid signal for SEO), but if you create 100,000 all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems caused by Google crawling thousands of not-found pages all at once.
If these pages are isolated in a folder, you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than robots.txt alone, but you need to make sure everything in the folder can be dropped from the index.
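Whether the pages are "isolated in a folder" can be checked from the same 'Not Found' export: count errors per top-level directory and set aside any directory that still contains live pages. A minimal sketch; `top_folder` and `removable_folders` are hypothetical names, it only looks at the first path segment, and it is only as reliable as the list of live URLs you feed it (for example, the URLs in your sitemap):

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import urlparse


def top_folder(url: str) -> str:
    """Return the first path segment as '/name/', or '/' for root-level URLs."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:  # '/', '/page.html': not inside any folder
        return "/"
    return "/" + segments[0] + "/"


def removable_folders(not_found: Iterable[str], live: Iterable[str]) -> Dict[str, int]:
    """Folders that appear in the not-found list but contain no known live URL,
    mapped to their not-found count."""
    live_folders = {top_folder(u) for u in live}
    counts = Counter(top_folder(u) for u in not_found)
    return {
        folder: n
        for folder, n in counts.items()
        if folder != "/" and folder not in live_folders
    }


if __name__ == "__main__":
    dead = ["http://www.example.com/oldcms/a.asp", "/oldcms/b/c.html", "/shop/x", "/old.html"]
    alive = ["/shop/y", "/about"]
    print(removable_folders(dead, alive))  # {'/oldcms/': 2}
```

Here /shop/ is excluded because it still has a live page, which is exactly the case where removing the whole folder would be a mistake.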
-
Absolutely. 'Not found' errors and pages with no content are a concern. This will help your ranking.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Found' errors really a concern? Will this help my indexation and/or ranking?
Thanks!
-
It's a loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "not found" errors.
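"Stopping the bleeding" here means adding Disallow rules for the old CMS directories. A minimal sketch, assuming the old content sits under folders named /oldcms1/ and /oldcms2/ (hypothetical); it uses Python's standard urllib.robotparser to confirm which URLs a compliant crawler would skip before the file goes live. Note that urllib.robotparser does simple prefix matching and does not implement Google's * and $ wildcard extensions, so the rules are kept as plain folder prefixes:

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the old CMS folders, leave the rest crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /oldcms1/
Disallow: /oldcms2/
"""


def blocked_urls(robots_txt: str, urls: Iterable[str], agent: str = "Googlebot") -> List[str]:
    """Return the URLs that the given user agent is told not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


if __name__ == "__main__":
    sample = ["/oldcms1/page.asp", "/oldcms2/x/y.html", "/products/widget/"]
    print(blocked_urls(ROBOTS_TXT, sample))  # ['/oldcms1/page.asp', '/oldcms2/x/y.html']
```

A compliant crawler never requests a blocked URL, so it also never sees a 301 placed on it. Re-running blocked_urls() over URLs that later receive redirects shows which ones still need their Disallow rule lifted or narrowed.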
Then you can slowly pick away at the issue and figure out whether some of the "not found" URLs really have matching content and are being sent to the wrong place.
On a recent project, we had over 200,000 additional URLs reported as "not found". We stopped the bleeding, and then, spending a couple of hours a week over the course of a month, found another 5,000 pages of content that we redirected correctly and removed from the robots.txt block.
Good luck.