New CMS - 100,000 old URLs - use robots.txt to block them?
-
Hello.
My website recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMSes on our current domain. As expected, this has left us with a lot of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything, and only use page-level robots meta tags to disallow where necessary.
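For reference, this is the sort of thing I have in mind; a minimal robots.txt sketch, with hypothetical directory names standing in for the real retired paths:

```
# Sketch only -- directory names are hypothetical placeholders
User-agent: *
Disallow: /old-cms-2004/
Disallow: /old-cms-2008/
```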
Thanks!
-
Great stuff..thanks again for your advice..much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
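As a quick sanity check along those lines, a rough Python sketch like the one below can run through the 'Not Found' list and flag anything that doesn't actually return a 404. It assumes a plain-text export with one URL per line (the filename is hypothetical) and the third-party requests library:

```python
# Sanity check: confirm URLs reported as "Not Found" really 404,
# and that none of them now resolve to live pages on the new CMS.
import requests

# "not_found_urls.txt" is a hypothetical filename for a Webmaster
# Tools export, one URL per line.
with open("not_found_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        # HEAD keeps the check cheap; follow redirects so 301s resolve
        resp = requests.head(url, allow_redirects=True, timeout=10)
    except requests.RequestException as exc:
        print("ERROR", url, exc)
        continue
    if resp.status_code != 404:
        # A 200 here means a live page is wrongly on the list;
        # other codes (500, 403, ...) deserve a closer look too.
        print(resp.status_code, url)
```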
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and the major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for search engines), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems that come from Google crawling thousands of 'not founds' all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
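To make that concrete: the folder-removal request in Webmaster Tools only sticks while the folder is blocked, so the robots.txt needs a line along these lines first (the folder name is a hypothetical example):

```
User-agent: *
Disallow: /retired-folder/
```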
-
Absolutely. 'Not founds' and no-content pages are a concern, and moving the crawlers' attention away from them will help your rankings.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
That's a loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "not found" errors.
Then you can slowly pick away at the issue and figure out whether some of the "not founds" really do have content behind them and are just pointing to the wrong area.
On a recent project we had over 200,000 additional URLs reported as "not found". We stopped the bleeding, and then slowly, over the course of a month, spending a couple of hours a week, we found another 5,000 pages of real content that we redirected correctly and removed from the robots.txt blocks.
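As a rough illustration of what that incremental redirect work can look like on an Apache server, via mod_alias rules in .htaccess; all of the paths here are hypothetical examples, not rules from the actual project:

```
# Per-URL 301s added as real content is rediscovered
Redirect 301 /old-cms-2008/widgets.asp /products/widgets/
Redirect 301 /old-cms-2008/about-us.html /about/

# Or one pattern per directory once the mapping is consistent
RedirectMatch 301 ^/old-articles/(.*)$ /blog/$1
```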
Good luck.