New CMS system - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS system.
Over the last 10 years or so, we've used 3 different CMS systems on our current domain. As expected, this has resulted in lots of legacy URLs.
Until this most recent iteration, we were unable to set up 301 redirects or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, using only page-level robots meta tags to disallow indexing where necessary.
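Just to make that concrete, the sort of rule I'm considering would look something like this (the directory names below are placeholders, not our real paths):

    User-agent: *
    # directories left over from the old CMS installs (placeholder paths)
    Disallow: /old-cms-2005/
    Disallow: /old-cms-2009/

The rest of the site would stay fully crawlable.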
Thanks!
-
Great stuff... thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'd pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems from Google crawling thousands of "not founds" all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
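One quick sanity check before you request the folder removal: make sure the Disallow rule really covers the old URLs and doesn't accidentally catch anything live. Here's a minimal Python sketch of that check (the folder path and URLs are placeholders, not yours):

    # Minimal sketch: verify that a robots.txt Disallow rule covers exactly what
    # you intend to remove. The rule and URLs below are placeholders - swap in your own.
    from urllib import robotparser

    rules = [
        "User-agent: *",
        "Disallow: /old-cms/",   # the folder you plan to block and then remove
    ]

    checker = robotparser.RobotFileParser()
    checker.parse(rules)

    old_urls = [     # everything here should come back as blocked
        "https://www.example.com/old-cms/page-1.html",
        "https://www.example.com/old-cms/archive/2008/widget.asp",
    ]
    live_urls = [    # nothing here should be caught by the rule
        "https://www.example.com/products/widget/",
    ]

    for url in old_urls:
        print(url, "-> blocked" if not checker.can_fetch("*", url) else "-> STILL CRAWLABLE")

    for url in live_urls:
        print(url, "-> ok" if checker.can_fetch("*", url) else "-> BLOCKED BY MISTAKE")

If any live URL comes back blocked, tighten the path before you touch the removal tool.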
-
Absolutely. "Not founds" and pages with no content are a concern. Cleaning them up will help your rankings.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
It's a loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. That's the easiest way to quickly resolve that many "not founds".
Then you can slowly pick away at the issue and figure out whether some of the "not founds" really do have content that's simply ended up at the wrong address.
On a recent project we had over 200,000 additional URLs coming up "not found". We stopped the bleeding and then slowly, over the course of a month spending a couple of hours a week, found another 5,000 pages of content that we redirected correctly and then removed from the robots.txt block.
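If it helps, here's roughly how you could script that triage - a rough Python sketch that takes the "not found" URLs exported from webmaster tools, guesses a counterpart under the new URL structure, and flags the ones where a real page actually exists. The file name, domain, and guessing rule are just examples, and it needs the third-party requests library:

    # Rough sketch: for each old "not found" URL, guess a counterpart under the new
    # URL structure and check whether a real page exists there. Everything below
    # (file name, domain, slug rule) is an example - adjust to your own site.
    import csv
    from urllib.parse import urlparse

    import requests  # third-party: pip install requests

    def guess_new_url(old_url):
        # Example rule only: keep the last path segment, drop old CMS extensions,
        # and test it under the new /articles/ section.
        slug = urlparse(old_url).path.rstrip("/").split("/")[-1]
        for ext in (".asp", ".aspx", ".html", ".php"):
            slug = slug.replace(ext, "")
        return "https://www.example.com/articles/" + slug + "/"

    redirects_to_add = []

    with open("not_found_export.csv", newline="") as f:
        for row in csv.reader(f):
            old_url = row[0].strip()
            if not old_url.startswith("http"):
                continue  # skip header rows and junk
            candidate = guess_new_url(old_url)
            try:
                resp = requests.head(candidate, allow_redirects=True, timeout=10)
            except requests.RequestException:
                continue
            if resp.status_code == 200:
                # the old URL has a live counterpart - review it, then add a 301
                redirects_to_add.append((old_url, candidate))

    print(len(redirects_to_add), "likely old-to-new redirect mappings to review")

We did ours mostly by hand, but something like this gets you a shortlist to review instead of 200,000 unknowns.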
Good luck.