New CMS system - 100,000 old urls - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS system.
Over the last 10 years or so, we've used 3 different CMS systems on our current domain. As expected, this has resulted in lots of URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and GWMT, I've been able to locate and redirect all pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's trying to find.
My question is, is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything - only using page level robots tags to disallow where necessary.
Thanks!
-
Great stuff. Thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact. It depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem, and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
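One rough way to run that check, offered as a sketch and not a definitive method: compare the 'Not Found' export against the URLs in the new sitemap. Any URL that shows up in both is a page the new CMS is supposed to serve but is returning a 404 for. The function names and the example.com URLs below are hypothetical, and the comparison deliberately ignores the scheme, query strings, and trailing slashes, which is an assumption that may not suit every site.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so http/https and trailing slashes match."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}"


def unexpected_404s(not_found_urls, sitemap_urls):
    """Return not-found URLs that the sitemap says should be live."""
    live = {normalize(u) for u in sitemap_urls}
    return sorted(u for u in not_found_urls if normalize(u) in live)


# Hypothetical data standing in for a sitemap and a 'Not Found' export.
sitemap = [
    "https://www.example.com/products/widget/",
    "https://www.example.com/about/",
]
not_found = [
    "http://www.example.com/products/widget",
    "http://www.example.com/old-cms/page1.html",
]

suspicious = unexpected_404s(not_found, sitemap)
print(suspicious)
```

Anything this turns up is worth investigating before blocking or removing it, since it points to a problem with the new CMS and not to a retired page.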
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (and it's a very valid signal for SEO), but if you create 100,000 all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems from Google crawling thousands of not-found pages all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
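To see whether the not-found URLs really are isolated in a few folders, a quick tally by top-level directory can help. This is only a minimal sketch: the function name and the URL list are made up for illustration, and in practice the list would come from the 'Not Found' export.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_top_folder(urls):
    """Count URLs by their first path segment, e.g. '/old-cms/'."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # A URL with one segment or none sits directly at the site root.
        folder = f"/{segments[0]}/" if len(segments) > 1 else "/"
        counts[folder] += 1
    return counts


# Hypothetical sample of a 'Not Found' export.
not_found = [
    "https://www.example.com/old-cms/page1.html",
    "https://www.example.com/old-cms/news/2004/item.html",
    "https://www.example.com/archive/report.html",
    "https://www.example.com/page.html",
]

folder_counts = count_by_top_folder(not_found)
for folder, total in folder_counts.most_common():
    print(folder, total)
```

Folders that hold nothing but dead URLs are candidates for a block plus folder removal; a folder that mixes dead and live pages is not.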
-
Absolutely. Not founds and no content are a concern. This will help your ranking....
-
Thanks a lot! I should have been a little more specific, but my exact question would be: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or ranking?
Thanks!
-
That's a loaded question without knowing exactly what you are doing, but let me offer this advice. Stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "not found" errors.
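As an illustration of that step, the sketch below checks a draft set of robots.txt rules with Python's standard urllib.robotparser before they go live, so you can confirm the old folders are blocked and the new pages are not. The directory names /old-cms/ and /archive/, the helper name, and the example.com URLs are all hypothetical. As far as I know this parser only does plain prefix matching and does not handle wildcard patterns, so the example sticks to simple directory prefixes.

```python
from urllib.robotparser import RobotFileParser

# Draft rules: hypothetical folders left over from the older CMS versions.
DRAFT_RULES = """\
User-agent: *
Disallow: /old-cms/
Disallow: /archive/
"""


def is_crawlable(rules_text, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


old_page_ok = is_crawlable(DRAFT_RULES, "https://www.example.com/old-cms/page1.html")
new_page_ok = is_crawlable(DRAFT_RULES, "https://www.example.com/products/widget")
print("old page crawlable:", old_page_ok)
print("new page crawlable:", new_page_ok)
```

Running the draft against a handful of known-good URLs first is a cheap guard against blocking live content by mistake.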
Then you can slowly pick away at the issue and figure out whether some of the "not founds" really do have content but are being sent to the wrong place...
On a recent project we had over 200,000 additional URLs "not found". We stopped the bleeding and then, over the course of a month, spending a couple of hours a week, found another 5,000 pages of content that we redirected correctly and removed the robots...
Good luck.