New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMS platforms on our current domain. As expected, this has left behind a lot of URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation controls such as rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, and only use page-level robots meta tags to disallow where necessary.
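For example, the kind of rule set I have in mind looks something like this (the directory names below are just placeholders for our old CMS paths):

  User-agent: *
  # Block the directories left over from the two previous CMS platforms
  Disallow: /old-cms-one/
  Disallow: /old-cms-two/

Everything else would stay crawlable exactly as it is today.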
Thanks!
-
Great stuff... thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
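For what it's worth, the 301s I'm adding look roughly like this (Apache .htaccess syntax; the paths are made-up stand-ins for our real URLs):

  # One-off redirect for a single legacy URL
  Redirect 301 /old-cms/widgets.asp http://www.example.com/products/widgets/
  # Pattern redirect for an entire legacy section
  RedirectMatch 301 ^/old-articles/(.*)$ http://www.example.com/blog/$1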
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may head off problems that come from Google crawling thousands of "not founds" all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
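If it helps, one quick sanity check before you request a folder removal is to spot-check the status codes of URLs inside that folder. A rough Python sketch (the folder and sample paths here are hypothetical):

  import requests

  # Hypothetical sample of URLs from the folder you plan to remove
  sample_urls = [
      "http://www.example.com/old-folder/page-1.asp",
      "http://www.example.com/old-folder/page-2.asp",
  ]

  for url in sample_urls:
      # Use HEAD and don't follow redirects, so you see the real status code
      response = requests.head(url, allow_redirects=False)
      if response.status_code not in (404, 410):
          print("Check before removing: {0} returned {1}".format(url, response.status_code))

Anything that doesn't come back as a 404 or 410 probably shouldn't be in the folder you're dumping out of the index.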
-
Absolutely. "Not founds" and no-content pages are a concern, and cleaning them up will help your rankings.
-
Thanks a lot! I should have been a little more specific... but my exact question would be: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
It's a loaded question without knowing exactly what you are doing... but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly deal with that many "not founds".
Then you can slowly pick away at the issue and figure out whether some of the "not founds" really do have content and are just being sent to the wrong area.
On a recent project we had over 200,000 additional URLs coming up "not found". We stopped the bleeding and then, slowly over the course of a month, spending a couple of hours a week, found another 5,000 pages of content that we redirected correctly and removed from the robots.txt block.
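To make that concrete (the paths below are invented for illustration): when we confirmed that a "not found" URL really did have content, we added the 301 and then lifted the matching robots.txt block, because a URL that stays disallowed never gets recrawled, so the engines never see the redirect:

  # .htaccess (Apache) - send the recovered URL to its new home
  Redirect 301 /old-cms/article-123.html /resources/article-123/

  # robots.txt - delete the line that had been blocking it, i.e. remove:
  # Disallow: /old-cms/article-123.html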
Good luck.