Culling 99% of a website's pages. Will this cause irreparable damage?
-
I have a large travel site that has over 140,000 pages. The problem I have is that the majority of pages are filled with dupe content. When Panda came in, our rankings were obliterated, so I am trying to isolate the unique content on the site and go forward with that.
The problem is, the site has been going for over 10 years, with every man and his dog copying content from it. It seems that our travel guides have been largely left untouched and are the only unique content that I can find. We have 1000 travel guides in total.
My first question is, would reducing 140,000 pages to just 1,000 ruin the site's authority in any way?
The site does use internal linking within these pages, so culling them will remove thousands of internal links throughout the site.
Also, am I right in saying that the link juice should now move to the more important pages with unique content, if redirects are set up correctly?
And finally, how would you go about redirecting all these pages? I will be culling a huge number of hotel pages; would you consider redirecting all of these to the generic hotels page of the site?
Thanks for your time, I know this is quite a long one,
Nick
-
Thank you all for the positive feedback.
Lately I have made the time for SEOmoz Q&A as I have been doing various SEO research and these boards can be a great way to stretch thought processes.
-
Seriously, Ryan is always ALL OVER SEOmoz comments with good feedback.
-
Just figured out how to do this. I'm new to SEOmoz Q&A, thanks for the nudge! Ryan certainly deserves it!
-
I do hope Ryan gets "Good Answer" and/or "Endorsed Answer" for this... hint, hint
-
Your understanding is correct.
Google does not care how many directories appear in a URL. The two URLs you offered as an example are viewed equally by Google. What's important is how many clicks it takes users to access those links.
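To make "clicks from the homepage" concrete, here is a minimal Python sketch that measures click depth with a breadth-first search over internal links. The link graph, the page paths, and the `click_depth` name are all hypothetical, made up for illustration; it only shows how you could measure depth on your own site and says nothing about how Google computes it.

```python
from collections import deque


def click_depth(links, start="/"):
    """Return the minimum number of clicks from `start` to each reachable page.

    `links` maps a page path to the list of pages it links to (hypothetical data).
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # in a BFS the first visit is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical internal link graph: the homepage links straight to Rome,
# while Milan is only reachable through the Italy page.
site_links = {
    "/": ["/hotels", "/hotels/italy/rome"],
    "/hotels": ["/hotels/italy"],
    "/hotels/italy": ["/hotels/italy/rome", "/hotels/italy/milan"],
}

depths = click_depth(site_links)
```

In this made-up graph both city URLs have the same number of path segments, yet Rome is one click from the homepage and Milan is three, which is the distinction being drawn above.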
-
Hi Ryan,
Sorry for not getting back to you straight away, I've been in meetings all day.
You've given me some excellent ideas again!!
Just to clarify, the old URLs are in the following format:
www.url.com/resort_hotels/hotels_in_rome.asp
I am aiming to use the following structure for the new website:
or
I was wondering if you knew from a search engine perspective, which URL is the better option. From a user perspective, I would assume the second.
I am operating under the assumption that Google rates a URL's importance by the number of clicks it is from the homepage and not the number of directories (www.url.com/.../.../...) within the URL?
If this is the case I will probably go for the second URL structure, but place links higher up the hierarchical structure of the site for the more important locations.
Unfortunately, the landing pages for the cars and flights house exactly the same content with just the location text tweaked. There is nothing else unique on these pages, which is why I find myself with no other option but to get rid of them.
I really like your idea of testing landing pages for a specific area. This may be a good way to go, but creating two paragraphs of text for both the flight and car hire pages is not an option at this time. With 40,000 locations we’d need to produce 160,000 paragraphs of unique text, which would cost around $400,000, maybe slightly less with bulk discounting.
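For anyone checking the sums, here is the arithmetic as a quick Python sketch. The variable names are mine, and the per-paragraph cost is only what the figures above imply, not a quoted rate.

```python
locations = 40_000           # locations covered, from the post
page_types = 2               # flights and car hire
paragraphs_per_page = 2      # "two paragraphs of text" per page

total_paragraphs = locations * page_types * paragraphs_per_page

quoted_total_cost = 400_000  # dollars, the rough estimate from the post

# Implied by the two figures above; an assumption, not a supplier quote.
implied_cost_per_paragraph = quoted_total_cost / total_paragraphs

print(total_paragraphs)            # 160000
print(implied_cost_per_paragraph)  # 2.5
```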
If I were to spend that much money on content writing, I would probably expand the hotel side of the site, as this is the most profitable. But my priority after the launch of the new site is an extensive link building campaign to assist the transition.
Thanks so much again for your help Ryan, you're a star!
Do you know whether Google rates a URL's importance by the number of clicks it is from the homepage and not the number of directories (www.url.com/.../.../...) within the URL? It is really important that I find this one out!
Take care buddy,
Nick
-
Nick,
Sounds like you have a good strategy. I only have two additional items to share based on your latest reply.
www.url.com/resort_hotels/hotels_in_rome.asp
That URL seems a bit spammy to me. Mentioning "hotels" twice is something I would avoid. I would consider something along the lines of the options below instead:
www.url.com/resorts/hotels_in_rome
www.url.com/resort_hotels/rome
I also wanted to talk about the landing pages for cars and air travel once more. Before directing all your current pages to a generic page I would take a look at the existing 140 pages and ask once again, do any of the pages have anything that is unique which can be used for the location based car and air landing pages?
Your plans are to develop these pages with quality content over time, which is great. I hate the idea of having established pages for each area, pulling back to one generic page, then expanding again to location-based pages.
If you sincerely intend to develop these pages within a reasonable time period, I would suggest establishing one page for each location even if it is thin on content to start with. Driving directions, local driving laws, testimonials: anything that can be used as a starting point to hold your footing would be preferred.
If you do pull back to a generic "car rentals" page, I have two ideas. Build out your location landing page for one area such as London. Closely watch your conversion rates for users on the London page versus the generic page. If there is a significant difference, it may help speed up your transition. If you realize you are losing $$ every day you don't have those pages, then perhaps you can hire additional help to speed up the process.
The final idea would be to build country-based landing pages for car rentals as a stop-gap measure. Your Milan, Rome, etc. pages could all direct to "Cars Italy" and "Air Italy".
There are tons of choices on the internet for travel providers. You have an extremely well established user base. My top concern for any migration is to maintain all my existing relationships. Some travel sites do great with a single landing page for air/cars/hotels. It sounds like your site has catered to clients in a specific way, and I would be sensitive to maintaining your current user experience.
One last idea that just came to me: after the migration, poll users for feedback. Take surveys, offer discounts, generate hype, but above all engage users, because they will offer a different point of view which you may not have considered.
-
Ryan, you have given me some excellent ideas here and a great overall structure to make the transition between sites. I can't thank you enough for your help. I will certainly consult an SEO before proceeding with anything, but your insight has given me a lot to think about.
With regard to the site's current pages, the majority of locations only have 3 pages: Hotels, Car Hire & Flights. It is the number of locations covered that makes the site so expansive.
So with Hotels being our biggest earner, my idea going forward was to:
- Use the travel guides' unique content for the hotel landing pages, i.e. [Hotels in Rome]
- Redirect all of the old Car & Air location pages to the new website’s generic Car Hire & Flights pages.
This would mean that there wouldn’t be any location-based pages for Flights and Car Hire. The idea would be to build these up gradually as it would take some time and money to add the unique content required.
- From every hotel landing page we would use anchor text to promote the generic Flights and Car Hire pages. For example, [Buy Cheap Flights] or [Cheap Car Hire]
This additional anchor text should help our external link building and the generic Flights and Car Hire pages would house a search form for users to search any location.
So essentially, the majority of the site would be made up of Hotel landing pages, until we began building the site further.
I can see that your main concern is that the correct redirects are in place.
The site currently has the following URL structures, with locations for each:
Apart from the sitemaps, each has locations under it, for example:
www.url.com/resort_hotels/hotels_in_rome.asp
So my idea is to:
1) Redirect all “resort_hotels” URLs to their relevant hotel page on the new website, for example,
www.url.com/resort_hotels/hotels_in_rome.asp
will go to the “Hotels in Rome” page on the new website.
2) The rest of the pages will be redirected to the home page for their category, for example,
will go to the “Flights” home page on the new website; and,
will go to the “Car Hire” home page on the new website, etc.
Unless there is something really wrong with this strategy, or you have any instant criticism, I would like to thank you for your help again and ask that if you need anything, please don’t hesitate to drop me a message on here. You have given up enough of your time and I’m more than grateful.
Kind Regards,
Nick
[I am using my work’s account, which is why I am displayed as Steve]
-
The transition I mentioned would allow for a smoother migration process rather than a "cold turkey" switch from the old site to the new site. You clearly recognize the end goal is to create your new site and delete the old site. The good news is that change does not have to happen overnight.
You can build out your new site completely and go live with it. At that point you would update any external links you control along with your advertisements, signatures, etc. You would also want to reach out to partners and any sites with links that you can influence. Update those links so they point to your new pages.
The final step is the redirection of your 140k page old site to the appropriate pages on the new site. Clearly you wish to begin with the most prominent pages such as your landing pages along with any important pages such as "Contact Us", your reservation system, etc.
The next step would be applying your redirect rules to the remaining pages. Extensive testing will be required.
You should set up GA or another tracking tool to monitor your old site. You will want to closely monitor activity for quite some time. Specifically look for any issues with 404s and multiple redirects.
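As a rough illustration of what "issues with 404s and multiple redirects" means in practice, here is a small offline Python sketch that walks a redirect map and flags chains, loops, and dead ends. The `trace_redirect` helper, the redirect map, and most of the paths are hypothetical; a real audit would run against your server logs or a crawl rather than a hand-written dictionary.

```python
def trace_redirect(url, redirects, live_pages, max_hops=10):
    """Follow `url` through the `redirects` map and describe where it ends up.

    Returns (final_url, hops, status), where status is "ok", "404" or "loop".
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            return url, hops, "loop"
        seen.add(url)
    status = "ok" if url in live_pages else "404"
    return url, hops, status


# Hypothetical redirect rules (old path -> new path) and live pages.
redirects = {
    "/resort_hotels/hotels_in_rome.asp": "/hotels-in-rome",
    "/old-rome.asp": "/resort_hotels/hotels_in_rome.asp",  # chain: two hops
    "/a.asp": "/b.asp",
    "/b.asp": "/a.asp",                                    # loop
    "/old-flights.asp": "/flights-typo",                   # target does not exist
}
live_pages = {"/hotels-in-rome", "/flights"}

# Keep only the old URLs that need attention: more than one hop, or a bad ending.
problems = {}
for old_url in redirects:
    final_url, hops, status = trace_redirect(old_url, redirects, live_pages)
    if hops > 1 or status != "ok":
        problems[old_url] = (final_url, hops, status)
```

Anything that lands in `problems` is either a multi-hop chain worth collapsing into a single 301, a loop, or a redirect pointing at a page that does not exist.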
With respect to your anchor text, I suspect it was used to sculpt your site so your link value was focused on a particular page for each topic. When you have 140 pages on a given topic, you can pursue an incredible number of long-tail phrases. Now I suspect you may have 4 pages for each area: Rome, Rome by Air, Rome by Car, and Rome hotels. If that is the case, your future anchor text linking will be a lot more straightforward.
I want to say "I wouldn't be concerned about the anchor text", but you have a major project ahead of you, you are highly dependent on SEO, and there are many opportunities for something to go wrong. In that context, I would say the anchor text belongs on the list of things to think about, but getting the redirects right is a much larger concern.
A final thought I would offer: this is all high level, generic advice. I would recommend hiring an SEO who could offer a proper evaluation of your site along with a migration plan. Once the change has been completed and tested, you should gain many advantages with your new site. Hopefully they will offset any loss from the migration. Once you are confident in your new site, I would recommend an SEO campaign promoting your new site.
-
Hi again Ryan,
All the URLs are currently coded as .asp (www.url.com/Rome.asp) and we aim to build the new site with user-friendly permalinks (www.url.com/Rome). So in answer to your question, yes, the sites could co-exist.
I hadn't thought of doing it this way, what a great idea.
With regard to the site's internal linking structure, I'm probably not explaining myself correctly. I understand that all of the site's juice needs to be recycled, but many of the 120,000 pages contain links with anchor text to other relevant parts of the site. Because there are so many of them, will removing these links ruin the site's authority?
In addition, I would be really interested to hear your ideas on staging a transition.
I can't thank you enough for this Ryan, my head's spinning at the moment!
-
You are on the right track. The link value from your existing pages must be saved.
Prior to offering a further reply I would like to ask a couple questions:
- How are your current URLs coded? As an older site, I presume your page URLs end in .asp?
- Will your new design also be in ASP?
What I am trying to determine is whether the new site will require new URLs. If your current page is /rome.asp and the new page will be /rome.php, then the URL will change, so both your new site and old site can co-exist at the same domain. This process will be helpful for staging a transition.
PS. My recommendation for URLs would be to use friendly URLs which do not show an extension (i.e. /rome) but that is not the present focus.
-
Thanks for a swift answer Ryan, very helpful indeed!
Put simply, the site is split into three key areas, Hotels, Flights & Car Rentals, with about 40,000 pages each. The problem is that each of these pages uses a generic paragraph or two that is more or less the same, but tweaked slightly to update the location in question. For example,
"Our goal is to provide the best choice of hotels in Rome."...
"Our goal is to provide the best choice of hotels in Barcelona."...
Obviously Google sees this as duplicate content and rightly so, but other than rewriting 120,000 pages of content, I can't see an alternative to the problem, other than to remove the pages in question.
The site has so many quality links going into it, from authorities all over the web; it would be a shame to waste this juice on pages that are getting penalised.
The travel guide areas are all unique; there is a single guide for each of the 1000 destinations. For example,
http://www.url.com/guides/rome
My idea was to use this unique content to promote our hotel pages, for example,
http://www.url.com/hotels-in-rome
This page would have the unique travel content from that area plus a list of the hotels we have available in Rome.
Every other duped page on the site relating to "Rome" does have "Rome" in its URL, so a regular expression could be used to redirect all "Rome" themed pages to the "Hotels in Rome" page that would house the unique content.
All other pages that did not have unique content written about them could be redirected to the generic Hotels, Flights or Car Rentals pages as they all have either “hotels”, “flights” or “car-rental” in their URL.
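To sketch the matching logic I have in mind (in Python rather than actual server rules, purely to show the idea): check the old URL for a location that has a travel guide first, and fall back to the category keyword otherwise. The location list, the target paths for the generic pages, and the `redirect_target` name are assumptions for illustration only.

```python
import re

# Hypothetical: locations that have a unique travel guide, and therefore
# a "Hotels in <location>" page on the new site.
GUIDE_LOCATIONS = ["rome", "barcelona"]

# Generic category pages, keyed by the keyword found in the old URL.
# The target paths are assumptions, not the real new-site URLs.
CATEGORY_PAGES = {
    "hotels": "/hotels",
    "flights": "/flights",
    "car-rental": "/car-rentals",
}

LOCATION_RE = re.compile("|".join(map(re.escape, GUIDE_LOCATIONS)), re.IGNORECASE)
CATEGORY_RE = re.compile("|".join(map(re.escape, CATEGORY_PAGES)), re.IGNORECASE)


def redirect_target(old_path):
    """Pick the 301 target for an old URL path, or None if no rule matches."""
    location = LOCATION_RE.search(old_path)
    if location:
        # Any page themed on a guide location goes to its hotels page.
        return f"/hotels-in-{location.group(0).lower()}"
    category = CATEGORY_RE.search(old_path)
    if category:
        # No unique content for this location: fall back to the category page.
        return CATEGORY_PAGES[category.group(0).lower()]
    return None
```

One caveat with a plain substring match like this: "rome" would also match inside an unrelated word in a URL, so real rules would need stricter boundaries and thorough testing before going live.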
The site is over 10 years old, is written in .asp and is managed with a bespoke piece of software created specifically for the site itself.
However, this doesn't really matter as we’re having the website redesigned at the same time as removing the dupe content and it will be built to our own specification.
My idea is to begin building up these locations from when the redesign goes live. This way I could keep a track of our content as it expands.
My main worry is that the culling of these pages will remove 99% of our internal linking structure, and I'm wondering if removing this will dramatically reduce the authority of the site. However, at this point I’m struggling to see another option.
Sorry for the length of this reply, any ideas would be welcome, I just thought it would be best if you knew a bit more of the background.
Thanks again Ryan!
-
Hi Steve.
I would suggest taking a good look at your content pages. I understand having dupe pages, but you are suggesting a possible 140:1 ratio, which is... wow.
As I am sure you are aware, there is not going to be any quick and easy fix. Here are some initial thoughts:
The first step I would take is to look for ANY commonalities between pages you can grab. For example, you mention 1000 travel guides. Are all the travel areas unique? For example, is there only one guide for Rome? If so, do all the duplicate Rome pages have "Rome" in the URL? If so, you can consider adding a regular expression rule to your .htaccess file (presuming you are on an Apache server) which could cover your 301s.
Is there any code on the pages which is common to a given guide? Using the above example, do all the "Rome" pages have "Rome" in the page title? If so, you could possibly update all the "Rome" pages by adding the correct canonical tag to each page's head. At this point it would truly depend on how your site is coded. Do you have a CMS? In what language are your pages written?
The bottom line: there is simply too much value in those pages to discard them. They each need to be properly redirected, with a 301 as the preferred method. The 301s really need to be handled with general, pattern-based rules which each cover a large number of pages at once. You could not use individual redirects even if you wanted to, as that many rules would cripple your web server.
I would not redirect all the pages to a single home page unless every other opportunity was completely explored.
-
Wow, that's a big, bold move! I don't know how to answer it, but if I were you I'd wait until you get a few nice, comprehensive answers on here before doing anything too drastic. Either that or use a private Q&A question to SEOmoz staff if you have any points spare to do so. With such a large change, you want to ensure you're doing it right.
I'll be interested to see the answers you get for this.