Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best way to "Prune" bad content from large sites?
-
I am in process of pruning my sites for low quality/thin content. The issue is that I have multiple sites with 40k + pages and need a more efficient way of finding the low quality content than looking at each page individually. Is there an ideal way to find the pages that are worth no indexing that will speed up the process but not potentially harm any valuable pages?
Current plan of action is to pull data from analytics and if the url hasn't brought any traffic in the last 12 months then it is safe to assume it is a page that is not beneficial to the site. My concern is that some of these pages might have links pointing to them and I want to make sure we don't lose that link juice. But, assuming we just no index the pages we should still have the authority pass along...and in theory, the pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with.
Recommendations on best way to prune content on sites with hundreds of thousands of pages efficiently? Also, is there a benefit to no indexing the pages vs deleting them? What is the preferred method, and why?
-
I have a section of my website where I heavily use embedded content. Embeds from Youtube, Slideshare, Twitter, Quora etc. Google thinks they're thin, and they don't show up in my analytics because you can read the content without clicking on the page.
http://getonthemap.us/twitter/blog
But I like them, and I think they're helpful. So I no-indexed all but one of the blog posts in that section. It retains the backlinks to the posts, but cleans me up with Google.
If you're deleting, can't you do that quickly from your console?
-
It's hard to say exactly without seeing your site since there are so many potential variables (e.g. are most of your blog posts low quality or just a minority? etc) that would define the best way to go about it.
What I can say though is that you're on the right track as far as using analytics data to determine which ones are providing value right now. There is a danger in losing some rankings if you go removing a huge volume of these posts. Unless they're utter rubbish posts, they'll likely be providing relevance signals to Google on what your site is about. That said, I do think it's a necessary evil and I'd expect you'll be rewarded for it in the long run provided you start replacing the trash with high quality posts in the future.
As for the benefits, if they really are low quality then user engagement is going to be terrible which is obviously not what you should be aiming for. It's also going to be chewing up your crawl budget for no good reason so the leaner your site is, the better base you have to start rebuilding with quality instead of quantity. For the same reason, I generally suggest removing tags and categories that aren't providing any actual benefit too - in most cases I see they're just there either "for good SEO" or because the site owners things that's how users are browsing their site but in almost all cases, that's not true. As always, check your own data on this to be sure.
As for removing vs noindex, this one is always contentious but I lean toward removing simply because it's going to clean things up for the user too and ultimately they should be your primary focus. Having 40,000+ pages of trash on your website is a fantastic indicator to them that your site may not be somewhere they want to be and noindexing them won't do anything to change the user's experience.
Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Are "Powered By..." type footer backlinks good or bad for SEO?
Hi guys, We're running a software company which is also selling WP themes amongst other things. We've heard recently that footer backlinks like "Powered by BigBangThemes" might do more harm than good. Some clients usually forget to change them - so we want to make sure we stop including them in case this is true. Thanks!
Intermediate & Advanced SEO | | andy.bigbangthemes0 -
On 1 of our sites we have our Company name in the H1 on our other site we have the page title in our H1 - does anyone have any advise about the best information to have in the H1, H2 and Page Tile
We have 2 sites that have been set up slightly differently. On 1 site we have the Company name in the H1 and the product name in the page title and H2. On the other site we have the Product name in the H1 and no H2. Does anyone have any advise about the best information to have in the H1 and H2
Intermediate & Advanced SEO | | CostumeD0 -
Help FORUM ( User generated content ) SEO best practices
Hello Moz folks ! For the very first time im dealing with a massive community who rely on UGC ( user generated content ). Their forum is finding a great deal of duplicate content/broken link/ duplicate title and on-site issue. I have Advance SEO knowledge related to ecommerce or blogging but new to forum and UGC. I would really love to learn or get ressources links that would allow me to see/understand the best practices in term of SEO. Any help is greatly appreciated. Best, Yan
Intermediate & Advanced SEO | | ydesjardins2000 -
Duplicate Content www vs. non-www and best practices
I have a customer who had prior help on his website and I noticed a 301 redirect in his .htaccess Rule for duplicate content removal : www.domain.com vs domain.com RewriteCond %{HTTP_HOST} ^MY-CUSTOMER-SITE.com [NC]
Intermediate & Advanced SEO | | EnvoyWeb
RewriteRule (.*) http://www.MY-CUSTOMER-SITE.com/$1 [R=301,L,NC] The result of this rule is that i type MY-CUSTOMER-SITE.com in the browser and it redirects to www.MY-CUSTOMER-SITE.com I wonder if this is causing issues in SERPS. If I have some inbound links pointing to www.MY-CUSTOMER-SITE.com and some pointing to MY-CUSTOMER-SITE.com, I would think that this rewrite isn't necessary as it would seem that Googlebot is smart enough to know that these aren't two sites. -----Can you comment on whether this is a best practice for all domains?
-----I've run a report for backlinks. If my thought is true that there are some pointing to www.www.MY-CUSTOMER-SITE.com and some to the www.MY-CUSTOMER-SITE.com, is there any value in addressing this?0 -
Do links to PDF's on my site pass "link juice"?
Hi, I have recently started a project on one of my sites, working with a branch of the U.S. government, where I will be hosting and publishing some of their PDF documents for free for people to use. The great SEO side of this is that they link to my site. The thing is, they are linking directly to the PDF files themselves, not the page with the link to the PDF files. So my question is, does that give me any SEO benefit? While the PDF is hosted on my site, there are no links in it that would allow a spider to start from the PDF and crawl the rest of my site. So do I get any benefit from these great links? If not, does anybody have any suggestions on how I could get credit for them. Keep in mind that editing the PDF's are not allowed by the government. Thanks.
Intermediate & Advanced SEO | | rayvensoft0 -
News sites & Duplicate content
Hi SEOMoz I would like to know, in your opinion and according to 'industry' best practice, how do you get around duplicate content on a news site if all news sites buy their "news" from a central place in the world? Let me give you some more insight to what I am talking about. My client has a website that is purely focuses on news. Local news in one of the African Countries to be specific. Now, what we noticed the past few months is that the site is not ranking to it's full potential. We investigated, checked our keyword research, our site structure, interlinking, site speed, code to html ratio you name it we checked it. What we did pic up when looking at duplicate content is that the site is flagged by Google as duplicated, BUT so is most of the news sites because they all get their content from the same place. News get sold by big companies in the US (no I'm not from the US so cant say specifically where it is from) and they usually have disclaimers with these content pieces that you can't change the headline and story significantly, so we do have quite a few journalists that rewrites the news stories, they try and keep it as close to the original as possible but they still change it to fit our targeted audience - where my second point comes in. Even though the content has been duplicated, our site is more relevant to what our users are searching for than the bigger news related websites in the world because we do hyper local everything. news, jobs, property etc. All we need to do is get off this duplicate content issue, in general we rewrite the content completely to be unique if a site has duplication problems, but on a media site, im a little bit lost. Because I haven't had something like this before. Would like to hear some thoughts on this. Thanks,
Intermediate & Advanced SEO | | 360eight-SEO
Chris Captivate0 -
How To Best Close An eCommerce Site?
We're closing down one of our eCommerce sites. What is the best approach to do this? The site has a modest link profile (a young site). It does have a run of site link to the parent site. It also has a couple hundred email subscribers and established accounts. Is there a gradual way to do this? How do I treat the subscribers and account holders? The impact won't be great, but I want to minimize collateral damage as much as possible. Thanks.
Intermediate & Advanced SEO | | AWCthreads0 -
Posing QU's on Google Variables "aclk", "gclid" "cd", "/aclk" "/search", "/url" etc
I've been doing a bit of stats research prompted by read the recent ranking blog http://www.seomoz.org/blog/gettings-rankings-into-ga-using-custom-variables There are a few things that have come up in my research that I'd like to clear up. The below analysis has been done on my "conversions". 1/. What does "/aclk" mean in the Referrer URL? I have noticed a strong correlation between this and "gclid" in the landing page variable. Does it mean "ad click" ?? Although they seem to "closely" correlate they don't exactly, so when I have /aclk in the referrer Url MOSTLY I have gclid in the landing page URL. BUT not always, and the same applies vice versa. It's pretty vital that I know what is the best way to monitor adwords PPC, so what is the best variable to go on? - Currently I am using "gclid", but I have about 25% extra referral URL's with /aclk in that dont have "gclid" in - so am I underestimating my number of PPC conversions? 2/. The use of the variable "cd" is great, but it is not always present. I have noticed that 99% of my google "Referrer URL's" either start with:
Intermediate & Advanced SEO | | James77
/aclk - No cd value
/search - No cd value
/url - Always contains the cd variable. What do I make of this?? Thanks for the help in advance!0