Moz Crawler suddenly reporting 1000s of duplicates (BE.net)

Progauto

In the last 3-4 days, several thousand 'duplicate content' warnings have appeared in our crawl report, 99% of them related to our on-site blog. The blog runs on BlogEngine.NET, but the flagged pages simply don't exist. Most of them seem to be Roger trying quasi-random URLs like:
/?page=410

/?page=151

Etc., etc. The blog serves content for these requests, but it is of course the same empty page, since there's only unique content up to /?page=10 or so.
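To illustrate why Roger flags these URLs: if every out-of-range /?page=N request returns the same empty template, any crawler that compares page bodies will lump them together. Here is a simplistic exact-match sketch (a SHA-256 hash of whitespace-normalised HTML). It is not Moz's actual duplicate-detection algorithm, and the sample pages are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose bodies are identical after collapsing whitespace.

    Illustrative only: real crawlers use fuzzier comparisons than an exact hash.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        normalised = " ".join(body.split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only hashes shared by more than one URL count as duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical crawl: page 9 has a post, pages 151 and 410 are the empty template.
    crawled = {
        "/?page=9": "<h1>Blog</h1><p>Post 81</p>",
        "/?page=151": "<h1>Blog</h1>",
        "/?page=410": "<h1>Blog</h1>",
    }
    print(group_duplicates(crawled))  # [['/?page=151', '/?page=410']]
```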

Two questions:

1. Did something change recently? These blogs have been up for months, and this problem has only come up this week. Did Roger change to become more aggressive lately?

2. Suggested remediation? On one of the blogs I've added noindex, nofollow to any page that has a /?page querystring, and we'll see what effect that has on next week's crawl. However, I'm not sure this will work, as per:

http://moz.com/community/q/functionality-of-seomoz-crawl-page-reports
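A minimal sketch of that rule (noindex, nofollow on anything carrying a /?page querystring), written in Python purely for illustration. The function name, the example.com URLs and the exact meta-tag markup are assumptions; on a real BlogEngine.NET site this check would live in the theme or master page, not in Python.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed markup for the robots meta tag the post describes.
NOINDEX_TAG = '<meta name="robots" content="noindex, nofollow" />'


def robots_meta_for(url: str) -> str:
    """Return the noindex/nofollow tag for any URL with a ?page= parameter, else ''.

    Hypothetical helper, not part of BlogEngine.NET.
    """
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # The thread shows both ?page= and ?Page=, so compare keys case-insensitively.
    if any(key.lower() == "page" for key in params):
        return NOINDEX_TAG
    return ""


if __name__ == "__main__":
    print(robots_meta_for("http://example.com/blog/?page=410"))
    # prints <meta name="robots" content="noindex, nofollow" />
    print(repr(robots_meta_for("http://example.com/blog/my-post.aspx")))
    # prints ''
```

Note that this blanket match also tags the real pages 2 to 10; a variant could apply the tag only past the last page that has content.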

Anyone else had dynamic blogs suddenly blossom into thousands of duplicate content warnings? Google (rightly) ignores these pages completely.

Progauto

Hate to bump my own question, but it appears I spoke too soon about noindex, nofollow solving this. The duplicate errors went away for about 5 days, but then spiked again yesterday with the same problem. I've confirmed that noindex, nofollow is present on the pages being flagged.

As per the best practices document:

http://moz.com/learn/seo/robotstxt

Using meta robots noindex, nofollow is the recommended option:

Block with Meta NoIndex

This tells engines they can visit, but are not allowed to display the URL in results. This is the recommended method.

But it apparently isn't working, as evidenced by the new surge of duplicate errors. Is there anything else I can do? I don't want to explicitly block Roger in robots.txt, as that seems rather backward. Should Roger be included in the Bad Robots List?

Progauto

Peter -

Thanks for the clarification. I understand the philosophy at hand, and I more or less understood it before I asked the question. I'm handling these with a mix of canonical tags and noindex/nofollow.

Related to that, update:

By marking the superfluous pages noindex/nofollow, the error count for the site has dropped by about 10,000 and the warning count by about 28,000, so that seems to be the way to go. The pages that did have content are 'low value' in this context, since that content was readily available elsewhere.

Peterli

Hi there!

Thanks for writing in with a great question.

We definitely count those dynamic URLs as duplicate content. While we are pretty sure that search engines can figure this stuff out and know which URL to index, it's still considered best practice to canonicalize or otherwise direct crawlers to the original URL (as far as I know; I'm not a professional SEO, so you might be better off asking the Pro Q&A community at www.moz.com/community/q, since they are all SEOs like you).
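As one way to "canonicalize or otherwise direct crawlers to the original URL", here is a Python sketch that points junk page numbers back at the blog's front page. It assumes the last real page is 10 (a hypothetical constant based on the "up to /?page=10 or so" remark) and that the front page is the right target; the function names and URLs are illustrative and come from neither BlogEngine.NET nor Moz.

```python
from __future__ import annotations

import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: unique content only exists "up to /?page=10 or so".
LAST_PAGE_WITH_CONTENT = 10


def canonical_url(url: str, last_page: int = LAST_PAGE_WITH_CONTENT) -> str:
    """Drop an out-of-range or malformed ?page= parameter; leave other URLs unchanged."""
    parts = urlsplit(url)
    kept: list[tuple[str, str]] = []
    stripped = False
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() == "page":
            # Pages that really exist keep their own URL as canonical.
            if value.isdigit() and 1 <= int(value) <= last_page:
                kept.append((key, value))
            else:
                stripped = True  # e.g. ?page=410 or ?page=abc
            continue
        kept.append((key, value))
    if not stripped:
        return url
    # Rebuild the URL without the page parameter (any fragment is dropped too).
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link(url: str) -> str:
    """Render a <link rel="canonical"> tag, HTML-escaping the href."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url))}" />'


if __name__ == "__main__":
    print(canonical_link("http://example.com/blog/?page=410"))
    # prints <link rel="canonical" href="http://example.com/blog/" />
```

The sketch deliberately leaves in-range pages self-canonical: pointing pages 2 to 10 at page 1 would tell crawlers that genuinely different listings are duplicates.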

Since some dynamic URL generators can cause problems for crawlers, we try to be overly inclusive of these issues rather than overly exclusive. We want people to know about potential issues with their sites, even if they're not really issues within the site owner's specific SEO implementation plan.

In sum, we'd rather leave those judgments up to you while providing you with the data you need to make these decisions. I hope this helps explain our thinking here! However, if you think our crawler might be having issues and you don't want to post your site URLs here, you can always send us a support ticket at help@moz.com. That way we can examine it a bit further and provide some insight into why our crawler behaves this way!

Hope this helps!

Peter
Moz Help Team.

