Moz Crawler suddenly reporting 1000s of duplicates (BE.net)
-
In the last 3-4 days we've had several thousand 'duplicate content' warnings appear in our crawl report, 99% of them related to our on-site blog. The blog is BlogEngine.Net, but the pages simply don't exist. The majority seem to be Roger trying quasi-random URLs like:
/?page=410
/?page=151
Etc., etc. The blog will happily serve content for these requests, but it's of course the same empty page, since there's only unique content up to /?Page=10 or so.
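For what it's worth, here's a minimal sketch of the kind of pagination handling that produces this (a Python stand-in with hypothetical names; BlogEngine.NET itself is C#). Out-of-range page numbers silently "succeed" with an empty slice instead of a 404, so every page past the last real one renders identically:

```python
# Hypothetical pagination: slicing past the end of the list returns [],
# so /?page=151 and /?page=410 render the same empty template.
POSTS = [f"post-{i}" for i in range(1, 101)]  # 100 posts, 10 real pages
PAGE_SIZE = 10

def posts_for_page(page: int) -> list:
    start = (page - 1) * PAGE_SIZE
    return POSTS[start:start + PAGE_SIZE]

# Pages 1-10 have unique content; everything beyond is the same empty page:
assert posts_for_page(5) != posts_for_page(6)
assert posts_for_page(151) == posts_for_page(410) == []
```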
Two questions:
1. Did something change recently? These blogs have been up for months, and this problem has only come up this week. Did Roger change to become more aggressive lately?
2. Suggested remediation? On one of the blogs I've added no-index, no-follow to any page with a /?page querystring, and we'll see what effect that has when the next crawl runs next week. However, I'm not sure this will work, as per:
http://moz.com/community/q/functionality-of-seomoz-crawl-page-reports
Anyone else had dynamic blogs suddenly blossom into thousands of duplicate content warnings? Google (rightly) ignores these pages completely.
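The remediation I applied can be sketched roughly like this (again a Python stand-in for the actual BlogEngine.NET page code; the function name is made up): emit noindex, nofollow whenever the request URL carries a /?page querystring, and the normal tag otherwise.

```python
from urllib.parse import urlparse, parse_qs

def robots_meta_for(url: str) -> str:
    """Return the robots meta tag to emit for a given request URL."""
    query = parse_qs(urlparse(url).query)
    # The blog accepts both ?page= and ?Page=, so check both spellings.
    pages = query.get("page", []) + query.get("Page", [])
    if pages:
        return '<meta name="robots" content="noindex, nofollow">'
    return '<meta name="robots" content="index, follow">'

assert "noindex" in robots_meta_for("/?page=410")
assert "noindex" not in robots_meta_for("/post/hello-world")
```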
-
Hate to bump my own question, but it appears I spoke too soon about no-index,no-follow solving this. The duplicate errors went away for about 5 days, but then yesterday spiked with the same problem. I've confirmed that no-index, no-follow are present on the pages being detected as bad.
As per the best practices document:
http://moz.com/learn/seo/robotstxt
Using meta robots no index no follow is the recommended option:
Block with Meta NoIndex
This tells engines they can visit, but are not allowed to display the URL in results. This is the recommended method
But it apparently isn't working, as evidenced by the new surge of duplicate errors. Is there anything else I can do? I don't want to explicitly block Roger in robots.txt, as that seems rather backward. Should Roger be included in the Bad Robots List?
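If I did fall back to robots.txt, a prefix rule like `Disallow: /?page` should block only the paginated querystring URLs, assuming rogerbot honors it the same way the standard library's parser does. A quick sanity check with Python's stdlib robotparser (example.com and the rule itself are just illustrative):

```python
import urllib.robotparser

# Parse a candidate robots.txt in memory and check which URLs it blocks.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: rogerbot",
    "Disallow: /?page",
])

# Paginated querystring URLs are blocked; normal posts are not.
assert not rp.can_fetch("rogerbot", "https://example.com/?page=410")
assert rp.can_fetch("rogerbot", "https://example.com/post/hello")
```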
-
Peter -
Thanks for the clarification. I understand the philosophy at hand, and I kind of even understood it before I asked the question. I'm handling these with a mix of canonical tags and no-index/no-follow.
Related to that, update:
By marking the superfluous pages no-index/no-follow the error count for the site has diminished by about 10,000 and the warning count by about 28,000 so that seems to be the way to go. The pages that had content are 'low value' in this context, since that content was readily available elsewhere.
-
Hi there!
Thanks for writing in with a great question.
We definitely count those dynamic URLs as duplicate content. While we're fairly sure search engines can figure this out and pick the right URL to index, it's still considered best practice to canonicalize or otherwise direct crawlers to the original URL (as far as I know; I'm not a professional SEO, so you might be better off asking the Pro Q&A community at www.moz.com/community/q, since they're all SEOs like you).
Since some dynamic URL generators can cause real problems for crawlers, we deliberately err on the side of over-reporting these issues rather than under-reporting them. We want people to know about potential issues with their sites, even if those issues turn out to be harmless under the site owner's specific SEO implementation.
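As a rough illustration of why a crawler flags these (not Moz's actual algorithm, just a hedged sketch): if you normalize URLs by stripping parameters like the pagination querystring, the quasi-random variants all collapse into a single canonical form, which is exactly what makes them duplicates.

```python
from urllib.parse import urlparse, urlunparse

def canonicalize(url: str, ignored_params: set) -> str:
    """Drop ignored query parameters so equivalent dynamic URLs compare equal."""
    parts = urlparse(url)
    kept = [p for p in parts.query.split("&")
            if p and p.split("=")[0].lower() not in ignored_params]
    return urlunparse(parts._replace(query="&".join(kept)))

# With the pagination parameter ignored, the quasi-random URLs collapse:
urls = ["/?page=410", "/?page=151", "/?Page=12"]
assert {canonicalize(u, {"page"}) for u in urls} == {"/"}
```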
In short, we'd rather leave those judgments up to you while providing the data you need to make them. I hope this helps explain our thinking! If you suspect our crawler is having issues and you don't want to post your site URLs here, you can always send us a support ticket at help@moz.com. That way we can examine it further and provide some insight into why our crawler sees things this way.
Hope this helps!
Peter
Moz Help Team.