Moz Crawler URL parameters & duplicate content
-
Hi all, this is my first post on Moz Q&A
Questions:
- Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters?
- How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come up in the next weekly report?
I'm asking because the crawler is reporting 50k+ pages crawled, when in reality this number should be closer to 1,000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data. For example:
Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas
Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft
Also, if noindex is the only solution, will it impact the ranking of the pages involved?
Note: Google and Bing are semi-successful in reporting the indexed page count, each reporting around 2.5k result pages for the site:DOMAIN.com query. The rel canonical tag was missing for a short period about 4 weeks ago, but since the issue was fixed these pages still haven't been deindexed.
Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
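To make the two example URLs in the question concrete, here is a minimal Python sketch of how a filtered search URL could be mapped back to its canonical form. It is only an illustration: the `CANONICAL_PARAMS` allow-list, the `canonical_url` helper, and the `https://` scheme are assumptions made for the example, not something taken from the site or from Moz.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allow-list: the only query parameter assumed to define
# a distinct results page. Everything else is treated as a filter.
CANONICAL_PARAMS = {"search_location"}


def canonical_url(url: str) -> str:
    """Drop every query parameter that is not in the allow-list."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# The two URLs from the question, with an assumed https:// scheme.
original = "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas"
filtered = (
    "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas"
    "&booking_date=&booking_days=1&booking_persons=1"
    "&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft"
)

print(canonical_url(filtered) == original)
```

If every filter combination for one results page reduces to the same URL this way, the canonical tags on those pages should all carry that one value.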
-
Happy to help!
We crawled roughly 49k pages because there were that many links on the site that we could find. 50k is also the new standard crawl limit for campaigns in Standard and Medium subscriptions. Adding a rel=canonical to a page doesn't mean it won't get crawled by our campaign crawler, only that the crawler is to refer to the canonicalized link for reporting purposes.
Without going into too much detail on specific URLs, these pages are considered duplicates because their canonical tags point to different URLs. For example,
is considered a duplicate of
DOMAIN.COM/charters/search/mx/QR?booking_date=&booking_days=&booking_persons=limit%252525253D20
because the canonical tag for the first page is
DOMAIN.COM/charters/search/mx/QR?offset=20
while the canonical for the second URL is
DOMAIN.COM/charters/search/mx/QR
Since the canonical tags point to different pages, it is assumed that DOMAIN.COM/charters/search/mx/QR?offset=20 and DOMAIN.COM/charters/search/mx/QR are likely to be duplicates themselves.
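As a side note on the example URL above, the value `limit%252525253D20` looks like a string that has been percent-encoded several times over. The Python sketch below decodes it repeatedly until it stops changing; the `fully_decode` helper is hypothetical and only meant for inspecting such values. One possible reading, not confirmed anywhere in this thread, is that the query string gets re-encoded each time a link is generated, which would give the crawler a new URL on every pass.

```python
from urllib.parse import unquote


def fully_decode(value: str, max_rounds: int = 10) -> tuple[str, int]:
    """Percent-decode repeatedly until the value stops changing.

    Returns the decoded value and the number of decoding rounds that
    actually changed it. max_rounds is an arbitrary safety cap.
    """
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
        rounds += 1
    return value, rounds


value, rounds = fully_decode("limit%252525253D20")
print(value, rounds)  # prints: limit=20 5
```

Here the underlying value is `limit=20`, wrapped in five layers of encoding.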
Here is how our system interprets duplicate content vs. rel=canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as canonical (and B does not), A and B are considered duplicates
If A references C as canonical and B references D, then A and B are considered duplicates
The above example from your campaign falls into the fourth case listed above. Hope this helps clear things up.
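The four cases above can be sketched as a small Python function. This is only a reading of the rules as stated in this reply, not Moz's actual implementation; the function name and arguments are hypothetical, and it assumes the two pages have already been detected as having duplicate content.

```python
from typing import Optional


def reported_as_duplicates(
    url_a: str,
    canonical_a: Optional[str],
    url_b: str,
    canonical_b: Optional[str],
) -> bool:
    """Decide whether two content-duplicate pages are flagged as duplicates.

    A canonical of None means the page has no rel=canonical tag.
    """
    # Case 1: one page names the other as its canonical.
    if canonical_a == url_b or canonical_b == url_a:
        return False
    # Case 2: both pages name the same third page as canonical.
    if canonical_a is not None and canonical_a == canonical_b:
        return False
    # Cases 3 and 4: only one page has a canonical, or the canonicals differ.
    return True


print(reported_as_duplicates("A", "B", "B", None))  # case 1: False
print(reported_as_duplicates("A", "C", "B", "C"))   # case 2: False
print(reported_as_duplicates("A", "C", "B", None))  # case 3: True
print(reported_as_duplicates("A", "C", "B", "D"))   # case 4: True
```

Under this reading, the campaign example is case 4: the two pages carry canonicals ending in `/QR?offset=20` and `/QR`, which differ, so the pair is still flagged.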
-
Thanks Sam!
I've read the post and checked my canonical tags but still can't seem to find what's causing the canonicalized pages to be indexed by RogerBot. The same page shows up in Moz's crawl test 100 times with slightly different parameters.
I'll keep investigating but some specific feedback from Moz staff would be appreciated
-
Hi!
I'm going to leave the strategy discussion open to the community, but from a technical standpoint, we will count rel=canonical on dynamic URLs as long as it is implemented correctly. Dr. Pete has a great post about canonicals that might be helpful as well. Campaigns update weekly, on the day of the week the campaign was created. So if it was created on a Tuesday, you'll see updated campaign data every Tuesday after. In between updates, you can run a crawl test (accessible from Research Tools) to crawl up to 3k pages. Hope this helps!
-
Thanks for the info searchbuzz. So if I understand correctly, new pages are crawled and kept in the index (up to the campaign limit), but issues on indexed pages are reported separately.
My issue is that, due to the dynamic URLs used in search filters on my site, I actually have 49k issues detected (over 95% are duplicate content and long URL issues, because the crawler is indexing the same page many times, once for each URL parameter combination). The crawl test can't index the entire site because the site generates a huge number of pages.
It's a travel-related website with listings in 233 cities and multiple filter functionality, so each unique 'page' of results is indexed more than 100 times, even though there's a rel="canonical" tag pointing to the non-parametrized URL of that page.