Moz Crawler URL parameters & duplicate content
-
Hi all, this is my first post on Moz Q&A
Questions:
- Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters?
- How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come up in the next weekly report?
I'm asking because the crawler is reporting 50k+ pages crawled when, in reality, this number should be closer to 1000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data. For example:
Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas
Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft
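To illustrate the relationship between the two URLs above, here is a minimal Python sketch of how a canonical URL could be derived from a filtered one. It is not how the site or the Moz crawler actually does it. The `https://` scheme, the `CANONICAL_PARAMS` allow-list, and the `canonical_url` helper are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: search_location is the only parameter that selects a distinct
# results page; everything else is treated as a sort/filter parameter.
CANONICAL_PARAMS = {"search_location"}


def canonical_url(url: str) -> str:
    """Return the URL with every query parameter outside CANONICAL_PARAMS removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# The filtered URL from this post, with an assumed https:// scheme.
filtered = (
    "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas"
    "&booking_date=&booking_days=1&booking_persons=1"
    "&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft"
)
print(canonical_url(filtered))
```

Every filter combination should map to the same value, which is the URL the rel="canonical" tag on those pages is expected to carry.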
Also, if noindex is the only solution, will it impact the ranking of the pages involved?
Note: Google and Bing are semi-successful in reporting index page count, each reporting around 2.5k result pages when using the site:DOMAIN.com query. The rel canonical tag was missing for a short period of time about 4 weeks ago, but since fixing the issue these pages still haven't been deindexed.
Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
-
Happy to help!
We crawled roughly 49k pages because we found that many links on the site. 50k is also the new standard crawl limit for campaigns in Standard and Medium subscriptions. Adding a rel=canonical to a page doesn't mean it won't get crawled by our campaign crawler, only that the crawler refers to the canonical URL for reporting purposes.
Without going into too much URL detail, these pages are considered duplicates because their canonical tags point to different URLs. For example,
is considered a duplicate of
DOMAIN.COM/charters/search/mx/QR?booking_date=&booking_days=&booking_persons=limit%252525253D20
because the canonical tag for the first page is
DOMAIN.COM/charters/search/mx/QR?offset=20
while the canonical for the second URL is
DOMAIN.COM/charters/search/mx/QR
Since the canonical tags point to different pages, it is assumed that DOMAIN.COM/charters/search/mx/QR?offset=20 and DOMAIN.COM/charters/search/mx/QR are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel=canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as canonical, A and B are considered duplicates
If A references C as canonical and B references D, then A and B are considered duplicates
The above example from your campaign actually falls into the fourth case I've listed above. Hope this helps clear things up.
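The four cases above can be sketched in Python as a single comparison. This is one reading of the rules listed in this post, not Moz's actual implementation. It assumes that a page without a canonical tag is treated as its own canonical, and that in the third case B has no canonical tag. The function names are hypothetical.

```python
from typing import Optional


def canonical_target(url: str, canonical: Optional[str]) -> str:
    """Assumption: a page with no canonical tag counts as its own canonical."""
    return canonical or url


def reported_as_duplicates(
    a: str, a_canonical: Optional[str], b: str, b_canonical: Optional[str]
) -> bool:
    """For two pages with duplicate content, report them unless both resolve
    to the same canonical target."""
    return canonical_target(a, a_canonical) != canonical_target(b, b_canonical)


# The four cases, with A, B, C, D standing in for page URLs.
print(reported_as_duplicates("A", "B", "B", None))   # case 1: A references B
print(reported_as_duplicates("A", "C", "B", "C"))    # case 2: both reference C
print(reported_as_duplicates("A", "C", "B", None))   # case 3: only A references C
print(reported_as_duplicates("A", "C", "B", "D"))    # case 4: A -> C, B -> D
```

Under this reading, two parameter variants stop being reported only once their canonical tags name exactly the same URL.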
-
Thanks Sam!
I've read the post and checked my canonical tags but still can't seem to find what's causing the canonicalized pages to be indexed by RogerBot. The same page shows up in Moz's crawl test 100 times with slightly different parameters.
I'll keep investigating but some specific feedback from Moz staff would be appreciated
-
Hi!
I'm going to leave the strategy discussion open to the community, but from a technical standpoint, we will count rel=canonical on dynamic URLs as long as it is implemented correctly. Dr. Pete has a great post about canonicals that might be helpful as well. Updates to campaigns happen on a weekly basis, on the weekday the campaign was created. So if it was created on a Tuesday, you'll see updated campaign data every Tuesday after. You can run a crawl test (accessible from Research Tools) to get 3k page crawls in between your updates though. Hope this helps!
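As a rough illustration of the weekly schedule described above, this Python sketch finds the next date that falls on the weekday a campaign was created. The `next_update` helper and the dates are hypothetical, and the sketch assumes updates land exactly on that weekday.

```python
from datetime import date, timedelta


def next_update(created: date, today: date) -> date:
    """Return the first date on or after `today` that falls on the same
    weekday as `created`."""
    days_ahead = (created.weekday() - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


# Hypothetical campaign created on a Tuesday, checked on the following Friday.
print(next_update(created=date(2024, 1, 2), today=date(2024, 1, 5)))
```

A fix made just after an update would therefore wait up to a week before the issues list reflects it, unless a crawl test is run in between.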
-
Thanks for the info searchbuzz. So if I understand correctly, new pages are crawled and kept in the index (up to the campaign limit), but issues on indexed pages are reported separately.
My issue is that, because of the dynamic URLs used in search filters on my site, I actually have 49k issues detected (over 95% are duplicate content and long URL issues because the crawler is indexing the same page many times, once for each URL parameter combination). The crawl test can't index the entire site because it generates a huge number of pages.
It's a travel-related website with listings in 233 cities and multiple filters, so each unique 'page' of results is indexed more than 100 times, even though there's a rel="canonical" tag pointing to the non-parametrized URL of that page.
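One way to check how many distinct pages sit behind the crawled URLs is to group a crawl export by non-parametrized URL. The Python sketch below does that. The `variants_per_page` helper, the `https://` scheme, and the sample list are assumptions for illustration, not real crawl data.

```python
from collections import Counter
from urllib.parse import urlsplit


def variants_per_page(crawled_urls):
    """Count crawled URLs per non-parametrized URL (query string and
    fragment dropped)."""
    return Counter(
        urlsplit(url)._replace(query="", fragment="").geturl()
        for url in crawled_urls
    )


# Hypothetical sample of crawled URLs.
crawled = [
    "https://DOMAIN.COM/charters/search/mx/QR",
    "https://DOMAIN.COM/charters/search/mx/QR?offset=20",
    "https://DOMAIN.COM/charters/search/mx/QR?booking_days=1",
    "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas",
]
counts = variants_per_page(crawled)
print(len(counts), "distinct pages from", len(crawled), "crawled URLs")
```

Comparing the number of keys with the number of crawled URLs shows how much of the 49k total comes from parameter variants rather than distinct pages.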