Moz Crawler URL parameters & duplicate content
-
Hi all, this is my first post on Moz Q&A
Questions:
- Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters?
- How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come up in the next weekly report?
I'm asking because the crawler is reporting 50k+ pages crawled, when in reality this number should be closer to 1000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data?
Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas
Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft
Also, if noindex is the only solution, will it impact the ranking of the pages involved?
Note: Google and Bing are semi-successful in reporting the indexed page count, each reporting around 2.5k result pages when using the site:DOMAIN.com query. The rel canonical tag was missing for a short period of time about 4 weeks ago, but since fixing the issue these pages still haven't been deindexed.
Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
-
Happy to help!
We crawled roughly 49k pages because there were that many links on the site that we could find. 50k is also the new standard crawl limit for campaigns in Standard and Medium subscriptions. Adding a rel=canonical to a page doesn't mean it won't get crawled by our campaign crawler, only that the crawler is to refer to the canonicalized link for reporting purposes.
Without going too deep into specific URLs, these pages are considered duplicates because their canonical tags point to different URLs. For example,
is considered a duplicate of
DOMAIN.COM/charters/search/mx/QR?booking_date=&booking_days=&booking_persons=limit%252525253D20
because the canonical tag for the first page is
DOMAIN.COM/charters/search/mx/QR?offset=20
while the canonical for the second URL is
DOMAIN.COM/charters/search/mx/QR
Since the canonical tags point to different pages, it is assumed that DOMAIN.COM/charters/search/mx/QR?offset=20 and DOMAIN.COM/charters/search/mx/QR are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel=canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as canonical, A and B are considered duplicates
If A references C as canonical and B references D, then A and B are considered duplicates
The above example from your campaign actually falls into the fourth case I've listed above. Hope this helps clear things up.
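The four cases above can be sketched in a few lines of Python. This is only one reading of the rules as listed, not Moz's actual implementation: it assumes a page with no canonical tag stands for itself, and that two pages with duplicate content are flagged only when they resolve to different canonical targets. The function names and the single-letter page names are hypothetical.

```python
from typing import Optional


def effective_canonical(url: str, canonical: Optional[str]) -> str:
    # Assumption: a page without a canonical tag is treated as its own canonical.
    return canonical if canonical else url


def flagged_as_duplicates(url_a: str, canonical_a: Optional[str],
                          url_b: str, canonical_b: Optional[str]) -> bool:
    # Both pages are assumed to have duplicate content already.
    # They are flagged only if they resolve to different canonical targets.
    return effective_canonical(url_a, canonical_a) != effective_canonical(url_b, canonical_b)


# Case 1: A references B as canonical
case_1 = flagged_as_duplicates("A", "B", "B", None)
# Case 2: A and B both reference C
case_2 = flagged_as_duplicates("A", "C", "B", "C")
# Case 3: A references C, B has no canonical
case_3 = flagged_as_duplicates("A", "C", "B", None)
# Case 4: A references C, B references D
case_4 = flagged_as_duplicates("A", "C", "B", "D")
```

Under this reading, the two pages from the campaign are flagged because their canonical tags resolve to two different URLs (one with ?offset=20, one without).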
-
Thanks Sam!
I've read the post and checked my canonical tags but still can't seem to find what's causing the canonicalized pages to be indexed by RogerBot. The same page shows up in Moz's crawl test 100 times with slightly different parameters.
I'll keep investigating but some specific feedback from Moz staff would be appreciated
-
Hi!
I'm going to leave the strategy discussion open to the community but from a technical standpoint, we will count rel=canonical on dynamic urls as long as they are implemented correctly. Dr. Pete has a great post where he talks about canonicals that might be helpful as well. Updates to campaigns happen on a weekly basis depending on when the campaign was created. So if it was created on a Tuesday, you'll see updated campaign data every Tuesday after. You can run a crawl test (accessible from Research Tools) to get 3k page crawls in between your updates though. Hope this helps!
-
Thanks for the info searchbuzz. So if I understand correctly, new pages are crawled and kept in the index (up to the campaign limit), but issues on indexed pages are reported separately.
My issue is that, due to the dynamic URLs used in search filters on my site, I actually have 49k issues detected (over 95% are duplicate content and long URL issues because the crawler is indexing the same page many times, once for each URL parameter combination). The crawl test can't index the entire site because it generates a huge number of pages.
It's a travel-related website with listings in 233 cities and multiple filter functionality, so each unique 'page' of results is indexed more than 100 times, even though there's a rel="canonical" tag pointing to the non-parametrized URL of that page.
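To check how many crawled URLs collapse onto each non-parametrized page, a crawl export could be grouped by URL with the query string removed. The sketch below uses only Python's standard library; the example.com URLs are placeholders modeled on the ones in this thread, and the helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    # Drop the query string and fragment, keeping scheme, host and path.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def variants_per_page(crawled_urls):
    # Count how many crawled URLs map to each non-parametrized URL.
    return Counter(strip_query(u) for u in crawled_urls)


# Placeholder URLs; in practice these would come from a crawl export.
crawled = [
    "https://example.com/charters/search/mx/BS?search_location=cabo-san-lucas",
    "https://example.com/charters/search/mx/BS?search_location=cabo-san-lucas&booking_days=1",
    "https://example.com/charters/search/mx/QR?offset=20",
    "https://example.com/charters/search/mx/QR",
]
counts = variants_per_page(crawled)
for page, n in counts.most_common():
    print(n, page)
```

Pages with a high count are the ones generating the most parameter variants, which is a reasonable place to start checking that every variant carries the same canonical tag.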