Crawl Diagnostics Questions
-
SEOmoz is reporting that I have 50+ pages with a duplicate content issue based on this URL: http://www. f r e d aldous.co.uk/art-shop/art-supplies/art-canvas.html?manufacturer=178
But I have included this tag in the source: <link rel="canonical" href="http://www.f r e daldous.co.uk/art-shop/art-supplies/art-canvas.html"/>
(I have purposefully added white space to the URLs in this message as I'm not sure about the rules for posting links here)
I thought this "canonical" tag prevented the duplicate content from being indexed?
Is the reporting by SEOmoz wrong, or is it being overcautious?
-
Hi Niall,
This isn't a question of whether the canonical tag is properly applied. It's a case where two or more pages are so similar in code that they set off the SEOmoz duplicate content flags.
First of all, those pages look different to us humans. But the SEOmoz web app uses a similarity threshold of 95% of the HTML code, and it takes everything on the page into account, both hidden and visible.
In this case, it's counting all of the navigation and sidebar as well, which is significant. What's left of the unique content, the part that matters, makes up less than 5% of the code.
Here's a tool you can use to check the similarity: http://www.duplicatecontent.net/
I ran the pages through a couple of tools, which showed 98% HTML similarity and 99% text similarity.
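To see why shared navigation pushes two different-looking pages over a 95% threshold, here is a rough sketch. It is not the SEOmoz algorithm (I don't know what they use internally); it just compares two strings with Python's standard difflib. The page markup, the 50 navigation links, and the names `similarity`, `NAV`, `page_a`, and `page_b` are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical navigation block shared by every page on the site.
NAV = "<ul>" + "".join(
    f'<li><a href="/cat-{i}.html">Category {i}</a></li>' for i in range(50)
) + "</ul>"

# Two hypothetical pages: same navigation, different (small) unique content.
page_a = f"<html><body>{NAV}<div id='main'>Art canvas</div></body></html>"
page_b = f"<html><body>{NAV}<div id='main'>Design-led gifts</div></body></html>"

score = similarity(page_a, page_b)

# 0.95 mirrors the 95% threshold mentioned above; it is used here only
# as an illustrative cutoff.
is_flagged = score >= 0.95
print(f"similarity: {score:.3f}, flagged as duplicate: {is_flagged}")
```

The more markup the pages share relative to their unique content, the closer the ratio gets to 1.0, whatever the visible content says.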
For perspective, take a look at Google's cached version of one of these pages. This is how Googlebot sees the page: http://webcache.googleusercontent.com/search?q=cache:mdybPKIjOxUJ:www.fredaldous.co.uk/craft-shop/general-crafts.html+http://www.fredaldous.co.uk/craft-shop/general-crafts.html&hl=en&gl=us&strip=1
That, as we say, is a lot of links!
Since Panda, when I see a site with this many navigation links, I usually advise restructuring the site architecture into more of a pyramid shape, so that the overall navigation on each page is reduced.
Hope this helps! Best of luck with your SEO.
-
It claims that this is one of the duplicate URLs:
http://www.f r e daldous.co.uk/photo-gift/design-led-gifts.html?manufacturer=436
Now I am confused, as that page is nowhere near duplicate content of the URL I posted first.
Can anyone explain this?
-
Hello Niall,
It seems that you have inserted the rel="canonical" tag in the correct spot. I think the software is flagging potential duplicates, which is a useful extra precaution. I really don't want to make a premature determination without knowing which 50 pages are showing up as duplicates. A deeper look would allow me to give you a more accurate response.
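If you want to double-check the tag across several pages yourself, a small script can pull the canonical URL out of each page's HTML. This is only a sketch using Python's built-in html.parser. The class `CanonicalFinder`, the function `find_canonical`, and the example.com sample markup are hypothetical, and fetching the pages is left out.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source, for illustration only.
sample = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/art-canvas.html"/></head>'
    "<body></body></html>"
)
print(find_canonical(sample))
```

Running this against each flagged URL would show whether the filtered pages (the ones with ?manufacturer= in the URL) all declare the same canonical URL as the unfiltered page.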