How to remove duplicate content due to URL parameters from SEOmoz Crawl Diagnostics
-
Hello all
I'm currently getting back over 8,000 crawl errors for duplicate content pages. It's a Joomla site with VirtueMart, and 95% of the errors are for parameters in the URL that the customer can use to filter products.
Google is handling them fine under the Webmaster Tools parameter settings, but it's pretty hard to find the other duplicate content issues in SEOmoz with all of these in the way.
All of the problem parameters start with
?product_type_
Should I try to use robots.txt to stop them from being crawled, and if so, what would be the best way to include them in robots.txt?
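For what it's worth, the kind of rule I had in mind is sketched below. I understand the * wildcard isn't part of the original robots.txt standard, just an extension that Googlebot and some other major crawlers honour, so I'm not sure it's the right approach:

```
User-agent: *
# Block any URL whose query string contains a product_type_ filter parameter
Disallow: /*?product_type_
Disallow: /*&product_type_
```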
Any help greatly appreciated.
-
Hi Tom
It took a while, but I got there in the end. I was using Joomla 1.5, and I downloaded a component called "Tag Meta" which allows you to insert tags, including the canonical tag, on specific URLs or, more importantly, on URLs which begin in a certain way. How you use it depends on how your SEF URLs are set up and which SEF component you are using, but you can put a canonical tag on every URL in a section that has view-all-products in it.
So in one of my examples I put a canonical tag pointing to /maternity-tops.html (my main category page for that section) on every URL that began with /maternity-tops/view-all-products.
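The tag that ends up in the head of each of those filtered URLs is just the standard canonical link element, something like this (a sketch using my own category page as the target):

```html
<!-- Inserted on every URL beginning with /maternity-tops/view-all-products -->
<link rel="canonical" href="http://www.funkybumpmaternity.com/maternity-tops.html" />
```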
I hope this is of help to you. It takes a bit of playing around with, but it worked for me. The component also has fairly good documentation.
Regards
Damien
-
Damien,
Are you able to explain how you were able to do this within VirtueMart?
Thanks
Tom
-
So leave the 5 pages of dresses as they are, because they are all original, but have the canonical tag on all of the filter parameter URLs pointing to page 1 of the dresses.
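For example, the red-dresses filter URL would carry a canonical back to page 1 of the category, and the same tag would go on every other colour, brand, occasion and style combination, something like:

```html
<!-- On /Maternity-Dresses/View-all-products.html?product_type_1_Colour[0]=Red&... -->
<link rel="canonical" href="http://www.funkybumpmaternity.com/Maternity-Dresses.html" />
```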
Thank you for your help, Alan.
-
It should be on all versions of the page, all pointing to the one version.
Search engines will then see them all as one page.
-
Hi Alan
Thanks for getting back to me so fast. I'm slightly confused on this, so an example might help. One of the pages is http://www.funkybumpmaternity.com/Maternity-Dresses.html.
There are 5 pages of dresses with options on the left allowing you to narrow that down by color, brand, occasion and style. Every time you select an option or combination of options on the left, for example red, it will generate a page with only red dresses and a URL of http://www.funkybumpmaternity.com/Maternity-Dresses/View-all-products.html?product_type_1_Colour[0]=Red&product_type_1_Colour_comp=find_in_set_any&product_type_id=1
The number of possible combinations is huge, which I believe is why I'm getting so many duplicate content issues on SEOmoz Pro. Google is handling the parameters fine.
How should I implement the canonical tag? Should I have a tag on all filter pages referencing page 1 of the dresses? Should pages 2-5 have the tag on them? If so, would this mean that the dresses on those pages would not be indexed?
-
This sounds more like a case for a canonical tag.
Don't exclude with robots.txt; that is akin to cutting off your arm because you have a splinter in your finger.
When you exclude using robots.txt, the link juice passing through links to these pages is lost.