Duplicate content http:// something .com and http:// something .com/
-
Hi,
I've just got a crawl report for a new WordPress blog with the Suffusion theme and the Yoast WordPress SEO plugin, and it reports duplicate content for:
http:// something .com
and
http:// something .com/
I just can't figure out how to handle this. Can I add a redirect from .com/ to .com in .htaccess?
Any help is appreciated!
By the way, the rel canonical tag value is **http:// something .com/** for both.
-
Also, remember this post on canonicalization:
SEO advice: url canonicalization, by Matt Cutts, January 4, 2006, in Google/SEO
(I got my power back!) Before I start collecting feedback on the Bigdaddy data center, I want to talk a little bit about canonicalization, www vs. non-www, redirects, duplicate urls, 302 “hijacking,” etc. so that we’re all on the same page.
Q: What is a canonical url? Do you have to use such a weird word, anyway?
A: Sorry that it’s a strange word; that’s what we call it around Google. Canonicalization is the process of picking the best url when there are several choices, and it usually refers to home pages. For example, most people would consider these the same urls:
www.example.com
example.com/
www.example.com/index.html
example.com/home.asp
But technically all of these urls are different. A web server could return completely different content for all the urls above. When Google “canonicalizes” a url, we try to pick the url that seems like the best representative from that set.
Q: So how do I make sure that Google picks the url that I want?
A: One thing that helps is to pick the url that you want and use that url consistently across your entire site. For example, don’t make half of your links go to http://example.com/ and the other half go to http://www.example.com/ . Instead, pick the url you prefer and always use that format for your internal links.
Q: Is there anything else I can do?
A: Yes. Suppose you want your default url to be http://www.example.com/ . You can make your webserver so that if someone requests http://example.com/, it does a 301 (permanent) redirect to http://www.example.com/ . That helps Google know which url you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.).
Q: If I want to get rid of domain.com but keep www.domain.com, should I use the url removal tool to remove domain.com?
A: No, definitely don’t do this. If you remove one of the www vs. non-www hostnames, it can end up removing your whole domain for six months. Definitely don’t do this. If you did use the url removal tool to remove your entire domain when you actually only wanted to remove the www or non-www version of your domain, do a reinclusion request and mention that you removed your entire domain by accident using the url removal tool and that you’d like it reincluded.
Q: I noticed that you don’t do a 301 redirect on your site from the non-www to the www version, Matt. Why not? Are you stupid in the head?
A: Actually, it’s on purpose. I noticed that several months ago but decided not to change it on my end or ask anyone at Google to fix it. I may add a 301 eventually, but for now it’s a helpful test case.
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).
Q: Let’s talk about the inurl: operator. Why does everyone think that if inurl:mydomain.com shows results that aren’t from mydomain.com, it must be hijacked?
A: Many months ago, if you saw someresult.com/search2.php?url=mydomain.com, that would sometimes have content from mydomain. That could happen when the someresult.com url was a 302 redirect to mydomain.com and we decided to show a result from someresult.com. Since then, we’ve changed our heuristics to make showing the source url for 302 redirects much more rare. We are moving to a framework for handling redirects in which we will almost always show the destination url. Yahoo handles 302 redirects by usually showing the destination url, and we are in the middle of transitioning to a similar set of heuristics. Note that Yahoo reserves the right to have exceptions on redirect handling, and Google does too. Based on our analysis, we will show the source url for a 302 redirect less than half a percent of the time (basically, when we have strong reason to think the source url is correct).
Q: Okay, how about supplemental results. Do supplemental results cause a penalty in Google?
A: Nope.
Q: I have some pages in the supplemental results that are old now. What should I do?
A: I wouldn’t spend much effort on them. If the pages have moved, I would make sure that there’s a 301 redirect to the new location of pages. If the pages are truly gone, I’d make sure that you serve a 404 on those pages. After that, I wouldn’t put any more effort in. When Google eventually recrawls those pages, it will pick up the changes, but because it can take longer for us to crawl supplemental results, you might not see that update for a while.
That’s about all I can think of for now. I’ll try to talk about some examples of 302’s and inurl: soon, to help make some of this more concrete.
http://www.ragepank.com/articles/3/preventing-duplicate-content/
Hope I was of help,
Thomas Von Zickell
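For anyone on Apache, the non-www to www 301 redirect described in the quoted article could be sketched roughly like this in .htaccess. This is only an illustration: example.com is a placeholder, the http:// scheme is an assumption, and it presumes mod_rewrite is enabled on the host.

```apache
# Sketch only: example.com and the http:// scheme are placeholders/assumptions.
RewriteEngine On
# Match requests whose Host header is the bare (non-www) domain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Permanently redirect to the www hostname, keeping the requested path.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

If the non-www version is the preferred one, the same idea applies with the condition and the target hostname swapped.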
-
thanks!
Can somebody please also clarify exactly what should be in the second line?
As eyepaq wrote: RewriteRule ^(.+)/$ [%{HTTP_HOST}...] [R=301,L]
Should I insert something in/after "[%{HTTP_HOST}...]"?
-
After RewriteEngine, if I'm not wrong.
-
Should I keep the existing WordPress rewrite? If I keep it, should I then place your code before or after?
# BEGIN WordPress
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress
-
Hi,
Google is pretty good at understanding that the trailing-slash version is the same as the non-trailing-slash version, so you are safe on that side.
Even if the crawler flags this as an issue, it's not something you should focus on.
However, if you want to play by the book, you can use .htaccess to 301 redirect one version to the other.
Below is some sample code:
# Get rid of trailing slashes
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
RewriteRule ^(.+)/$ [%{HTTP_HOST}...] [R=301,L]
Hope it helps.
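The substitution URL in the rule quoted above was cut off in the post, so what it originally contained is unknown. One plausible way to write a complete rule of this kind is sketched below. The target http://%{HTTP_HOST}/$1 is an assumption, not the elided original; example.com is a placeholder, and the extra !-d condition is an addition that leaves real directories alone.

```apache
# Sketch only: the substitution URL is an assumption, not the elided original.
RewriteEngine On
# Only act on the expected hostnames (example.com is a placeholder).
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
# Skip real directories, which Apache itself serves with a trailing slash.
RewriteCond %{REQUEST_FILENAME} !-d
# Redirect /some/path/ to /some/path on the same host.
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]
```

A rule like this would need to sit above the WordPress block, because the final WordPress rule rewrites every non-file, non-directory request to /index.php before anything below it is reached. Keeping it outside the BEGIN/END WordPress markers also avoids WordPress overwriting it. Note that ^(.+)/$ requires at least one character before the final slash, so it never matches the bare homepage URL. If the WordPress permalink structure itself ends in a slash, WordPress may add the slash back and cause a redirect loop, so it is worth testing on a single URL first.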