Include Cross-Domain Canonical URLs in Sitemap - Yes or No?
-
I have several sites that have cross-domain canonical tags set up on similar pages. I am unsure whether pages that are canonicalized to a different domain should be included in the sitemap. My first thought is no, because I should only include pages in the sitemap that I want indexed.
On the other hand, if I include ALL pages on my site in the sitemap, I'm assuming that once Google gets to a page with a cross-domain canonical tag, it will just note that and determine whether the canonicalized page is the better version. I have yet to see any errors in GWT about this, although I have seen errors when I included a 301 redirect in my sitemap file. I suspect it's OK, but it seems to me that Google would rather not find these URLs in a sitemap and have to crawl them time and time again to determine whether they are the best page, when I'm already indicating that there is a similar page I'd rather have indexed.
-
I looked at the sitemap, and they are including http://www.seomoz.org/blog/the-story-of-seomoz but not the canonical page - http://www.masternewmedia.org/entrepreneurship-the-full-story-of-seomoz-told-by-rand-fishkin/
So based on this example, the page on SEOMoz is still included in the sitemap, regardless of whether it has a canonical tag.
This seems to make sense, since canonical links are treated only as a hint and not as an absolute directive.
I also noticed that Google is choosing to index and rank both pages on page 1.
SEOMoz is ranking higher in my browser for "the full story of seomoz". A few things are going on here.
-
Why is Google choosing to rank SEOMoz higher than Masternewmedia.org for this page? There's a canonical set up, but Google is choosing not to follow it. Again, it's a hint, not an absolute directive, so it doesn't always work.
-
I would think Google would be able to filter out the duplicate content easily. In this example, it clearly is not. SEOMoz is ranking #4 and Masternewmedia.org is ranking #5 for the query "the full story of seomoz".
-
-
Right - as far as I know, you're supposed to put end URLs into a sitemap, not URLs that 301 redirect. Cross-domain canonical is still kind of new, but I would treat these pages like 301 redirects and not include them in a sitemap.
Now, if you're curious, SEOMoz did a Whiteboard Friday where they talked about this exact issue (cross-domain canonical) and, as an experiment, re-posted a blog article from another blogger on SEOMoz.
http://www.seomoz.org/blog/cross-domain-canonical-the-new-301-whiteboard-friday
http://www.seomoz.org/blog-sitemap.xml
http://www.seomoz.org/blog/the-story-of-seomoz
The blog post is still included in the blog sitemap. I think it probably won't 'hurt' to keep those pages in the sitemap, since a lot of CMS tools that automatically generate sitemaps won't have been updated to deal with this yet.
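If you do decide to leave these pages out, here is a rough Python sketch of how a sitemap generator could skip them. It is only an illustration under a few assumptions: the HTML for each page has already been fetched, the function names (`canonical_of`, `is_cross_domain_canonical`, `sitemap_urls`) and the example.com URLs are made up for this post, and "cross domain" is taken to mean that the canonical's hostname differs from the page's hostname (so `www.example.com` and `example.com` would count as different).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def canonical_of(page_url, html):
    """Return the absolute canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve relative hrefs such as "/page" against the page's own URL.
    return urljoin(page_url, finder.canonical)


def is_cross_domain_canonical(page_url, html):
    """True when the page's canonical points at a different hostname."""
    target = canonical_of(page_url, html)
    if target is None:
        return False
    return urlsplit(target).hostname != urlsplit(page_url).hostname


def sitemap_urls(pages):
    """Keep URLs whose canonical (if any) stays on the same hostname.

    `pages` maps each URL to its already-fetched HTML (hypothetical input).
    """
    return [url for url, html in pages.items()
            if not is_cross_domain_canonical(url, html)]


# Hypothetical example pages, not real URLs from this thread.
pages = {
    "http://www.example.com/a":
        '<head><link rel="canonical" href="http://www.other.example/a"></head>',
    "http://www.example.com/b":
        '<head><link rel="canonical" href="/b"></head>',
    "http://www.example.com/c":
        "<head><title>No canonical here</title></head>",
}
print(sitemap_urls(pages))
```

With the sample input, only the page whose canonical points at another host is dropped. Whether you want that behaviour is the judgment call discussed above; the sketch just shows that the filtering itself is simple to add to a generator.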
-
There is no BIG problem if you add pages that have a cross-domain canonical tag on them. Why?
Because Google does not only index pages from the sitemap.xml file. Google has its own crawler, which can crawl and index a website even if it has no XML sitemap.
Google is (in my opinion) very good at picking up the instructions that are available on the page, so if you add the page to the XML sitemap, the crawler will read the instructions on the page and will only index the page that contains the original content.