The Moz Q&A Forum


Moz Q&A is closed.

After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)

Intermediate & Advanced SEO
  • browndoginteractive
    browndoginteractive last edited by Jan 24, 2014, 4:45 PM

    Hi Guys,

    We have developed a plugin that allows us to display used vehicle listings from a centralized, third-party database. The functionality works similarly to autotrader.com or cargurus.com, and there are two primary components:

    1. Vehicle Listings Pages: this is the page where the user can use various filters to narrow the vehicle listings to find the vehicle they want.
    2. Vehicle Details Pages: this is the page where the user actually views the details about said vehicle. It is served up via Ajax, in a dialog box on the Vehicle Listings Pages.

    Example functionality:  http://screencast.com/t/kArKm4tBo

    The Vehicle Listings pages (#1), we do want indexed and to rank. These pages have additional content besides the vehicle listings themselves, and those results are randomized or sliced/diced in different and unique ways. They're also updated twice per day.

    We do not want to index #2, the Vehicle Details pages, as these pages appear and disappear all of the time, based on dealer inventory, and don't have much value in the SERPs. Additionally, other sites such as autotrader.com, Yahoo Autos, and others draw from this same database, so we're worried about duplicate content. For instance, entering a snippet of dealer-provided content for one specific listing that Google indexed yielded 8,200+ results:  Example Google query.

    We did not originally think that Google would even be able to index these pages, as they are served up via Ajax. However, it seems we were wrong, as Google has already begun indexing them. Not only is duplicate content an issue, but these pages are not meant for visitors to navigate to directly! If a user were to navigate to the url directly, from the SERPs, they would see a page that isn't styled right.

    Now we have to determine the right solution to keep these pages out of the index:  robots.txt, noindex meta tags, or hash (#) internal links.

    Robots.txt Advantages:

    • Super easy to implement
    • Conserves crawl budget for large sites
    • Ensures crawler doesn't get stuck. After all, if our website only has 500 pages that we really want indexed and ranked, and vehicle details pages constitute another 1,000,000,000 pages, it doesn't seem to make sense to make Googlebot crawl all of those pages.

    Robots.txt Disadvantages:

    • Doesn't prevent pages from being indexed, as we've seen, probably because there are internal links to these pages. We could nofollow these internal links, thereby minimizing indexation, but this would lead to 10-25 nofollowed internal links on each Vehicle Listings page (will Google think we're pagerank sculpting?)

    Noindex Advantages:

    • Does prevent vehicle details pages from being indexed
    • Allows ALL pages to be crawled (advantage?)

    Noindex Disadvantages:

    • Difficult to implement (vehicle details pages are served using ajax, so they have no head element to carry a meta tag). The solution would have to involve the X-Robots-Tag HTTP header and Apache, sending a noindex directive based on querystring variables, similar to this stackoverflow solution. This means the plugin functionality is no longer self-contained, and some hosts may not allow these types of Apache rewrites (as I understand it).

    • Forces (or rather allows) Googlebot to crawl hundreds of thousands of noindex pages. I say "force" because of the crawl budget required. The crawler could get stuck/lost in so many pages, and may not like crawling a site with 1,000,000,000 pages, 99.9% of which are noindexed.

    • Cannot be used in conjunction with robots.txt. After all, crawler never reads noindex meta tag if blocked by robots.txt
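
To make the header approach concrete, here is a minimal sketch of the decision logic, written in Python rather than Apache config. It assumes the details pages can be recognized by the `dtc_inventory_ajax.php` script name quoted elsewhere in this thread plus an `id` querystring variable; the function name `robots_headers` and the example host and path are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Script name as quoted in this thread; the directory it lives in is an assumption.
DETAILS_SCRIPT = "dtc_inventory_ajax.php"


def robots_headers(url: str) -> dict:
    """Return the extra response headers for a URL (hypothetical helper).

    Vehicle details responses get an X-Robots-Tag noindex header;
    every other URL gets no extra header.
    """
    parts = urlsplit(url)
    has_id = "id" in parse_qs(parts.query)
    is_details = parts.path.endswith("/" + DETAILS_SCRIPT) and has_id
    return {"X-Robots-Tag": "noindex"} if is_details else {}


if __name__ == "__main__":
    print(robots_headers(
        "https://example.com/wp-content/plugins/example-plugin/"
        "dtc_inventory_ajax.php?id=29935291"
    ))
    print(robots_headers("https://example.com/used-cars/"))
```

The same test (script name plus querystring variable) is what a server-level rule would need to express; the header only has an effect if the crawler is allowed to fetch the URL.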

    Hash (#) URL Advantages:

    • By using hash (#) URLs for links on Vehicle Listing pages to Vehicle Details pages (such as "Contact Seller" buttons), coupled with Javascript, the crawler won't be able to follow/crawl these links. Best of both worlds: crawl budget isn't overtaxed by thousands of noindex pages, and the internal links that get robots.txt-disallowed pages indexed are gone.
    • Accomplishes same thing as "nofollowing" these links, but without looking like pagerank sculpting (?)
    • Does not require complex Apache stuff

    Hash (#) URL Disadvantages:

    • Is Google suspicious of sites with (some) internal links structured like this, since they can't crawl/follow them?

    Initially, we implemented robots.txt--the "sledgehammer solution." We figured that we'd have a happier crawler this way, as it wouldn't have to crawl zillions of partially duplicate vehicle details pages, and we wanted it to be like these pages didn't even exist. However, Google seems to be indexing many of these pages anyway, probably based on internal links pointing to them. We could nofollow the links pointing to these pages, but we don't want it to look like we're pagerank sculpting or something like that.

    If we implement noindex on these pages (and doing so is a difficult task itself), then we will be certain these pages aren't indexed. However, to do so we will have to remove the robots.txt disallowal, in order to let the crawler read the noindex tag on these pages. Intuitively, it doesn't make sense to me to make googlebot crawl zillions of vehicle details pages, all of which are noindexed, and it could easily get stuck/lost/etc. It seems like a waste of resources, and in some shadowy way bad for SEO.
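
The trade-off described above can be written down as a small decision table. This is a deliberately simplified model of the reasoning in this post, not a description of how Google actually decides; the function name and the outcome labels are made up for illustration.

```python
def index_outcome(blocked_by_robots: bool, has_noindex: bool,
                  has_inbound_links: bool) -> str:
    """Simplified model of what can end up in the index for one URL."""
    if blocked_by_robots:
        # The crawler never fetches the page, so a noindex on it is never read.
        # Links pointing at the URL can still get the bare URL listed.
        return "url-only listing possible" if has_inbound_links else "not indexed"
    if has_noindex:
        # The page is crawled, the noindex is seen, and the page is dropped.
        return "not indexed"
    return "indexed"


if __name__ == "__main__":
    # robots.txt block plus internal links: the situation we are in now.
    print(index_outcome(True, False, True))
    # noindex with the robots.txt block removed.
    print(index_outcome(False, True, True))
    # noindex combined with a robots.txt block: the noindex is wasted.
    print(index_outcome(True, True, True))
```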

    My developers are pushing for the third solution: using the hash URLs. This works on all hosts and keeps all functionality in the plugin self-contained (unlike noindex), and conserves crawl budget while keeping vehicle details pages out of the index (unlike robots.txt). But I don't want Google to slap us 6-12 months from now because it doesn't like hash links like these.

    Any thoughts or advice you guys have would be hugely appreciated, as I've been going in circles, circles, circles on this for a couple of days now. Also, I can provide a test site URL if you'd like to see the functionality in action.

    • Everett
      Everett @browndoginteractive last edited by Jan 30, 2014, 8:19 PM Jan 30, 2014, 8:19 PM

      Perhaps those URLs were indexed before you blocked them. If you have them blocked now, by robots.txt and/or by a robots meta noindex tag, you can use Google's URL Removal Tool in GWT to get them out of the index. It may take a while though.

      I see nothing wrong with adding a nofollow tag to those href links. Go for it. If nothing else, it could help you salvage your crawl budget.

      • browndoginteractive
        browndoginteractive @Everett last edited by Jan 30, 2014, 6:57 PM Jan 30, 2014, 6:57 PM

        Oh, I was under the mistaken impression that nofollowing the links would conserve that pagerank--a pretty outdated thought, I now realize.  Thanks for clearing that up!

        However, would you see any negatives to nofollowing the links just to keep Google from indexing the pages they lead to? Just so we avoid a zillion of those "A description for this result is not available because of this site's robots.txt" pages?

        Unfortunately, my developers are having trouble figuring out how to retain the functionality we have without href tags, so it's looking like we're going to keep those links.

        Again, thank you so much for lending your time and knowledge, Everett--you rock!

        • Everett
          Everett @browndoginteractive last edited by Jan 30, 2014, 6:57 PM Jan 29, 2014, 8:49 PM

          Nofollowing them won't help you conserve any of that pagerank for other links on the page. Instead, you would seek to make those something other than href tags. I'm not a developer, but here is one example that might help explain what I'm trying to say: http://www.quackit.com/javascript/popup_windows.cfm . Notice the javascript for the pop-up window on that page does not contain an href tag.

          • browndoginteractive
            browndoginteractive @Everett last edited by Jan 29, 2014, 3:51 PM Jan 29, 2014, 3:50 PM

            Everett,

            Thank you so very much for the thoughtful and really helpful answer.  We will implement the robots.txt disallow statements you suggested, and I will discuss with my developer the ability to reference just the id portion of the url.  We've begun the URL removal process in Webmaster Tools, and fortunately, in the vast majority of cases, the content hasn't been indexed due to robots.txt--just the URL.

            As far as all of the hrefs diluting pagerank, what are your thoughts on nofollowing these links?  We've had this on the table for some time, but haven't been able to come to a decision. It would curb the pagerank dilution, and it would probably keep Google from indexing those robots-disallowed pages. It's good to know these pages probably wouldn't ever trip a Panda/dupe content filter, but it still seems cleaner/neater for them not to be indexed at all. That said, I'm afraid nofollowing the links could look suspicious to Google. All combined, it would result in 25-35 nofollowed internal links on each page, with about the same amount dofollowed (if you include navigation, etc).

            Thank you again for lending your time and expertise to this answer.  It is truly, truly, truly appreciated.

            • Everett
              Everett last edited by Jan 29, 2014, 3:39 PM Jan 29, 2014, 3:26 PM

              The javascript you shared would allow Google to fairly easily access the page ending in dtc_inventory_ajax.php?id=29935291. If that's the page you want them to not be able to access, perhaps you'd be better off referencing just the id portion of the URL, which should be enough for the database to take the user to the right page.

              Regardless, you "should" be OK with just the robots.txt block, though all of the href tags are sort of diluting the amount of pagerank you can send to other pages from whatever page you're on.

              The robots.txt disallow statement you provided might be improved upon.

              Disallow: /*?

              The one above seems to me like it would only work on URLs that were in the root directory. Try this one instead of, or in addition to, the one above:

              Disallow: /?id=*

              Also I'd add this one to any Wordpress site, which in itself should take care of the issue if the URL in your script is an example of those that you're concerned about:

              Disallow: /wp-content/plugins/
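
Before relying on any of these rules, it may be worth checking which URLs each one actually covers. Below is a minimal sketch of the wildcard matching Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end, and a rule otherwise matches as a prefix of the path plus querystring). The helper name `rule_matches` and the plugin directory in the sample URL are hypothetical; only the script name and id come from this thread.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern covers a URL path+query."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each * into "any characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    # Hypothetical plugin directory; script name and id are from the thread.
    ajax_url = "/wp-content/plugins/example-plugin/dtc_inventory_ajax.php?id=29935291"
    for rule in ("/*?", "/?id=*", "/wp-content/plugins/"):
        print(rule, rule_matches(rule, ajax_url))
```

Under that matching, `/*?` covers any URL containing a question mark regardless of directory, while `/?id=*` only covers URLs that begin with `/?id=` at the site root, so the rules are worth testing against the real Ajax URLs in Webmaster Tools as well.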

              You can use the URL Removal Tool in Google Webmaster Tools to get the ones that have already been crawled out of the index. You can do it at the URL level, or at the directory level.

              Lastly, if you're blocking Google and the SERP says unable to display because of the robots.txt file I don't think you need to worry about the content on those pages affecting your site with regard to a Panda penalty or anything like that. However, if Google had already indexed the content on those pages you will want to remove the URLs via Webmaster Tools as described above.

              • browndoginteractive
                browndoginteractive @Matthew_Edgar last edited by Jan 28, 2014, 1:11 PM Jan 28, 2014, 1:11 PM

                Yes, I hear you on Google seeming to be able to crawl anything. Here is the million-dollar question: if Google is finding the links but not crawling the pages to get any content, are these pages still going to be part of any Panda filter? Could we be penalized for robots-disallowed pages? My worry is yes.

                What are your thoughts on implementing rel=nofollow on these links?  That, combined with robots.txt, combined with the javascript, should have the intended effect.  I'm just a little reluctant for us to nofollow ~25-30 internal links on each page like this.

                As far as duplicate content goes, no, the pages are not exact duplicates, and there are things we could do to set them apart from everybody else. We have some good ideas for functionality, actually. But...I have to say I don't have enough faith in Google that this will keep us safe. I'm afraid we could still trip some filter, and CRASH there goes the traffic.

                • Matthew_Edgar
                  Matthew_Edgar last edited by Jan 27, 2014, 8:42 PM Jan 27, 2014, 8:40 PM

                  I think the JavaScript implementation might still be able to be crawled by Google. Any more, I'm becoming convinced that Google can crawl just about anything. But, I'll be curious to see what the results are. Definitely update this thread with what ends up happening from that approach.

                  As for the robots.txt message, that would indicate that they are finding the link to the page but not crawling the page to get any content.

                  As for duplicated content concerns, just to take a step back, are the pages 100% the same or are you making alterations to the text? If you can do easy things that make that page different from the other sites (even if it is functionality), then the page isn't a true duplicate and there might be some good reasons why people could want to find those pages in the search results.

                  Ultimately, you have the same page, but you are making the page better than those other websites. If that is the case, then you should be safe letting those pages rank. Where having the same content as your competitor really hurts (in my experience, anyway) is when you aren't offering anything different than any other sites.

                  Hope that helps.

                  • browndoginteractive
                    browndoginteractive @Matthew_Edgar last edited by Jan 27, 2014, 5:53 PM Jan 27, 2014, 5:47 PM

                    Matthew, thank you so much for the thoughtful response!

                    We do not currently have a fallback solution for users with Javascript disabled, mainly because--as you said--Google could then access it, and we'd have the same problem we have now. We implemented the Javascript solution this weekend, resulting in button code like this:

                    [Contact Seller](javascript:void(0);)

                    We don't know yet if Google will be able to access this.  Any ideas? We've uploaded this version of our plugin to a new test site, in order to see what happens.

                    As for the robots.txt solution, Google actually indexed the urls after the robots.txt file was uploaded, and we did test the file in Webmaster Tools to confirm that it worked prior to uploading it. We used Disallow: /*? to try and keep Google from crawling/indexing our Ajax urls, which all have question marks in them (like the data-url link in the code above).

                    Some of the indexed pages look normal in the SERPs--like any indexed page with a normal description, etc--and others have the message:  "A description for this result is not available because of this site's robots.txt." I believe, from my research, that Google is indexing these pages based on the internal links to them.

                    It wouldn't be a tragedy if users navigated directly to the vehicle details pages, as we could make sure the pages are styled for them.  The bigger issue is that these pages are not really unique, given that multiple companies are pulling from the same database.

                    Any thoughts on the Javascript implementation?

                    • Matthew_Edgar
                      Matthew_Edgar last edited by Jan 27, 2014, 5:47 PM Jan 27, 2014, 5:27 PM

                      Hey,

                      This is definitely a complicated issue, and there is some risk in making a move in the wrong direction.

                      Here are my thoughts which might help you out. Feel free to private message me or shoot me an email (see my profile) and I'd be happy to talk more.

                      On the hash solution, would that require JavaScript be enabled in order to access those pages or would you have a fallback solution for those without JavaScript?

                      If you don't have a fallback solution for those without JavaScript, you might negatively affect visitors with disabilities. For instance, some types of Ajax are challenging for people with disabilities to access (see here to start digging into that: http://webaim.org/techniques/javascript/).

                      Thing is, if you have a fallback solution, Google could still access those. However, Google may still be able to access those pages with JavaScript as Google can execute some forms of JavaScript. Given that, the more appropriate solution would be to use the robots.txt file. You mentioned, though, that the command you put in didn't seem to work since Google kept indexing those pages. Couple questions:

                      First, did Google index those pages after the change or had those pages been indexed prior to the robots.txt change? Things take time, so I'm wondering if you didn't give them enough time to adjust.

                      The other question would be whether or not you tested the robots.txt file in Google Webmaster Tools? That just gives you an extra verification that it should work.

                      Also, you mentioned something interesting about the Vehicle Detail pages: "these pages are not meant for visitors to navigate to directly!" Given that is the case, is it possible for your developers to add some sort of server-side check to see if people are accessing the detail pages from the listing pages?

                      For instance, on some sites I've worked on, a cookie is set when you've reached the listing page that says "this person is okay to reach the detail page" and then the visitor can only reach the detail page if that cookie is set. Without that cookie, the visitor is redirected back to a listing page. Not sure how exactly that would work on your site, but it might be a way to keep visitors who find those pages in a Google search result from seeing the incorrectly styled page.
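
A rough sketch of that cookie check, in Python for illustration only (your plugin would do this in its own server-side language). The cookie name `listing_visited` and the listings URL `/used-cars/` are made-up placeholders, and the two functions stand in for whatever your listing and detail handlers look like.

```python
from http.cookies import SimpleCookie

LISTINGS_URL = "/used-cars/"      # hypothetical listings page
GATE_COOKIE = "listing_visited"   # hypothetical cookie name


def listing_page_headers() -> list:
    """Headers the listing page adds so the visitor may open detail pages."""
    cookie = SimpleCookie()
    cookie[GATE_COOKIE] = "1"
    cookie[GATE_COOKIE]["path"] = "/"
    return [("Set-Cookie", cookie[GATE_COOKIE].OutputString())]


def detail_page_response(cookie_header: str) -> tuple:
    """Return (status, headers) for a detail page request.

    Visitors without the gate cookie are redirected to the listings page.
    """
    cookies = SimpleCookie(cookie_header)
    if GATE_COOKIE in cookies:
        return 200, {}
    return 302, {"Location": LISTINGS_URL}


if __name__ == "__main__":
    print(listing_page_headers())
    print(detail_page_response("listing_visited=1"))
    print(detail_page_response(""))
```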

                      I hope that helps. Like I said, feel free to email me or private message me if you'd like me to take a look at your site or chat with you about more particulars.

                      Thanks!
