Block search engines from URLs created by internal search engine?
-
Hey guys,
I've got a question for you all that I've been pondering for a few days now. I'm currently doing a technical SEO audit for a large-scale directory.
One major issue they are having is that their internal search system (Directory Search) creates a new URL every time a user enters a search query. This creates huge amounts of duplication on the website.
I'm wondering if it would be best to block search engines from crawling these URLs entirely with robots.txt?
What do you guys think? Bearing in mind there are probably thousands of these pages already in the Google index?
Thanks
Kim
-
That sounds perfect - if the user-generated URLs are getting enough traffic, make them permanent pages and point the duplicate variants at them with a 301 redirect or a canonical tag. If not, weed them out of the index.
-
Thanks for your reply Dr. Meyers. I think you're probably right.
Yes, I'm recommending they define a canonical set of pages covering the most popular searches, categories and locations, all reachable via internal links, and we'll get the duplicates redirected back to that canonical set.
But for pages that fall outside those categories and locations, I'll recommend a meta noindex tag.
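For what it's worth, here is a rough Python sketch of that split, not the site's actual logic. The `CANONICAL_PATHS` set and the `decide` helper are hypothetical names, and it assumes the canonical version of a page is the lower-cased path with no query string.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical canonical set: lower-cased paths of the popular searches,
# categories and locations that get permanent, internally linked pages.
CANONICAL_PATHS = {
    "/yellow+pages/car+dealer/auckland+region",
}


def decide(url: str) -> Tuple[str, Optional[str]]:
    """Return (action, target) for a search URL.

    action is "index" for a canonical page, "redirect" for a duplicate of
    one (target is the canonical URL), and "noindex" for everything else.
    """
    parts = urlsplit(url)
    key = parts.path.lower()
    if key in CANONICAL_PATHS:
        if parts.path == key and not parts.query:
            return ("index", None)
        return ("redirect", f"{parts.scheme}://{parts.netloc}{key}")
    return ("noindex", None)
```

A capitalised or parameterised variant of a popular search comes back as a redirect to the canonical URL, while a one-off search outside the set comes back as `noindex`.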
-
It can be a complicated question on a very large site, but in most cases I'd META NOINDEX those pages. Robots.txt isn't great at removing content that's already been indexed. Admittedly, NOINDEX will take a while to work (virtually any solution will), as Google probably doesn't crawl these pages very often.
Generally, though, the risk of having your index explode with custom search pages is too high for a site like yours (especially post-Panda). I do think blocking those pages somehow is a good bet.
The only exception I would add is if some of the more popular custom searches are getting traffic and/or links. I assume you have a solid internal link structure and other paths to these listings, but if it looks like a few searches (or a few dozen) have attracted traffic and back-links, you'll want to preserve those somehow.
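To illustrate why robots.txt and NOINDEX don't combine well, here is a small Python sketch. The robots.txt rule is hypothetical, and Python's standard-library parser only stands in for a well-behaved crawler; it is not Google's own parser. The point is that a crawler that obeys a Disallow rule never fetches the page, so it never sees a noindex tag placed on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the whole internal-search path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /yellow+pages/
"""

# The tag a search-result page would carry under the NOINDEX approach.
NOINDEX_TAG = '<meta name="robots" content="noindex, follow">'

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

search_url = "http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland"
other_url = "http://yellow.co.nz/about"  # hypothetical page outside the search path

search_crawlable = parser.can_fetch("Googlebot", search_url)
other_crawlable = parser.can_fetch("Googlebot", other_url)

if not search_crawlable:
    # The crawler is not allowed to request the page, so NOINDEX_TAG is never read.
    print("blocked by robots.txt: the noindex tag on this page cannot be seen")
```

So if the goal is to get already-indexed search pages removed, the pages need to stay crawlable while they carry the tag.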
-
Sure, see below for some examples of the duplication I mean:
Capitalization Duplication
http://yellow.co.nz/yellow+pages/Car+dealer/Auckland+Region
http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland+Region
With a few URL parameters
And with location duplication
http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland
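A quick way to size up this kind of duplication from a crawl export might look like the Python sketch below. The `normalise` and `group_duplicates` helpers are hypothetical, and lower-casing the path assumes the site serves the same content regardless of case, as the examples above suggest.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Lower-case the host and path and drop the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path.lower(), "", ""))


def group_duplicates(urls):
    """Map each normalised URL to the list of raw URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return dict(groups)


urls = [
    "http://yellow.co.nz/yellow+pages/Car+dealer/Auckland+Region",
    "http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland+Region",
    "http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland",
]
groups = group_duplicates(urls)
for key, variants in groups.items():
    print(key, len(variants))
```

This collapses the capitalization and parameter variants, but not the location variant (Auckland vs Auckland+Region). That would need a site-specific table of location aliases.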
Let me know if you need any more info!
Cheers
Kim
-
What does the content look like on the new URL? Can you give us an example?
Related Questions
-
Creating a site search engine while keeping SEO factors in mind
I run and own my own travel photography business. (www.mickeyshannon.com) I've been looking into building a search archive of photos that don't necessarily need to be in the main galleries, as a lot of older photos are starting to really clutter up and take away the emphasis from the better work. However, I still want to keep these older photos around. My plan is to simplify my galleries, and pull out 50-75% of the lesser/older photos. All of these photos will still be reachable by a custom-build simple search engine that I'm building to house all these older photos. The photos will be searchable based on keywords that I attach to each photo as I add them to my website. The question I have is whether this will harm me for having duplicate content? Some of the keywords that would be used in the search archive would be similar or the same to the main gallery names. However, I'm also really trying to push my newer and better images out there to the front. I've read some articles that talk about noindexing search keyword results, but that would make it really difficult for search engines to even find the older photos, as searching for their keywords would be the only way to find them. Any thoughts on a way to work this out that benefits, or at least doesn't hurt me, SEO-wise?
Intermediate & Advanced SEO | | msphotography0 -
Should I include URLs that are 301'd or only include 200 status URLs in my sitemap.xml?
I'm not sure if I should be including old URLs (content) that are being redirected (301) to new URLs (content) in my sitemap.xml. Does anyone know if it is best to include or leave out 301ed URLs in a xml sitemap?
Intermediate & Advanced SEO | | Jonathan.Smith0 -
Does Automated High Quality Content Look Like Low Quality to Search Engines?
I have 1,000+ pages that all have very similar writing, but different results.
Example:
Nr of days on market
Average sales price
Median sales price
etc etc etc
All the results are very different for each neighborhood. However, as per the above, the wording is similar. The content is very valuable to users. However, I am concerned search engines may see it as low quality content, as wording is identical across all these pages (except the results). Any view on this? Any examples to back up such views?
Intermediate & Advanced SEO | | khi50 -
Company Blog at a different URL
Ok, I have been doing a lot of work over the past 6 months, disavowing low-quality links from spammy directories to our company website, etc. However, my efforts seem to have had a negative, not positive effect. This has brought me back to reconsidering what we are doing as we have lost a good amount of traction on the nationwide Google rankings specifically. Considering our company blog - platinumcctv(dot)net - we have used this blog for a long time to inform customers of new products, software developments and then to provide them links to purchase those components. Last week, I revamped the nearly default WordPress theme to another on a piece of advice. However, someone told me that all of our links should be nofollow, even though it is a company blog because we have many links coming from this domain, and it could be found as spammy. Potato/Potato - But before I start the tedious task of changing every link to nofollow on a whim, I searched a lot, but have found no CLEAR substantiation of this. Any ideas? Other recommendations appreciated as well! Platinum-CCTV(dot)com
Intermediate & Advanced SEO | | PTCCTV0 -
Blog Not Ranking Well at All in Search Engines, Need Help!
Hi Mozers, Need some help on a CMS I've been working with over the last year. The CMS is built by a team of guys here in Washington State. Basically, I'm having issues with clients content on the blog system not getting ranking correctly at all. Here's a few problems I've noticed: Could you confirm and scale these problems based upon being, "not a problem" "a problem" and "critical must fix"
1. The title tag is pulling from the title of the article which is also automatically generating a URL with underscores instead of dashes. Is having a duplicate URL, Title, and Title tag spammy looking to search engines? Are underscores on long URL's confusing google? Where shorter one's are fine (i.e. domain/i_pad/ (i.e. http://www.ductvacnw.com/blog/archives/2013/05/20/5_reasons_to_hire_a_professional_to_clean_your_air_ducts_and_vents),
2. The CMS is resolving all URL's with a canonical instead of a 301 redirect (I've told webmaster tools which preferred url should be indexed). Does using a canonical over a 301 redirect cause any confusion with Google? Is one better practice then the other?
3. The H1 tags on the blog pull from "blog category" instead of the title of the blog post. Is this is a problem?
4. The URl's are quite long with the added "archives/2013/05/20/5". Does this cause problems by pushing the main target keyword further away from the domain name?
5. I'm also noticing the blog post is actually not part of the breadcrumbs where we normally would expect that to populate after the blog category name, Problem?
These are some of the things I've noticed and need clarification on. If you see anything else please let me know?
Intermediate & Advanced SEO | | Keith-Eneix0 -
Reducing Booking Engine Indexation
Hi Mozzers, I am working on a site with a very useful room booking engine. Helpful as it may be, all the variations (2 bedrooms, 3 bedrooms, room with a view, etc, etc,) are indexed by Google. Section 13 on Search Pagination in Dr. Pete's great post on Panda http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world speaks to our issue, but I was wondering since 2 (!) years have gone by, if there are any additional solutions y'all might recommend. We want to cut down on the duplicate titles and content and get the useful but not useful for SERPs online booking pages out of the index. Any thoughts? Thanks for your help.
Intermediate & Advanced SEO | | Leverage_Marketing0 -
How do I create a XML Sitemap?
It appears that the free online tools limit the number of URLs they'll include. What tools have you had success with?
Intermediate & Advanced SEO | | NaHoku1 -
Should product searches (on site searches) be noindex?
We have a large new site that is suffering from a sitewide Panda-like penalty. The site has 200k pages indexed by Google. Lots of category and sub category page content and about 25% of the product pages have unique content hand written (vs the other pages using copied content). So it seems our site is labeled as thin. I'm wondering about using noindex parameters for the internal site search. We have a canonical tag on search results pointing to domain.com/search/ (client thought that would help) but I'm wondering if we need to just noindex all the product search results. Thoughts?
Intermediate & Advanced SEO | | iAnalyst.com0