For large sites, best practices for pages hidden behind internal search?
-
If a website has 1M+ pages, with most of them hidden behind an internal search, what's the best way to get those pages included in a search engine's index?
Does a direct clickpath to those pages need to exist from the homepage or other major hub pages on the site?
Is submitting an XML sitemap enough?
-
Hello Vlevit,
You could do several things. I recommend giving Google your product feed, which should accomplish your goals. Another possible solution would be to make those search pages noindex,follow so they don't end up getting indexed, but Google can still use them for discovery.
Thanks for explaining the situation.
Below is more on submitting product feeds. It's written for Google Product Search, but the "link" field, where you put the URL of your product detail page, should help those pages get indexed in the standard results:
http://support.google.com/merchants/bin/answer.py?hl=en&answer=188494#USEverett
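For reference, a single item in a Merchant Center product feed (RSS 2.0 format; the IDs, URLs, and values below are placeholders) looks roughly like this. The link element is the one that points at the product detail page you want discovered:

```xml
<item>
  <g:id>WIDGET-001</g:id>
  <title>Green Widget</title>
  <!-- The product detail page URL you want crawled and indexed -->
  <link>http://www.example.com/products/green-widget</link>
  <description>A green widget.</description>
  <g:price>19.99 USD</g:price>
  <g:condition>new</g:condition>
  <g:availability>in stock</g:availability>
</item>
```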
-
Everett, thanks for your reply. I understand the problems of showing internal search pages. I'm not looking to have internal search results being indexed, just the pages that the results link to. We're in eCommerce.
I was under the impression that there was a clever way to have the individual product pages indexed without establishing a direct click path, but best practices recommend otherwise.
Question answered. Thanks all for your help.
-
Hello Vlevit,
If you can be more specific we may be able to be of more help. Google doesn't want you to show internal search result pages, but if this is a different type of situation there may be an exception. Are these search result pages, product pages, category pages, content pages... ? Is it an eCommerce site, community, content site... ?
Generally speaking, 1M+ pages with no links going into them, and content that is either sparse/thin or partially/fully duplicated on other similar pages (like a search for widgets and a search for green widgets showing overlapping content), is exactly the type of thing that will get you into hot water and could affect even the rankings of your home page.
Do you feel like your question has been answered or would you like to be more specific about your site and goals?
Cheers,
Everett
-
This is what I was assuming, but was wondering if there was a clever way around creating direct click paths to those pages, while still maintaining their importance to the site. Thanks for the info.
-
Make sure they are part of the actual structure of your website, not just part of search. Meaning, you have to have links pointing at them. You will also want to make sure that those pages have value.
-
Hi vlevit,
Best practice is for a direct click path to exist from the index page. Something like: index -> category (filter) -> subcategory (filter) -> page/product. But in some cases XML sitemaps can also help with indexing.
BUT, beware of overly large XML sitemaps: split your URLs across more than one sitemap and group them where it makes sense.
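The splitting Istvan describes can be sketched in a few lines. This is a hypothetical helper (the domain, file names, and paths are placeholders) that chunks a large URL list into child sitemaps under the protocol's 50,000-URL-per-file limit and builds a sitemap index pointing at them:

```python
# Sketch: split a large URL list into child sitemaps plus a sitemap index,
# staying under the 50,000-URL-per-file limit from the sitemaps.org protocol.
# Domain and file paths are placeholders.
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit in the sitemaps.org protocol

def build_sitemaps(urls, base="https://example.com/sitemaps"):
    """Return (index_xml, [child_xml, ...]) as strings."""
    # Chunk the URL list so no child sitemap exceeds the limit
    chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
    children = []
    for chunk in chunks:
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for u in chunk:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
        children.append(ET.tostring(urlset, encoding="unicode"))
    # Build the index file that references each child sitemap
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(len(chunks)):
        sm = ET.SubElement(index, "sitemap")
        ET.SubElement(sm, "loc").text = f"{base}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="unicode"), children
```

You would then submit only the index file in Webmaster Tools; grouping by category or product type (one child sitemap per group) also makes it easier to see which sections are getting indexed.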
A few very good resources can be found at the following links:
http://www.seomoz.org/ugc/solving-new-content-indexation-issues-for-large-b2b-websites
http://www.seomoz.org/qa/view/29009/sitemaps-management-for-big-sites-tens-of-millions-of-pages
I hope it helps,
Istvan