How do the Quoras of this world index their content?
-
I am helping a client index lots and lots of pages, more than one million pages. They can be seen as questions on Quora. In the Quora case, users are often looking for the answer on a specific question, nothing else.
On Quora there is a structure setup on the homepage to let the spiders in. But I think mostly it is done with a lot of sitemaps and internal linking in relevancy terms and nothing else... Correct? Or am I missing something?
I am going to index about a million question and answers, just like Quora. Now I have a hard time dealing with structuring these questions without just doing it for the search engines. Because nobody cares about structuring these questions. The user is interested in related questions and/or popular questions, so I want to structure them in that way too.
This way every question page will be in the sitemap, but not all questions will have links from other question pages linking to them. These questions are super longtail and the idea is that when somebody searches this exact question we can supply them with the answer (onpage will be perfectly optimised for people searching this question). Competition is super low because it is all unique user generated content.
I think best is just to put them in sitemaps and use an internal linking algorithm to make the popular and related questions rank better. I could even make sure every question has at least one other page linking to it, thoughts?
Moz, do you think when publishing one million pages with quality Q/A pages, this strategy is enough to index them and to rank for the question searches? Or do I need to design a structure around it so it will all be crawled and each question will also receive at least one link from a "category" page.
-
Wow, that is insane right?
https://www.quora.com/sitemap/questions?page_id=50
I wonder how long this carries on.
-
Quora don't seem to have a XML sitemap but a HTML one :
[https://www.quora.com/robots.txt](https://www.quora.com/robots.txt) refers to [https://www.quora.com/sitemap](https://www.quora.com/sitemap)
-
Yes there are many challenges and external linking is definitely one of them.
What do you think about sitemaps to get this longtail indexed? I think that a lot can be indexed by submitting the sitemaps.
-
There are many challenges to building a really large site. Most of them are related to building the site, but one that often kills the success of the site is the ability to get the pages into the index and keep them there. This requires a steady flow of spiders into the deepest pages of the site. If you don't have continuous and repetitive spider flow the pages will be indexed, but then forgotten, before the spiders return.
An effective way to get deep spidering is have powerful links permanently connected to many deep hub pages throughout the site. This produces a flow of spiders into the site and forces them to chew their way out, indexing pages as they go. These links must be powerful or the spiders will index a couple of pages and die. These links must be permanent because if they are removed the flow of spiders will stop and pages in the index will be forgotten.
The goal of the hub pages is to create spider webs through the site that allow spiders to index all of the pages on short link paths, rather than requiring the spiders to crawl through long tunnels of many consecutive links to get everything indexed.
Lots of people can build a big site, but only some of those people have the resources to get the powerful, permanent links that are required to get the pages indexed and keep them in the index. You can't rely on internal links alone for the powerful, permanent links because most spiders that enter any site come from external sources rather than spontaneously springing up deep in the bowels of your website.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Question about Indexing of /?limit=all
Hi, i've got your SEO Suite Ultimate installed on my site (www.customlogocases.com). I've got a relatively new magento site (around 1 year). We have recently been doing some pr/seo for the category pages, for example /custom-ipad-cases/ But when I search on google, it seems that google has indexed the /custom-ipad-cases/?limit=all This /?limit=all page is one without any links, and only has a PA of 1. Whereas the standard /custom-ipad-cases/ without the /? query has a much higher pa of 20, and a couple of links pointing towards it. So therefore I would want this particular page to be the one that google indexes. And along the same logic, this page really should be able to achieve higher rankings than the /?limit=all page. Is my thinking here correct? Should I disallow all the /? now, even though these are the ones that are indexed, and the others currently are not. I'd be happy to take the hit while it figures it out, because the higher PA pages are what I ultimately am getting links to... Thoughts?
Intermediate & Advanced SEO | | RobAus0 -
Glossary index and individual pages create duplicate content. How much might this hurt me?
I've got a glossary on my site with an index page for each letter of the alphabet that has a definition. So the M section lists every definition (the whole definition). But each definition also has its own individual page (and we link to those pages internally so the user doesn't have to hunt down the entire M page). So I definitely have duplicate content ... 112 instances (112 terms). Maybe it's not so bad because each definition is just a short paragraph(?) How much does this hurt my potential ranking for each definition? How much does it hurt my site overall? Am I better off making the individual pages no-index? or canonicalizing them?
Intermediate & Advanced SEO | | LeadSEOlogist0 -
Duplicate content based on filters
Hi Community, There have probably been a few answers to this and I have more or less made up my mind about it but would like to pose the question or as that you post a link to the correct article for this please. I have a travel site with multiple accommodations (for example), obviously there are many filter to try find exactly what you want, youcan sort by region, city, rating, price, type of accommodation (hotel, guest house, etc.). This all leads to one invevitable conclusion, many of the results would be the same. My question is how would you handle this? Via a rel canonical to the main categories (such as region or town) thus making it the successor, or no follow all the sub-category pages, thereby not allowing any search to reach deeper in. Thanks for the time and effort.
Intermediate & Advanced SEO | | ProsperoDigital0 -
Duplicate content on subdomains
Hi All, The structure of the main website goes by http://abc.com/state/city/publication - We have a partnership with public libraries to give local users access to the publication content for free. We have over 100 subdomains (each for an specific library) that have duplicate content issues with the root domain, Most subdomains have very high page authority (the main public library and other local .gov websites have links to this subdomains).Currently this subdomains are not index due to the robots text file excluding bots from crawling. I am in the process of setting canonical tags on each subdomain and open the robots text file. Should I set the canonical tag on each subdomain (homepage) to the root domain version or to the specific city within the root domain? Example 1:
Intermediate & Advanced SEO | | NewspaperArchive
Option 1: http://covina.abc.com/ = Canonical Tag = http://abc.com/us/california/covina/
Option 2: http://covina.abc.com/ = Canonical Tag = http://abc.com/ Example 2:
Option 1: http://galveston.abc.com/ = Canonical Tag = http://abc.com/us/texas/galveston/
Option 2: http://galveston.abc.com = Canonical Tag = http://abc.com/ Example 3:
Option 1: http://hutchnews.abc.com/ = Canonical Tag = http://abc.com/us/kansas/hutchinson/
Option 2: http://hutchnews.abc.com/ = Canonical Tag = http://abc.com/ I believe it makes more sense to set the canonical tag to the corresponding city (option 1), but wondering if setting the canonical tag to the root domain will pass "some link juice" to the root domain and it will be more beneficial. Thanks!0 -
Moving some content to a new domain - best practices to avoid duplicate content?
Hi We are setting up a new domain to focus on a specific product and want to use some of the content from the original domain on the new site and remove it from the original. The content is appropriate for the new domain and will be irrelevant for the original domain and we want to avoid creating completely new content. There will be a link between the two domains. What is the best practice for this to avoid duplicate content and a potential Panda penalty?
Intermediate & Advanced SEO | | Citybase0 -
Differentiating Content
I have a piece of content (that is similar) that legitimately shows up on two different sites. I would like both to link, but it seems as if they are "flip flopping" in ranking. Sometimes one shows up, sometimes another. What's the best way to differentiate a piece of content like this? Does it mean rewriting one entirely? http://www.simplifiedbuilding.com/solutions/ada-handrail/ http://simplifiedsafety.com/solutions/ada-handrail/ I want to the Simplified Building one to be found first if I had a preference.
Intermediate & Advanced SEO | | CPollock0 -
Duplicate content that looks unique
OK, bit of an odd one. The SEOmoz crawler has flagged the following pages up as duplicate content. Does anyone have any idea what's going on? http://www.gear-zone.co.uk/blog/november-2011/gear$9zone-guide-to-winter-insulation http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone http://www.gear-zone.co.uk/blog/july-2011/telephone-issues-$9-2nd-july-2011 http://www.gear-zone.co.uk/blog/september-2011/gear$9zone-guide-to-nordic-walking-poles http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone https://www.google.com/webmasters/tools/googlebot-fetch?hl=en&siteUrl=http://www.gear-zone.co.uk/
Intermediate & Advanced SEO | | neooptic0 -
Ranking with other pages not index
The site ranks on page 4-5 with other page like privacy, about us, term pages. I encounter this problem allot in the last weeks; this usually occurs after the page sits 1-2 months on page 1 for the terms. I'm thinking of to much use the same anchor as a primary issue. The sites in questions are 1-5 pages microniche sites. Any suggestions is appreciated. Thank You
Intermediate & Advanced SEO | | m3fan0