How do the Quoras of this world index their content?
-
I am helping a client get a very large number of pages indexed, more than one million. Think of them as questions on Quora. On Quora, users are usually looking for the answer to one specific question and nothing else.
Quora has a structure set up on the homepage to let the spiders in, but I think most of the work is done by a lot of sitemaps and by internal linking based on relevance, and nothing else. Correct? Or am I missing something?
I am going to index about a million questions and answers, just like Quora. I am having a hard time structuring these questions in a way that is not purely for the search engines, because no user cares about the structure itself. Users are interested in related questions and/or popular questions, so I want to structure the pages that way too.
This way every question page will be in the sitemap, but not every question will have links from other question pages pointing to it. These questions are extremely long-tail, and the idea is that when somebody searches for the exact question we can supply the answer (on-page will be perfectly optimised for people searching this question). Competition is very low because it is all unique user-generated content.
I think the best approach is simply to put them in sitemaps and use an internal linking algorithm to make the popular and related questions rank better. I could even make sure every question has at least one other page linking to it. Thoughts?
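A minimal sketch of the "at least one other page linking to it" guarantee, assuming the related-question links are already available as a mapping from each question to the questions it links to. The names `find_orphans`, `add_fallback_links` and `max_extra_per_page` are hypothetical, and the round-robin choice of linking page is only a placeholder: in practice the fallback link would more sensibly come from a topically related question.

```python
from collections import defaultdict


def find_orphans(links):
    """Return pages that no other page links to."""
    linked_to = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a self-link does not count as inbound
                linked_to.add(target)
    return sorted(page for page in links if page not in linked_to)


def add_fallback_links(links, max_extra_per_page=5):
    """Give every orphan one inbound link, spreading the extra links across pages.

    Assumption: any page may act as the linking page. A real system would
    pick a related question instead of going round-robin.
    """
    result = {page: list(targets) for page, targets in links.items()}
    pages = sorted(result)
    extra = defaultdict(int)  # extra links already added per linking page
    cursor = 0
    for orphan in find_orphans(result):
        for _ in range(len(pages)):
            donor = pages[cursor % len(pages)]
            cursor += 1
            if donor != orphan and extra[donor] < max_extra_per_page:
                result[donor].append(orphan)
                extra[donor] += 1
                break
    return result


# Hypothetical link graph: q3 and q4 have no inbound links.
links = {"q1": ["q2"], "q2": ["q1"], "q3": [], "q4": ["q1"]}
fixed = add_fallback_links(links)
```

After the pass, `find_orphans(fixed)` is empty for this example, so every question is reachable through at least one internal link as well as through the sitemap.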
Moz, do you think that, when publishing one million quality Q&A pages, this strategy is enough to get them indexed and ranking for the question searches? Or do I need to design a structure around them so that everything gets crawled and each question also receives at least one link from a "category" page?
-
Wow, that is insane, right?
https://www.quora.com/sitemap/questions?page_id=50
I wonder how long this carries on.
-
Quora doesn't seem to have an XML sitemap, but rather an HTML one:
[https://www.quora.com/robots.txt](https://www.quora.com/robots.txt) refers to [https://www.quora.com/sitemap](https://www.quora.com/sitemap)
-
Yes, there are many challenges, and external linking is definitely one of them.
What do you think about using sitemaps to get this long-tail content indexed? I think a lot of it can be indexed just by submitting the sitemaps.
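For scale: the sitemaps.org protocol caps each sitemap file at 50,000 URLs, so a million question pages need at least 20 files plus a sitemap index that lists them. The sketch below shows one way to split the URLs. The file names, the `example.com` URLs and the function names are assumptions for illustration, and it only builds the XML strings. Submitting them does not by itself guarantee indexing.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Build one sitemap file listing the given page URLs."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'
    )


def build_index(sitemap_urls):
    """Build the sitemap index that points at every sitemap file."""
    entries = "".join(
        f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'
    )


def build_sitemaps(urls, base_url):
    """Return (index_xml, {file_name: sitemap_xml}) for a list of page URLs."""
    files = {}
    for number, part in enumerate(chunk(urls), start=1):
        # Hypothetical naming scheme for the generated files.
        files[f"sitemap-questions-{number}.xml"] = build_urlset(part)
    index = build_index([f"{base_url}/{name}" for name in files])
    return index, files


# Hypothetical question URLs; 120,000 pages are split into three files.
question_urls = [f"https://example.com/q/{i}" for i in range(120_000)]
index_xml, sitemap_files = build_sitemaps(question_urls, "https://example.com")
```

Only the sitemap index needs to be submitted or referenced from robots.txt, since it points to the individual files.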
-
There are many challenges to building a really large site. Most of them are related to building the site itself, but one that often kills the success of the site is getting the pages into the index and keeping them there. This requires a steady flow of spiders into the deepest pages of the site. If you don't have a continuous, repeated flow of spiders, the pages will be indexed but then forgotten before the spiders return.
An effective way to get deep spidering is to have powerful links permanently connected to many deep hub pages throughout the site. This produces a flow of spiders into the site and forces them to chew their way out, indexing pages as they go. These links must be powerful, or the spiders will index a couple of pages and die. These links must be permanent, because if they are removed the flow of spiders will stop and pages in the index will be forgotten.
The goal of the hub pages is to create spider webs through the site that allow spiders to index all of the pages on short link paths, rather than requiring the spiders to crawl through long tunnels of many consecutive links to get everything indexed.
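The "short link paths" idea can be checked on a link graph with a breadth-first search from the pages where spiders enter. This is only a sketch under assumptions: the graph is a plain mapping from page to linked pages, the entry pages are known, and the depth threshold of 3 as well as the names `click_depths` and `pages_needing_hubs` are hypothetical.

```python
from collections import deque


def click_depths(links, entry_pages):
    """Shortest number of link hops from any entry page to each reachable page."""
    depths = {page: 0 for page in entry_pages}
    queue = deque(entry_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_needing_hubs(links, entry_pages, max_depth=3):
    """Pages that are unreachable or deeper than max_depth hops from an entry page."""
    depths = click_depths(links, entry_pages)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in all_pages if depths.get(p, max_depth + 1) > max_depth)


# Hypothetical site: a long "tunnel" of consecutive links plus one orphan page.
site = {"home": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": [], "x": []}
too_deep = pages_needing_hubs(site, ["home"])

# Adding one hub page linked from the entry page shortens every path.
with_hub = {page: list(targets) for page, targets in site.items()}
with_hub["home"].append("hub")
with_hub["hub"] = ["d", "e", "x"]
still_deep = pages_needing_hubs(with_hub, ["home"])
```

In the example the tunnel leaves `d`, `e` and the orphan `x` out of reach within three hops, and a single hub page brings all of them within two.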
Lots of people can build a big site, but only some of those people have the resources to get the powerful, permanent links that are required to get the pages indexed and keep them in the index. You can't rely on internal links alone for the powerful, permanent links because most spiders that enter any site come from external sources rather than spontaneously springing up deep in the bowels of your website.