How do the Quoras of this world index their content?
-
I am helping a client index a very large number of pages, more than one million. They can be thought of as questions on Quora. On Quora, users are usually looking for the answer to one specific question and nothing else.
Quora has a structure set up on the homepage to let the spiders in. But I think most of the work is done with lots of sitemaps and relevance-based internal linking, and nothing else. Correct? Or am I missing something?
I am going to index about a million questions and answers, just like Quora. I am having a hard time structuring these questions in a way that isn't purely for the search engines, because nobody cares about a structure for these questions. Users are interested in related and/or popular questions, so I want to structure the pages that way too.
This way every question page will be in the sitemap, but not every question will have links from other question pages. These questions are super long-tail, and the idea is that when somebody searches for the exact question we can supply the answer (on-page will be perfectly optimised for people searching that question). Competition is super low because it is all unique user-generated content.
I think the best approach is to put them in sitemaps and use an internal linking algorithm to help the popular and related questions rank better. I could even make sure every question has at least one other page linking to it. Thoughts?
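To make the "at least one other page linking to it" idea concrete, here is a minimal sketch in Python. It assumes the related/popular links are already available as a dictionary of page IDs; the function name `add_links_to_orphans` and the round-robin choice of donor page are placeholders of mine, and a real system would presumably pick the most related page as the donor.

```python
def add_links_to_orphans(links):
    """Return a copy of `links` in which every page has at least one inbound link.

    `links` maps a page ID to the list of page IDs it links to
    (hypothetical structure, e.g. the output of a related-questions algorithm).
    """
    result = {page: list(targets) for page, targets in links.items()}
    pages = list(result)
    if len(pages) < 2:
        return result  # nothing can link to a single page

    # Pages that already receive a link from some *other* page.
    linked_to = {
        target
        for page, targets in result.items()
        for target in targets
        if target != page
    }
    orphans = [page for page in pages if page not in linked_to]

    # Assumption: donors are chosen round-robin so the extra links are spread
    # out; swap in a relatedness score here if one is available.
    for i, orphan in enumerate(orphans):
        donor = pages[i % len(pages)]
        if donor == orphan:
            donor = pages[(i + 1) % len(pages)]
        result[donor].append(orphan)
    return result


if __name__ == "__main__":
    example = {"q1": ["q2"], "q2": ["q1"], "q3": [], "q4": ["q1"]}
    print(add_links_to_orphans(example))
```

Running this as a batch job after the related-questions step would guarantee that no question depends on the sitemap alone for discovery.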
Moz, do you think that when publishing one million quality Q&A pages, this strategy is enough to get them indexed and ranking for the question searches? Or do I need to design a structure around it so that everything gets crawled and each question also receives at least one link from a "category" page?
-
Wow, that is insane, right?
https://www.quora.com/sitemap/questions?page_id=50
I wonder how long this carries on.
-
Quora doesn't seem to have an XML sitemap, only an HTML one:
[https://www.quora.com/robots.txt](https://www.quora.com/robots.txt) refers to [https://www.quora.com/sitemap](https://www.quora.com/sitemap)
-
Yes, there are many challenges, and external linking is definitely one of them.
What do you think about using sitemaps to get this long tail indexed? I think a lot can be indexed just by submitting the sitemaps.
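For reference, the sitemaps.org protocol caps a single sitemap file at 50,000 URLs (and 50 MB uncompressed), so a million question pages need at least 20 sitemap files tied together by a sitemap index. A rough Python sketch of that split follows; the file names, the `build_sitemaps` helper and the example.com URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(urls, base_url, max_urls=MAX_URLS_PER_FILE):
    """Split `urls` into sitemap files plus one sitemap index.

    Returns a dict of {file name: XML string}. File names are hypothetical.
    """
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        name = f"sitemap-questions-{n}.xml"
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        files[name] = ET.tostring(urlset, encoding="unicode")
        # Each child sitemap is referenced from the index by its full URL.
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base_url}/{name}"
    files["sitemap-index.xml"] = ET.tostring(index, encoding="unicode")
    return files


if __name__ == "__main__":
    # Placeholder URLs; a real run would read them from the question database.
    question_urls = [f"https://example.com/q/{i}" for i in range(120_000)]
    sitemaps = build_sitemaps(question_urls, "https://example.com")
    print(sorted(sitemaps))
```

Only the index file would then need to be submitted or referenced from robots.txt. Whether submission alone gets a long tail indexed is exactly the open question here; the sketch only covers the mechanics.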
-
There are many challenges to building a really large site. Most of them are related to building the site, but the one that often kills a site's success is getting the pages into the index and keeping them there. This requires a steady flow of spiders into the deepest pages of the site. If you don't have a continuous, repeated flow of spiders, the pages will be indexed but then forgotten before the spiders return.
An effective way to get deep spidering is to have powerful links permanently connected to many deep hub pages throughout the site. This produces a flow of spiders into the site and forces them to chew their way out, indexing pages as they go. These links must be powerful or the spiders will index a couple of pages and die. These links must be permanent because if they are removed the flow of spiders will stop and pages in the index will be forgotten.
The goal of the hub pages is to create spider webs through the site that allow spiders to index all of the pages on short link paths, rather than requiring the spiders to crawl through long tunnels of many consecutive links to get everything indexed.
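One way to check whether hub pages really give short link paths is to measure click depth with a breadth-first search over the internal link graph. This is only a sketch: the `click_depths` function, the link dictionary and the page names are illustrative assumptions, not anything taken from a particular crawler or tool.

```python
from collections import deque


def click_depths(links, entry_pages):
    """Breadth-first search from the entry pages (e.g. externally linked hubs).

    `links` maps a page ID to the pages it links to (hypothetical structure).
    Returns {page: minimum number of clicks from any entry page}.
    Pages missing from the result cannot be reached by following links.
    """
    depth = {page: 0 for page in entry_pages}
    queue = deque(entry_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


if __name__ == "__main__":
    site = {"hub": ["q1", "q2"], "q1": ["q3"], "q3": ["q4"], "q5": []}
    depths = click_depths(site, ["hub"])
    unreachable = set(site) - set(depths)
    print(depths, unreachable)
```

Pages with a large depth are sitting in the "long tunnels", and pages that never show up in the result are reachable only through the sitemap.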
Lots of people can build a big site, but only some of those people have the resources to get the powerful, permanent links that are required to get the pages indexed and keep them in the index. You can't rely on internal links alone for the powerful, permanent links because most spiders that enter any site come from external sources rather than spontaneously springing up deep in the bowels of your website.