Indexing a new website with several million pages
-
Hello everyone,
I am currently working for a huge classifieds website that will launch in France in September 2013.
The website will have up to 10 million pages. I know that a site of this size should be indexed step by step rather than all at once, to avoid a long sandbox risk and to keep more control over the process.
Do you guys have any recommendations or good practices for such a task? Maybe some personal experience you might have had?
The website will cover about 300 jobs:
- In all regions (= 300 * 22 pages)
- In all departments (= 300 * 101 pages)
- In all cities (= 300 * 37,000 pages)
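As a quick sanity check on these figures, here is a minimal Python sketch of the arithmetic. The counts are the ones listed above; the variable names are only illustrative.

```python
# Counts taken from the list above; names are illustrative.
JOBS = 300
LOCALITIES = {"regions": 22, "departments": 101, "cities": 37_000}

# One page per job and per locality at each level.
pages_per_level = {level: JOBS * count for level, count in LOCALITIES.items()}
total_pages = sum(pages_per_level.values())

for level, pages in pages_per_level.items():
    print(f"{level}: {pages:,} pages")
print(f"total: {total_pages:,} pages")
```

This puts the total at 11,136,900 pages, slightly above the 10 million estimate, with the city level accounting for nearly all of it.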
Do you think it would be wiser to index a few jobs at a time (for instance, 10 jobs every week) or to index by page level (for example, a first step with jobs by region, a second step with jobs by department, etc.)?
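To compare the two rollout options by how many new pages each step would expose, a rough sketch follows. It assumes the counts given in the list and the 10-jobs-per-week pace used as an example; the names are hypothetical, and nothing here reflects a known Google threshold.

```python
import math

JOBS = 300
LOCALITIES = {"regions": 22, "departments": 101, "cities": 37_000}
JOBS_PER_WEEK = 10  # example pace from the question, not a recommendation

# Option A: release a few jobs at a time, across every locality level.
pages_per_job = sum(LOCALITIES.values())
option_a_weeks = math.ceil(JOBS / JOBS_PER_WEEK)
option_a_pages_per_week = pages_per_job * JOBS_PER_WEEK

# Option B: release one locality level at a time, for all jobs at once.
option_b_steps = [(level, JOBS * count) for level, count in LOCALITIES.items()]

print(f"A: {option_a_weeks} weeks, {option_a_pages_per_week:,} new pages per week")
for step, (level, pages) in enumerate(option_b_steps, start=1):
    print(f"B step {step}: {level}, {pages:,} new pages")
```

Under these assumptions, option A adds 371,230 pages a week for 30 weeks, while option B starts small (6,600 pages) but leaves 11.1 million pages for its last step, so the city level would probably need to be split further.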
More generally speaking, how would you proceed in order to avoid penalties from Google and get the whole site indexed as fast as possible?
One more detail: we'll rely on (hopefully big) press coverage and on a link-building effort that has yet to be defined.
Thanks for your help!
Best Regards,
Raphael
-
Hello everyone,
Thanks for sharing your experience and your answers, it's greatly appreciated.
The website is built to avoid cookie-cutter pages: each page will have unique content drawn from classifieds (unique because the classifieds themselves won't be indexed at first, to avoid having too many pages).
The internal linking is also designed so that each page has permanent internal links in a logical structure.
I understand from your answers that it is better to take our time and index the site step by step, mostly according to the number and quality of classifieds (and thus the content) for each job/locality. It's not worth indexing pages without any classifieds (and thus without unique content), as Google will drop them before long.
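One possible way to apply that rule is to emit a robots meta tag per listing page based on how many classifieds it currently holds. The sketch below is only an illustration: the function name and the `min_classifieds` threshold are assumptions, not something specified in this thread.

```python
def robots_meta(classified_count: int, min_classifieds: int = 1) -> str:
    """Return a robots meta tag for a job/locality listing page.

    `min_classifieds` is a hypothetical threshold chosen for illustration.
    """
    if classified_count >= min_classifieds:
        directive = "index, follow"
    else:
        # Ask crawlers to keep the page out of the index while still
        # following its internal links.
        directive = "noindex, follow"
    return f'<meta name="robots" content="{directive}">'


print(robots_meta(0))   # empty listing page
print(robots_meta(12))  # listing page with classifieds
```

A page would then become indexable on its own once it gathers enough classifieds, which matches the step-by-step approach without a fixed calendar.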
-
I really don't think Google likes it when you release a website that big. It would much rather you build it slowly. I would urge you to have main pages and noindex the sub categories.
-
We worked in partnership with a similar large-scale site last year and found exactly the same thing. Google simply dropped 60% of our pages from the index because they were cookie-cutter.
You have to ensure that pages have relevant, unique and worthwhile content. If all you're doing is replacing the odd word here and there with the locality and job name, it's not going to work.
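A rough way to spot this kind of page before launch is to measure how much text two pages share. The sketch below uses Jaccard similarity over three-word shingles; it is a generic near-duplicate check, not how Google evaluates pages, and the sample texts and function names are made up for illustration.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical template where only the job and city change.
template = ("{job} jobs in {city} listed by local employers with details on "
            "salary hours and how to send your application online today")
page_a = template.format(job="plumber", city="Paris")
page_b = template.format(job="baker", city="Lyon")
unrelated = "Experienced pastry chef wanted for a family bakery near the old port"

print(f"template pages:  {jaccard(page_a, page_b):.2f}")
print(f"unrelated pages: {jaccard(page_a, unrelated):.2f}")
```

Pairs of pages that score high against each other are the ones where only the job and locality were swapped. Where to set the cut-off is a judgment call.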
Focus on running an ongoing SEO campaign for each target audience, for example by job type or by locality.
-
If you plan to get a website that big indexed you will need to have a few things in order...
First, you will need thousands of deep links that connect to hub pages deep within the site. These will force spiders down there and make them chew their way out through the unindexed pages. These must be permanent links. If you remove them, spiders will stop visiting and Google will forget your pages. For a 10 million page site you will need thousands of links hitting thousands of hub pages.
Second, for a site this big.... are you going to have substantive amounts of unique content? If your pages are made from a cookie cutter and look like this....
"yada yada yada yada yada yada yada yada SEO job in Paris yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada send application to Joseph Blowe, 11 Anystreet, Paris, France yada yada yada yada yada yada yada yadayada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada"
.... then Google will index these pages, and a few weeks to a few months later your entire site might receive a Panda penalty and drop from Google.
Finally... all of those links needed to get the site in the index... they need to be Penguin proof.
It is not easy to get a big site in the index. Google is tired of big cookie cutter sites with no information or yada yada content. They are quickly toasted these days.