Indexing a new website with several million pages
-
Hello everyone,
I am currently working for a large classifieds website that will launch in France in September 2013.
The website will have up to 10 million pages. I know that indexing a site of this size should be done step by step rather than all at once, to avoid a long stay in the sandbox and to keep more control over the process.
Do you have any recommendations or good practices for such a task? Maybe some personal experience you could share?
The website will cover about 300 job types:
- In every region (= 300 × 22 pages)
- In every department (= 300 × 101 pages)
- In every city (= 300 × 37,000 pages)
Do you think it would be wiser to roll out a few jobs at a time (for instance, 10 jobs every week), or to roll out by page level (for example, first the region-level job pages, then the department-level ones, etc.)?
More generally, how would you proceed to avoid penalties from Google while getting the whole site indexed as fast as possible?
One more detail: we will rely on (potentially large) press coverage and on a link-building campaign that is still to be defined.
Thanks for your help !
Best Regards,
Raphael
-
Hello everyone,
Thanks for sharing your experience and your answers, it's greatly appreciated.
The website is built to avoid cookie-cutter pages: each page will have unique content drawn from its classifieds (unique because individual classified ads won't be indexed at first, to avoid having too many pages).
The internal linking is likewise designed so that every page has permanent internal links in a logical structure.
I understand from your answers that it is better to take our time and index the site step by step, mostly according to the number and quality of the classifieds (and thus the content) for each job/locality. It is not worth indexing pages without any classifieds (and thus without unique content), as Google would cut them off in the near future.
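One practical way to implement this step-by-step rollout is to publish staged XML sitemaps and submit a new batch to Search Console each week. A minimal sketch in Python (the URL patterns and batch size here are placeholders, not the site's real structure):

```python
def staged_sitemaps(urls, batch_size=5000):
    """Split a URL list into numbered sitemap batches so new sections
    of the site can be submitted one stage at a time."""
    batches = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i + batch_size]
        body = "\n".join(f"  <url><loc>{u}</loc></url>" for u in batch)
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n"
            "</urlset>"
        )
        batches.append(xml)
    return batches

# Example: region-level job pages first, then departments, then cities.
urls = [f"https://example.com/seo-jobs/region-{r}" for r in range(1, 23)]
stages = staged_sitemaps(urls, batch_size=10)
print(len(stages))  # 3 batches: 10 + 10 + 2 URLs
```

Each batch can then be written to its own sitemap file and referenced from a sitemap index as the corresponding section goes live.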
-
I really don't think Google likes it when you release a website that big; it would much rather you build it up slowly. I would urge you to keep the main pages indexed and noindex the subcategories.
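The noindex advice can be applied per page at render time rather than by hand, e.g. keeping thin location pages out of the index until they actually carry listings. A hedged sketch (the `listing_count` parameter and threshold are assumptions about how the site's templates work, not anything from the thread):

```python
def robots_meta(listing_count, min_listings=1):
    """Return a robots meta tag: noindex thin location pages until they
    have real classifieds, but keep them followable so internal links
    still pass through them."""
    if listing_count >= min_listings:
        return '<meta name="robots" content="index,follow">'
    return '<meta name="robots" content="noindex,follow">'

print(robots_meta(0))   # noindex,follow: empty page stays out of the index
print(robots_meta(12))  # index,follow: page with listings is indexable
```

Using `follow` in both cases means crawlers can still traverse the noindexed pages while they are empty.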
-
We worked in partnership with a similarly large-scale site last year and found exactly the same thing. Google simply cut 60% of our pages out of the index because they were cookie-cutter.
You have to ensure that pages have relevant, unique, and worthwhile content. Otherwise, if all you're doing is swapping the odd word here and there for the locality and job name, it's not going to work.
Focus on running an ongoing SEO campaign for each target audience, e.g. by job type, locality, and so on.
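You can catch this "odd word swapped here and there" problem before Google does by measuring text similarity between generated pages. A minimal sketch using Python's standard-library `difflib` (the template and threshold are illustrative assumptions):

```python
from difflib import SequenceMatcher

def is_cookie_cutter(page_a, page_b, threshold=0.9):
    """Flag a pair of pages whose text is nearly identical, i.e. the
    same template with only the job name or locality swapped in."""
    ratio = SequenceMatcher(None, page_a, page_b).ratio()
    return ratio >= threshold

template = "Find great {job} opportunities in {city}. Apply today for {job} roles."
a = template.format(job="SEO consultant", city="Paris")
b = template.format(job="SEO consultant", city="Lyon")
print(is_cookie_cutter(a, b))  # True: only the city name differs
```

Running a check like this across a sample of job/locality pages gives a rough estimate of how much of the site Google is likely to treat as duplicate.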
-
If you plan to get a website that big indexed, you will need to have a few things in order...
First, you will need thousands of deep links pointing at hub pages deep within the site. These will force spiders down there and make them chew their way out through the unindexed pages. These must be permanent links: if you remove them, spiders will stop visiting and Google will forget your pages. For a 10-million-page site you will need thousands of links hitting thousands of hub pages.
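The effect of those hub pages can be reasoned about as click depth in the internal-link graph: a breadth-first search from the homepage shows how many clicks deep each page sits, and hub links are what pull the deep pages closer. A toy sketch (the graph and page names are made up for illustration):

```python
from collections import deque

def click_depths(links, start="home"):
    """Breadth-first search over an internal-link graph; returns the
    minimum number of clicks from the start page to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Toy graph: the external-facing hub page gives city pages a 2-click
# path instead of only the 3-click region -> department -> city path.
links = {
    "home": ["regions", "hub-paris-jobs"],
    "regions": ["dept-75"],
    "dept-75": ["paris-seo-jobs"],
    "hub-paris-jobs": ["paris-seo-jobs"],
}
print(click_depths(links)["paris-seo-jobs"])  # 2
```

Running this on a real crawl of the site would show which sections are too deep for spiders to reach without hub links.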
Second, for a site this big.... are you going to have substantive amounts of unique content? If your pages are made from a cookie cutter and look like this....
"yada yada yada yada yada yada yada yada SEO job in Paris yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada send application to Joseph Blowe, 11 Anystreet, Paris, France yada yada yada yada yada yada yada yadayada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada"
.... then Google will index these pages, and a few weeks to a few months later your entire site might receive a Panda penalty and drop from Google.
Finally... all of those links needed to get the site in the index... they need to be Penguin proof.
It is not easy to get a big site into the index. Google is tired of big cookie-cutter sites with no information or yada-yada content. They get toasted quickly these days.