Indexing a several millions pages new website
-
Hello everyone,
I am currently working for a huge classified website who will be released in France in September 2013.
The website will have up to 10 millions pages. I know the indexing of a website of such size should be done step by step and not in only one time to avoid a long sandbox risk and to have more control about it.
Do you guys have any recommandations or good practices for such a task ? Maybe some personal experience you might have had ?
The website will cover about 300 jobs :
- In all region (= 300 * 22 pages)
- In all departments (= 300 * 101 pages)
- In all cities (= 300 * 37 000 pages)
Do you think it would be wiser to index couple of jobs by couple of jobs (for instance 10 jobs every week) or to index with levels of pages (for exemple, 1st step with jobs in region, 2nd step with jobs in departements, etc.) ?
More generally speaking, how would you do in order to avoid penalties from Google and to index the whole site as fast as possible ?
One more specification : we'll rely on a (big ?) press followup and on a linking job that still has to be determined yet.
Thanks for your help !
Best Regards,
Raphael
-
Hello everyone,
Thanks for sharing your experience and your answers, it's greatly appreciated.
The website is build in order to avoid cookie cutter pages : each page will have unique content from classifieds (unique because classifieds won't be indexed in the first place, to avoid having too much pages).
The linking is as well though in order for each page to have permanents internal links in a logical way.
I understand from your answers that it is better to take time and to index the site step by step : mostly according to the number and the quality of classifieds (and thus the content) for each jobs/locality. It's not worth to index pages without any classifieds (and thus unique content) as they will be cut off by Google in a near future.
-
I really don't think Google likes it when you release a website that big. It would much rather you build it slowly. I would urge you to have main pages and noindex the sub categories.
-
We worked in partnership with a similar large scale site last year and found the exact same. Google simply cut off 60% of our pages out of the index as they were cookie cutter.
You have to ensure that pages have relevant, unique and worthy content. Otherwise if all your doing is replacing the odd word here and there for the locality and job name its not going to work.
Focus on having an on going SEO campaign for each target audience be that for e.g. by job type / locality / etc.
-
If you plan to get a website that big indexed you will need to have a few things in order...
First, you will need thousands of deep links that connect to hub pages deep within the site. These will force spiders down there and make them chew their way out through the unindexed pages. These must be permanent links. If you remove them then spiders will stop visiting and google will forget your pages. For a 10 million page site you will need thousands of links hitting thousands of hub pages.
Second, for a site this big.... are you going to have substantive amounts of unique content? If your pages are made from a cookie cutter and look like this....
"yada yada yada yada yada yada yada yada SEO job in Paris yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada send application to Joseph Blowe, 11 Anystreet, Paris, France yada yada yada yada yada yada yada yadayada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada"
.... then Google will index these pages, then a few weeks to a few months later your entire site might receive a Panda penalty and drop from google.
Finally... all of those links needed to get the site in the index... they need to be Penguin proof.
It is not easy to get a big site in the index. Google is tired of big cookie cutter sites with no information or yada yada content. They are quickly toasted these days.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why do I have so many extra indexed pages?
Stats- Webmaster Tools Indexed Pages- 96,995 Site: Search- 97,800 Pages Sitemap Submitted- 18,832 Sitemap Indexed- 9,746 I went through the search results through page 28 and every item it showed was correct. How do I figure out where these extra 80,000 items are coming from? I tried crawling the site with screaming frog awhile back but it locked because of so many urls. The site is a Magento site so there are a million urls, but I checked and all of the canonicals are setup properly. Where should I start looking?
Intermediate & Advanced SEO | | Tylerj0 -
Old pages STILL indexed...
Our new website has been live for around 3 months and the URL structure has completely changed. We weren't able to dynamically create 301 redirects for over 5,000 of our products because of how different the URL's were so we've been redirecting them as and when. 3 months on and we're still getting hundreds of 404 errors daily in our Webmaster Tools account. I've checked the server logs and it looks like Bing Bot still seems to want to crawl our old /product/ URL's. Also, if I perform a "site:example.co.uk/product" on Google or Bing - lots of results are still returned, indicating the both still haven't dropped them from their index. Should I ignore the 404 errors and continue to wait for them to drop off or should I just block /product/ in my robots.txt? After 3 months I'd have thought they'd have naturally dropped off by now! I'm half-debating this: User-agent: *
Intermediate & Advanced SEO | | LiamMcArthur
Disallow: /some-directory-for-all/* User-agent: Bingbot
User-agent: MSNBot
Disallow: /product/ Sitemap: http://www.example.co.uk/sitemap.xml0 -
Client rebranded with a new website but can't migrate now defunct franchise website to new website.
Hi everyone, My client is a chain of franchised restaurants with a local domain website named after the franchise. The franchise exited the market while the client stayed and built its own brand with a separate website. The franchise website (which is extremely popular) will be shut down soon but the client will not be able to redirect the franchise website to the new website for legal reasons. What can I do to ensure that we start ranking immediately for the franchise keyphrase as soon as the franchise website is shutdown. We currently have the new website and access to the old website (which we can't redirect) Thanks, T
Intermediate & Advanced SEO | | Tarek_Lel0 -
SEO transfer to new website
My website currently has some strong SEO and i will be re-developing the website on a wordpress platform...which will change many of the existing URL. Will this affect the current pages that are well indexed in Google? Does using Wordpress or changing the URL extension (.html to .php) make a difference? If i want to make a clean transition without effecting our existing SEO...what are some essential steps i need to take? Example. Current page is www.mydomain.com/name.html .... and the new URL would be www.mydomain.com/product/name.php Thanks
Intermediate & Advanced SEO | | Souk0 -
An affiliate website uses datafeeds and around 65.000 products are deleted in the new feeds. What are the best practises to do with the product pages? 404 ALL pages, 301 Redirect to the upper catagory?
Note: All product pages are on INDEX FOLLOW. Right now this is happening with the deleted productpages: 1. When a product is removed from the new datafeed the pages stay online and are showing simliar products for 3 months. The productpages are removed from the categorie pages but not from the sitemap! 2. Pages receiving more than 3 hits after the first 3 months keep on existing and also in the sitemaps. These pages are not shown in the categories. 3. Pages from deleted datafeeds that receive 2 hits or less, are getting a 301 redirect to the upper categorie for again 3 months 4. Afther the last 3 months all 301 redirects are getting a customized 404 page with similar products. Any suggestions of Comments about this structure? 🙂 Issues to think about:
Intermediate & Advanced SEO | | Zanox
- The amount of 404 pages Google is warning about in GWT
- Right now all productpages are indexed
- Use as much value as possible in the right way from all pages
- Usability for the visitor Extra info about the near future: Beceause of the duplicate content issue with datafeeds we are going to put all product pages on NOINDEX, FOLLOW and focus only on category and subcategory pages.0 -
Sudden Change In Indexed Pages
Every week I check the number of pages indexed by google using the "site:" function. I have set up a permanent redirect from all the non-www pages to www pages. When I used to run the function for the: non-www pages (i.e site:mysite.com), would have 12K results www pages (i.e site:www.mysite.com) would have about 36K The past few days, this has reversed! I get 12K for www pages, and 36K for non-www pages. Things I have changed: I have added canonical URL links in the header, all have www in the URL. My questions: Is this cause for concern? Can anyone explain this to me?
Intermediate & Advanced SEO | | inhouseseo0 -
Landing page indexed and ranking in less then 24 hours
Hi, I got a landing page which went up last night about 11pm. Its been indexed and ranked since then. Its a EMD and has about 600 words of unqiue content. It currently sits on page 9 for what I would say is a non competitive term (the top result is not an EMD and has 10 backlinks from the same site, which has no PR). Now my question is this: Would you say that page 9 is the given position Google thinks this website should sit at? Or because its so new could I very much expect some more movement? Basically up the rankings? Cheers
Intermediate & Advanced SEO | | activitysuper0