XML Sitemap Questions For Big Site
-
Hey Guys,
I have a few question about XML Sitemaps.
-
For a social site that is going to have presonal accounts created, what is the best way to get them indexed? When it comes to profiles I found out that twitter (https://twitter.com/i/directory/profiles) and facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google plus has xml index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep updating your xml profiles using a third party software (sitemapwriter)?
-
If a user chooses to not have their profile indexed (by default it will be index-able), how do we go about deindexing that profile? Is their an automatic way of doing this?
-
Lastly, has anyone dappled with google sitemap generator (https://code.google.com/p/googlesitemapgenerator/) if so do you recommend it?
Thank you!
-
-
Thanks for the input guys!
I believe Twitter and Facebook don't run sitemaps for their profiles, what they have is a directory for all their profiles (twitter: https://twitter.com/i/directory/profiles Facebook: https://www.facebook.com/find-friends?ref=pf) and use that to get their profiles crawled, however I feel the best approach is through xml sitemaps and Google plus actually does this with their profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml) and quite frankly I would rather follow Google then FB or Twitter... I'm just now wondering how the hell they upkeep that monster! Does it create a new sitemap everything one hits 50k? When do they update their sitemap? daily, weekly, or monthly and how?
One other question I have is if their is any penalties to getting a lot of pages crawled at once? Meaning one day we have 10 pages and the next we have 10,000 pages or 50,000 pages...
Thanks again guys!
-
I guess the way I was explaining it was for scalabilty on a large site. You have to think a site like fb or twitter with hundreds of millions of users still has the limitation of only having 50k records in a site map. So if they are running site maps, they have hundreds.
-
I'm not a web developer, so this might may be wrong, but I feel like it might be easier to just add every user to the xml sitemap and then add a noindex robots meta tag ons users pages that don't want to their profiles to be indexed.
-
If it were me and someone were asking me to design a system like that, I would design it in a few parts.
First I would create an application that handled the sitemap minus profiles, just for your tos, sign up pages, terms, and what ever pages like that.
Then I would design a system that handled the actual profiles. It would be pretty complex and resource intensive as the site grew. But the main idea flows like this
Start generation, grab the user record with id 1 in the database, check to see if indexable (move to next if not), see what pages are connected, write to xml file, loop back and start with record #2.
There are a few concessions you have to make, you need to keep up with the number of records in a file before you start another file. You can only have 50k records in one file.
The way I would handle the process in total for a large site would be this, sync the required tables via a weekly or daily cron to another instance (server). Call the php script (because that is what I use) that creates the first sitemap for the normal site wide pages. At the end of that site map, put a location for the user profile sitemap, then at the end of the scrip, execute the user profile site map generating script. At the end of each site map, put the location of the next site map file, because as you grow it might take 2-10000 site map files.
One thing that I would ensure to do is get a list of crawler ip addresses and in your .htaccess have an allow / deny rule. That way you can make the site maps only visible to the search engines.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Mobile First Index: What Could Happen To Sites w Large Desktop but Small Mobile Sites?
I have a question about how Mobile First could affect websites with separate (and smaller) mobile vs desktop sites. Referencing this SE Roundtable article (seorountable dot com /google-mobile-first-index-22953.html), "If you have less content on your mobile version than on your desktop version - Google will probably see the less content mobile version. Google said they are indexing the mobile version first." But Google/ Gary Illyes are also on the record stating the switch to mobile-first should be minimally disruptive. Does "Mobile First" mean that they'll consider desktop URLs "second", or will they actually just completely discount the desktop site in lieu of the mobile one? In other words: will content on your desktop site that does not appear in mobile count in desktop searches? I can't find clear answer anywhere (see also: /jlh-marketing dot com/mobile-first-unanswered-questions/). Obviously the writing is on the wall (and has been for years) that responsive is the way to go moving forward - but just looking for any other viewpoints/feedback here since it can be really expensive for some people to upgrade. I'm basically torn between "okay we gotta upgrade to responsive now" and "well, this may not be as critical as it seems". Sigh... Thanks in advance for any feedback and thoughts. LOL - I selected "there may not be a right answer to this question" when submitting this to the Moz community. 🙂
Intermediate & Advanced SEO | | mirabile0 -
SEO Menu Question
I have a question regarding to the SEO benefits of different types of menus. Recently, I have noticed an increasing number of websites with the sort of menu like at www.sportsdirect.com, where there is only one main dropdown and then everything is a sub-menu of the sub-menus if that makes sense. Is this approach more, less or equal beneficial to what you see at http://www.wiggle.co.uk/ where there are multiple initial dropdown menus? Appreciate the feedback.
Intermediate & Advanced SEO | | simonukss0 -
SEO question
Hi there! I'm the SEO manager for 5 Star Loans. I have 2 city pages running. We are running our business in 2 locations: Berkeley, CA & San Jose, CA. For those offices we've created 2 google listings with separate gmail accounts. Berkeley (http://5starloans.com/berkeley/) ranks well in Berkeley in Gmaps and it shows on first page in organic results. However the second city page San Jose (http://5starloans.com/san-jose/) doesn't show in the Gmaps local pack results and also doesn't rank well in organic results. Both of them have authentic backlinks and reviews. It has been a year already and it's high time we knew the problem 🙂 any comment would be helpful. thanks a lot
Intermediate & Advanced SEO | | moonalev0 -
I have 2 Questions
what if we do the interlinking on the exact keywords? Is this comes under spam technique? For example - http://blog.payscout.com/automotive-merchant-services/ I interlink the exact keyword in the above URL. Can we use same image 2-3 times on the same website with different anchor tags? For example - http://packforcity.com/what-to-wear-in-new-orleans-in-january/ http://packforcity.com/what-to-wear-in-san-francisco-in-october/ Same image used on the website with different alt tag.
Intermediate & Advanced SEO | | AlexanderWhite0 -
Site not progressing at all....
We relaunched our site almost a year ago after our old site dropped out of ranking due to what we think was overused anchor text.... We transferred over the content to the new site, but started fresh in terms of links etc. And did not redirect the old site. Since the launch we have focused on producing good content and social, but the site has made no progress at all. The only factor I can think off is that one site linked to us from all of their pages, which we asked them to remove which they did over 3 months ago, but still showing in Webmaster tools.... Any help would be appreciated. Thanks
Intermediate & Advanced SEO | | jj34340 -
Merging 11 community sites into 1 regional site
I am merging 11 real estate community sites into 1 regional site and don't really know what type of redirect should I use for the homepage?, for instance: www.homepage.com redirect to www.regionalsite.com/community-page Should I 301 this redirect? If yes, how could I 301 redirect a homepage to an internal page in my new site? Cheers 🙂
Intermediate & Advanced SEO | | mbulox0 -
Another deduplication question.
Where an existing website has duplicate content issues - specifically the www. and non-www. type; what is the most effective way to inform the searchers and spiders that there is only one page? I have a site where the ecommerce software (Shopfitter 4) allows a fair bit of meta data to be inserted into each product page but I am uncertain, after a couple of attempts to deduplicate some pages, which is the most effective way to ensure that the www related duplication is eliminated sitewide - there is such a solution. I have to own up to having looked at ,htaccess 301 redirects webmaster tools and become increasingly bamboozled by the conflicting advice as to which is the most effective way or combination to get rid of this problem. too olod to learn new tricks I reckon 😉 Your help and clarification would be appreciated as this may help head off more fruitless work.
Intermediate & Advanced SEO | | SkiBum0 -
On-Site Optimization Tips for Job site?
I am working on a job site that only ranks well for the homepage with very low ranking internal pages. My job pages do not rank what so ever and are database driven and often times turn to 404 pages after the job has been filled. The job pages have to no content either. Anybody have any technical on-site recommendations for a job site I am working on especially regarding my internal pages? (Cross Country Allied.com)
Intermediate & Advanced SEO | | Melia0