XML Sitemap Questions For Big Site
-
Hey Guys,
I have a few question about XML Sitemaps.
-
For a social site that is going to have presonal accounts created, what is the best way to get them indexed? When it comes to profiles I found out that twitter (https://twitter.com/i/directory/profiles) and facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google plus has xml index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep updating your xml profiles using a third party software (sitemapwriter)?
-
If a user chooses to not have their profile indexed (by default it will be index-able), how do we go about deindexing that profile? Is their an automatic way of doing this?
-
Lastly, has anyone dappled with google sitemap generator (https://code.google.com/p/googlesitemapgenerator/) if so do you recommend it?
Thank you!
-
-
Thanks for the input guys!
I believe Twitter and Facebook don't run sitemaps for their profiles, what they have is a directory for all their profiles (twitter: https://twitter.com/i/directory/profiles Facebook: https://www.facebook.com/find-friends?ref=pf) and use that to get their profiles crawled, however I feel the best approach is through xml sitemaps and Google plus actually does this with their profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml) and quite frankly I would rather follow Google then FB or Twitter... I'm just now wondering how the hell they upkeep that monster! Does it create a new sitemap everything one hits 50k? When do they update their sitemap? daily, weekly, or monthly and how?
One other question I have is if their is any penalties to getting a lot of pages crawled at once? Meaning one day we have 10 pages and the next we have 10,000 pages or 50,000 pages...
Thanks again guys!
-
I guess the way I was explaining it was for scalabilty on a large site. You have to think a site like fb or twitter with hundreds of millions of users still has the limitation of only having 50k records in a site map. So if they are running site maps, they have hundreds.
-
I'm not a web developer, so this might may be wrong, but I feel like it might be easier to just add every user to the xml sitemap and then add a noindex robots meta tag ons users pages that don't want to their profiles to be indexed.
-
If it were me and someone were asking me to design a system like that, I would design it in a few parts.
First I would create an application that handled the sitemap minus profiles, just for your tos, sign up pages, terms, and what ever pages like that.
Then I would design a system that handled the actual profiles. It would be pretty complex and resource intensive as the site grew. But the main idea flows like this
Start generation, grab the user record with id 1 in the database, check to see if indexable (move to next if not), see what pages are connected, write to xml file, loop back and start with record #2.
There are a few concessions you have to make, you need to keep up with the number of records in a file before you start another file. You can only have 50k records in one file.
The way I would handle the process in total for a large site would be this, sync the required tables via a weekly or daily cron to another instance (server). Call the php script (because that is what I use) that creates the first sitemap for the normal site wide pages. At the end of that site map, put a location for the user profile sitemap, then at the end of the scrip, execute the user profile site map generating script. At the end of each site map, put the location of the next site map file, because as you grow it might take 2-10000 site map files.
One thing that I would ensure to do is get a list of crawler ip addresses and in your .htaccess have an allow / deny rule. That way you can make the site maps only visible to the search engines.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is there an issue with my site?
Been mostly hanging around top of page two for the last couple of years for “Liverpool Wedding photographer” although got myself on page 1 for “Liverpool photographer” I have split the title of the page to target these two keywords. I took the Liverpool photographer off the title to see if it was being detrimental to the “Liverpool wedding photographer” I didn’t see no increase in ranking so put it back as I get a bit of commercial work from it. Since last year I have got onto page 1 at least three times around position 5-6. Within a week or two I start sliding down again and end up back at top of page two. I could understand this slow push out if my competitors were busy SEO wise but from what I have seen they are not. There is a guy using the keywords in URL and calls himself “Liverpool wedding photographer” last time I checked he literally had no links but is in the first 5 positions. I have I think a better link profile than every one else. Although I am on and off with Facebook and Instagram, (more off) so that probably isn’t helping. Although I have a colleague in the video side of things and he doesn’t use social media at all and it hasn’t harmed him. A few years ago I was burned quite badly by a total charlatan. He sunk my home page to page 4. He talked the talk about creating landing pages but his methods were shoddy to say the least. I can’t believe I was taken in by him, although I was only with him for 2 months. He was still using spammy link techniques to generate lots of toxic links for me! I disavowed all of his links and put the keywords back on the home page and was back to my usual top of page 2 position within a week. Since then I have disavowed all directory links and anything not wedding related. I have an article which ranks 1st or second for “Nikon CLS”. I have also another article of 2000 words or so on another reasonable placed photography website. A few links from other vendors or people I have taken photographs for. I have about 10 featured weddings with a link on 4 good weddings blogs. I don’t think a massive amount of blog comments although I have stopped doing this. If I look at most of the competitors these are their main links, with directories as well! Last winter I put a quite substantial article about documentary wedding photography on my home page. I flew to number 2, although I photographed The World Transformed (the alternative labour conference in Liverpool). I got a lot of clicks to a gallery page (few thousand off social media} so I don’t know if that coincided with it. Same thing – watching the website go down a few positions every day until within just over a week or two I was about 4<sup>th</sup> on page 2! Its like my website is on a spring which can push into page 1 but rebounds back to top of page 2. I am staring to worry that my site has been marked as a bad character in some way because I get what seems to be rough treatment from google compared to my peers. I have written I think 4 or 5 (1500 word) articles the last couple of months talking about lenses and wedding photography related topics and Google pushed me back to page 1, peaking At position 5. I was there for a few weeks and then the slide happened again. Bit demoralised at the moment, what to do? Any help or pointers would be most appreciated. Best wishes. David.
Intermediate & Advanced SEO | | WallerD0 -
Moving multiple Sites to One Site and SEO Impact/Ideas
Hi there, We are in the process of moving 2 sites with higher page authority to another site we own (that is our company brand), so essentially 3 sites into one. We're at risk of losing a lot of SEO from the original 2 sites that have all the product information. We are doing this since we merged companies a couple years back and need one web precense. Anyhow, the site launch date is in 3 months and the recommendation is to start moving content over prior to that for top pages, which is a big undertaking when we are launching all the pages again with new content, redeisgn and moving sites in 3 months. If it's the right move, we should do it, but I just wanted to get opinions on how others have handled something similiar when moving to a site with lower site authority and trying not to lose rankings.
Intermediate & Advanced SEO | | lauramrobinson320 -
Add versioning to an xml sitemap?
Is there a way to add versioning to an xml sitemap? Something like <version>x.x</version> outside of the <urlset>?</urlset> I've looked at a bunch of sitemaps for various sites and don't see anyone adding versioning information, but it seems like it would be a common issue - I can't believe someone hasn't come up with some way to do it.
Intermediate & Advanced SEO | | ATT_SEO0 -
Open Site Explorer - Spam analysis: need help with inbound links... from my site!
hallo, reading my spam analysis report from open explorer, I found somenthing I don't understand (please see attached image): The long list of links inside the red rectangle are inbound links with a spam score of 5 coming from my same site. How is that possible? Should I remove those links? Also , I see that many of those links are links present in the top navigation bar (about page, home page, service description etc.) or in the sidebar section of the website (categories, recent posts, recent comments). Should I treat them differently? Thank you for your time.
Intermediate & Advanced SEO | | micvitale0 -
Linking from a corporate site to a brand site.
Is there an SEO impact to a large corporation linking from a corporate and/or a divisional site to a specific brand site with it's own top level domain? We would like to keep the traffic coming, but not if it will be seen as a black hat tactic. My guess is that Google will be smart enough to see that the corporation owns the brand and at least not penalize us, but I am wondering if anyone else has this experience? Google Analytics is calling it self-referral.
Intermediate & Advanced SEO | | mrbobland0 -
Why is this site not ranking?
http://www.petstoreunlimited.com They get good grades from the on-page tool. The links are not amazing, but are not super spammy. Yet it ranks for nothing they target Any reason why?
Intermediate & Advanced SEO | | Atomicx0 -
Sitemap Dissappearance??
Greetings Mozzers, Doing my standard run through Webmaster tools and I discover up to 30% of my sitemaps no longer exist. Has anyone else experienced the recent loss of sitemaps/can suggest reasons why this may have happened? Re-submitting all sitemaps now but just concerned this might become an on-going issue...
Intermediate & Advanced SEO | | RobertChapman0 -
Question regarding error url while checking in open-site explorer tool
Hello friends, My website home url & inner page url shows error while checking the open-site site explorer tool from SEOMoz, for a website** eg.www.abc.com website as**
Intermediate & Advanced SEO | | zco_seo
**"Oh Hey! It looks like that URL redirects to www.abc.com/error.aspx?aspxerrorpath=/default.aspx. Would you like to see data for that URL instead?"**May I know the reason, why this url showing this result while checking back link report from the tool?**May I know on what basis this tool is evaluating the website url as well?****May I know, Will this affect the Google SERPs for this website?**Thanks0