Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Is there a way for me to automatically download a website's sitemap.xml every month?
-
From now on we want to store all our sitemap.xml over the next years. Its a nice archive to have that allows us to analyse how many pages we have on our website and which ones were removed/redirected.
Any suggestions?
Thanks
-
If you use a MySQL database to store your website data, I think that to do this kind of automatic "archival" work by creating an automatic PHP script would take between 2 to 5 hours work. I don't see why it should take more than that.
If someone tells you that it is going to take more than that, I would be suspicious. Either the programmer is not good enough, or wants to cheat on you. That unfortunately happens more than you think!!
Be sure to ask for a step-by-step description of how they plan to complete the job. If you have doubts, please feel free to ask me, I am a pretty expert PHP programmer. I don't work for others, but just for myself (I built and keep tweaking my own websites virtualsheetmusic.com, musicianspage.com and others with very little help from external programmers).
Good luck!
-
Hi Fabrizo,
How long would it take for a PHP programmer to write this code approximately? Since we would have to outsource this I would like an indication to oversee costs involved.
Thanks!
-
The way I would do it would be to make a simple PHP (or Perl) program that every day, week or month (as you may need it), archives your sitemap.xml on a specific directory on your server, and possibly zip it. As a PHP programmer myself, I can tell you that that's really simply to do. Just ask to a PHP programmer, I am sure it will make it in a couple hours!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What's the best way to test Angular JS heavy page for SEO?
Hi Moz community, Our tech team has recently decided to try switching our product pages to be JavaScript dependent, this includes links, product descriptions and things like breadcrumbs in JS. Given my concerns, they will create a proof of concept with a few product pages in a QA environment so I can test the SEO implications of these changes. They are planning to use Angular 5 client side rendering without any prerendering. I suggested universal but they said the lift was too great, so we're testing to see if this works. I've read a lot of the articles in this guide to all things SEO and JS and am fairly confident in understanding when a site uses JS and how to troubleshoot to make sure everything is getting crawled and indexed. https://sitebulb.com/resources/guides/javascript-seo-resources/ However, I am not sure I'll be able to test the QA pages since they aren't indexable and lives behind a login. I will be able to crawl the page using Screaming Frog but that's generally regarded as what a crawler should be able to crawl and not really what Googlebot will actually be able to crawl and index. Any thoughts on this, is this concern valid? Thanks!
Technical SEO | | znotes0 -
Desktop & Mobile XML Sitemap Submitted But Only Desktop Sitemap Indexed On Google Search Console
Hi! The Problem We have submitted to GSC a sitemap index. Within that index there are 4 XML Sitemaps. Including one for the desktop site and one for the mobile site. The desktop sitemap has 3300 URLs, of which Google has indexed (according to GSC) 3,000 (approx). The mobile sitemap has 1,000 URLs of which Google has indexed 74 of them. The pages are crawlable, the site structure is logical. And performing a Landing Page URL search (showing only Google/Organic source/medium) on Google Analytics I can see that hundreds of those mobile URLs are being landed on. A search on mobile for a longtail keyword from a (randomly selected) page shows a result in the SERPs for the mobile page that judging by GSC has not been indexed. Could this be because we have recently added rel=alternate tags on our desktop pages (and of course corresponding canonical ones on mobile). Would Google then 'not index' rel=alternate page versions? Thanks for any input on this one. PmHmG
Technical SEO | | AlisonMills0 -
Spam URL'S in search results
We built a new website for a client. When I do 'site:clientswebsite.com' in Google it shows some of the real, recently submitted pages. But it also shows many pages of spam url results, like this 'clientswebsite.com/gockumamaso/22753.htm' - all of which then go to the sites 404 page. They have page titles and meta descriptions in Chinese or Japanese too. Some of the urls are of real pages, and link to the correct page, despite having the same Chinese page titles and descriptions in the SERPS. When I went to remove all the spammy urls in Search Console (it only allowed me to temporarily hide them), a whole load of new ones popped up in the SERPS after a day or two. The site files itself are all fine, with no errors in the server logs. All the usual stuff...robots.txt, sitemap etc seems ok and the proper pages have all been requested for indexing and are slowly appearing. The spammy ones continue though. What is going on and how can I fix it?
Technical SEO | | Digital-Murph0 -
My old URL's are still indexing when I have redirected all of them, why is this happening?
I have built a new website and have redirected all my old URL's to their new ones but for some reason Google is still indexing the old URL's. Also, the page authority for all of my pages has dropped to 1 (apart from the homepage) but before they were between 12 to 15. Can anyone help me with this?
Technical SEO | | One2OneDigital0 -
Redirecting old Sitemaps to a new XML
I've discovered a ton of 404s from Google's WMT crawler looking for mydomain.com/sitemap_archive_MONTH_YEAR. There are tons of these monthly archive xmls. I've used a plugin that for some reason created individual monthly archive xml sitemaps and now I get 404s. Creating rules for each archive seems a bad solution. My current sitemap plugin creates a single clean one mydomain.com/sitemap_index.xml. How can I create a redirect rule in the Redirection WP plugin that will redirect any URL that has the 'sitemap' and 'xml' string in it to my current xml sitemap? I've tried using a wildcard like so: mysite.com/sitemap*.*, mysite.com/sitemap ., mysite.com/sitemap(.), mysite.com/sitemap (.) but none of the wildcard uses got the general redirect to work. Is there a way to make this happen with the WP Redirection plugin? If not, is there a htaccess rule, and what would the code be for it? Im not very fluent with using general redirects in htaccess unfortunately. Thanks!
Technical SEO | | IgorMateski0 -
I can buy a domain from a competitor. Whats the best way to make good use of these links for my existing website
I can buy a domain from a competitor. Whats the best way to make good use of these links for my existing website
Technical SEO | | Archers0 -
Sitemap for dynamic website with over 10,000 pages
If I have a website with thousands of products, is it a good idea to create a sitemap for this website for the search engines where you show maybe 250 products on a page so it makes it easy for the search engine to find the part and also puts that part closer to the home page? Seems like google likes pages that are the closest to the home page (less clicks the better)
Technical SEO | | roundbrix0 -
Does 'framing' a website create duplicate content?
Something I have not come across before, but hope others here are able offer advice based on experience: A client has independently created a series of mini-sites, aimed at targeting specific locations. The tactic has worked very well and they have achieved a large amount of well targeted traffic as a result. Each mini-site is different but then in the nav, if you want to view prices or go to the booking page, that then links to what at first appears to be their main site. However, you then notice that the URL is actually situated on the mini-site. What they have done is 'framed' the main site so that it appears exactly the same even when navigating through this exact replica site. Checking the code, there is almost nothing there - in fact there is actually no content at all. Below the head, there is a piece of code: <frameset rows="*" framespacing=0 frameborder=0> <frame src="[http://www.example.com](view-source:http://www.yellowskips.com/)" frameborder=0 marginwidth=0 marginheight=0> <noframes>Your browser does not support frames. Click [here](http://www.example.com) to view.noframes> frameset> Given that main site content does not appear to show in the source code, do we have an issue with duplicate content? This issue is that these 'referrals' are showing in Analytics, despite the fact that the code does not appear in the source, which is slightly confusing for me. They have done this without consultation and I'm very concerned that this could potentially be creating duplicate content of their ENTIRE main site on dozens of mini-sites. I should also add that there are no links to the mini-sites from the main site, so if you guys advise that this is creating duplicate content, I would not be worried about creating a link-wheel if I advise them to link directly to the main site rather than the framed pages. Thanks!
Technical SEO | | RiceMedia0