How can I tell Google that a page has not changed?
-
Hello,
we have a website with many thousands of pages. Some of them change frequently, some never. Our problem is that Googlebot is generating far too much traffic: half of our page views come from Googlebot.
We would like to tell Googlebot to stop crawling pages that never change. This one, for instance:
http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html
As you can see, there is almost no content on the page and the picture will never change. So I am wondering whether it makes sense to tell Google that there is no need to come back.
The following header fields might be relevant. Currently our webserver answers with the following headers:
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0, public
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Does Google honor these fields? Should we remove "no-cache", "must-revalidate" and "Pragma: no-cache", and set Expires to, say, 30 days in the future?
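For illustration, here is a minimal Python sketch of what the changed headers could look like for a page that never changes. The function name `static_page_headers` and the 30-day default are assumptions made for this example, and nothing in this thread confirms that Googlebot adjusts its crawl rate based on these fields.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def static_page_headers(now, days=30):
    """Build cache-friendly headers for a page that is not expected to change.

    `now` must be a timezone-aware UTC datetime. The helper name and the
    30-day default are hypothetical, chosen only for this example.
    """
    max_age = days * 24 * 60 * 60  # lifetime in seconds
    expires = now + timedelta(days=days)
    return {
        # Replaces "no-cache, must-revalidate, post-check=0, pre-check=0"
        "Cache-Control": f"public, max-age={max_age}",
        # HTTP dates are always expressed in GMT
        "Expires": format_datetime(expires, usegmt=True),
    }


if __name__ == "__main__":
    for name, value in static_page_headers(datetime.now(timezone.utc)).items():
        print(f"{name}: {value}")
```

The "Pragma: no-cache" header would simply be dropped in this setup, since it contradicts the new Cache-Control value.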
I also read that a webpage that has not changed should answer with 304 instead of 200. Does it make sense to implement that? Unfortunately, that would be quite hard for us.
Maybe Google would then also spend more time on pages that have actually changed, instead of wasting it on unchanged pages.
Do you have any other suggestions for how we can reduce Googlebot's traffic on irrelevant pages?
Thanks for your help
Cord
-
Unfortunately, I don't think there are many reliable options, in the sense that Google will always honor them. I don't think they gauge crawl frequency by the "expires" field - or, at least, it carries very little weight. As John and Rob mentioned, you can set the "changefreq" in the XML sitemap, but again, that's just a hint to Google. They seem to frequently ignore it.
If it's really critical, a 304 probably is a stronger signal, but I suspect even that's hit or miss. I've never seen a site implement it on a large scale (100s or 1000s of pages), so I can't speak to that.
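To make the 304 idea concrete, here is a rough Python sketch of the decision a server would make for a conditional request. It assumes you know when each page last changed (for a picture page, perhaps the upload time). The function name `conditional_response` and the plain-dict request headers are hypothetical and not tied to any particular web framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(request_headers, last_modified):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` must be a timezone-aware UTC datetime.
    `request_headers` is a plain dict (an assumption for this sketch).
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None  # unparseable date: fall through to a full 200
        if since_dt is not None and since_dt.tzinfo is not None:
            # HTTP dates have one-second resolution, so drop microseconds
            if last_modified.replace(microsecond=0) <= since_dt:
                return 304, headers  # unchanged: send headers only, no body
    return 200, headers  # changed, or no usable validator: send the full page


if __name__ == "__main__":
    uploaded = datetime(2012, 5, 1, 12, 0, tzinfo=timezone.utc)
    request = {"If-Modified-Since": "Tue, 01 May 2012 12:00:00 GMT"}
    print(conditional_response(request, uploaded)[0])
```

A 304 carries no body, so it saves bandwidth on each hit, but the crawler still has to make the request. Whether it then comes back less often is up to Google.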
Three broader questions/comments:
(1) If you currently list all of these pages in your XML sitemap, consider taking them out. The XML sitemap doesn't have to contain every page on your site, and in many cases, I think it shouldn't. If you list these pages, you're basically telling Google to re-crawl them (regardless of the changefreq setting).
(2) You may have overly complex crawl paths. In other words, it may not be the quantity of pages that's at issue, but how Google accesses those pages. They could be getting stuck in a loop, etc. It's going to take some research on a large site, but it'd be worth running a desktop crawler like Xenu or Screaming Frog. This could represent a site architecture problem (from an SEO standpoint).
(3) Should all of these pages even be indexed at all, especially as time passes? More and more (especially post-Panda), more indexed pages is often worse. If Googlebot is really hitting you that hard, it might be time to canonicalize some older content or 301-redirect it to newer, more relevant content. If it's not active at all, you could even NOINDEX or 404 it.
-
Thanks for the answers so far. The tips are not really solving my problem yet, though: I don't want to lower the general crawl rate in Webmaster Tools, because pages that change frequently should also be crawled frequently. We do have XML sitemaps, although we did not include picture pages like the one in our example. There are tens, maybe hundreds, of thousands of these pages. If everyone agrees on this, we can of course include these pages in our XML sitemaps. Using "meta refresh" to indicate that the page never changes seems a bit odd to me. But I'll look into it.
But what about the HTTP headers I asked about? Does anyone have any ideas on that?
-
Your best bet is to build an Excel report using a crawl tool (like Xenu, Frog, Moz, etc), and export that data. Then look to map out the pages you want to log and mark as 'not changing'.
Make sure to build (or already have) a functioning XML sitemap file for the site, and as John said, state which URLs NEVER change. Over time, this will tell Googlebot that it isn't necessary to crawl those URLs, as they never change.
You could also place a META REFRESH tag on those individual pages, and set that to never as well.
Hope some of this helps! Cheers
-
If you have Google Webmaster Tools set up, go to Site configuration > Settings, and you can set a custom crawl rate for your site. That will change it site-wide, so if you have other pages that change frequently, that might not be so great for you.
Another thing you could try is to generate a sitemap and set a change frequency of never (or yearly) for all of the pages you don't expect to change. That also might slow down Google's crawl rate of those pages.
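As a rough illustration of the sitemap suggestion, the Python sketch below writes `<changefreq>` entries using only the standard library. The function name `build_sitemap` and the second URL are made up for the example. "never" and "yearly" are valid values in the sitemaps.org protocol, but as noted elsewhere in this thread, Google treats them as a hint at best.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, changefreq) pairs and return it as a string.

    Valid changefreq values per the sitemaps.org protocol:
    always, hourly, daily, weekly, monthly, yearly, never.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, changefreq in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = [
        # The picture page from the question: not expected to change
        ("http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html", "never"),
        # Hypothetical frequently updated page, for contrast
        ("http://www.example.com/news/", "daily"),
    ]
    print(build_sitemap(pages))
```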